Waveform Datasets
Dummy datasets
The dummy datasets mostly exist for testing purposes. They are very small datasets that can be used, for example, to demonstrate certain functionality.
- class ChunkedDummyDataset(**kwargs)[source]
Bases: WaveformBenchmarkDataset
A chunked dummy dataset illustrating the implementation of custom datasets with chunking.
- class DummyDataset(**kwargs)[source]
Bases: WaveformBenchmarkDataset
A dummy dataset illustrating the implementation of custom datasets.
AQ2009 dataset
- class AQ2009Counts(**kwargs)[source]
Bases: WaveformBenchmarkDataset
AQ2009 aftershocks digital units dataset from Bagagli et al. (2023)
- class AQ2009GM(**kwargs)[source]
Bases: WaveformBenchmarkDataset
AQ2009 aftershocks ground motion dataset from Bagagli et al. (2023)
Bohemia dataset
- class BohemiaSaxony(eida_token=None, **kwargs)[source]
Bases: BenchmarkDataset
Regional benchmark dataset of waveform data and metadata for the North-West Bohemia and Saxony region in Germany/Czech Republic.
Warning
This dataset contains restricted data from the West Bohemia Local Seismic Network (WEBNET). To compile the full dataset, you will need to provide an EIDA token. Please see the WEBNET site for more information.
- async get_station_waveform_data(event, picks, inventory, sampling_rate=100.0, time_before=60.0, time_after=60.0)[source]
- Return type: tuple[EventParameters, TraceParameters, ndarray]
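As a rough illustration of the default arguments above, and assuming that time_before and time_after are given in seconds (an assumption; the signature does not state the units), the defaults would describe a 120 s window, which corresponds to 12000 samples per component at 100 Hz. The helper name window_samples is hypothetical and not part of SeisBench.

```python
def window_samples(sampling_rate: float = 100.0,
                   time_before: float = 60.0,
                   time_after: float = 60.0) -> int:
    """Number of samples in a window of time_before + time_after seconds.

    Hypothetical helper, not part of SeisBench. Assumes both times are
    given in seconds and sampling_rate in Hz.
    """
    return int(round((time_before + time_after) * sampling_rate))


# Window length implied by the default arguments of the documented method.
n_default = window_samples()
print(n_default)
```

The same arithmetic gives a quick estimate of the array size to expect when the sampling rate or window bounds are changed.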
CEED dataset
- class CEED(**kwargs)[source]
Bases: WaveformBenchmarkDataset
The CEED dataset for California from Zhu et al. (2025)
CREW dataset
- class CREW(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Curated Regional Earthquake Waveforms (CREW dataset)
CWA dataset
- class CWA(**kwargs)[source]
Bases: CWABase
CWA dataset - Events and traces.
- src_repo_name = 'NLPLabNTUST/Merged-CWA'
- class CWABase(**kwargs)[source]
Bases: WaveformBenchmarkDataset, ABC
An abstract class for downloading datasets. The CWA dataset comprises data from two seismographic networks: CWASN and TSMIP. The dataset spans from 2011 to 2021 and primarily includes P and S wave arrivals. Additionally, a subset of noise data is provided.
- classmethod available_chunks(force=False, wait_for_file=False)[source]
Returns a list of available chunks. Queries both the local cache and the remote root.
- chunk2file = {'_2011': 'merge2011_2014.tar.gz', '_2012': 'merge2011_2014.tar.gz', '_2013': 'merge2011_2014.tar.gz', '_2014': 'merge2011_2014.tar.gz', '_2015': 'merge2015_2018.tar.gz', '_2016': 'merge2015_2018.tar.gz', '_2017': 'merge2015_2018.tar.gz', '_2018': 'merge2015_2018.tar.gz', '_2019': 'merge2019_2021.tar.gz', '_2020': 'merge2019_2021.tar.gz', '_2021': 'merge2019_2021.tar.gz', '_noise1': 'noise_chunk1.tar.gz', '_noise2': 'noise_chunk2.tar.gz'}
- citation = 'Kuan-Wei Tang, Kuan-Yu Chen, Da-Yi Chen, Tai-Lin Chin, and Ting-Yu Hsu. (2024)The CWA Benchmark: A Seismic Dataset from Taiwan for Seismic Research.Seismological Research Letters 2024.doi: https://doi.org/10.1785/0220230393'
- src_repo_name = None
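The chunk2file mapping above shows that several yearly chunks share a single archive file. The following minimal sketch inverts the mapping to list the chunks contained in each archive. The helper name chunks_by_archive is hypothetical and not part of SeisBench; only the dictionary itself is taken from the attribute documented above.

```python
from collections import defaultdict

# Mapping copied from the chunk2file attribute documented above.
chunk2file = {
    "_2011": "merge2011_2014.tar.gz", "_2012": "merge2011_2014.tar.gz",
    "_2013": "merge2011_2014.tar.gz", "_2014": "merge2011_2014.tar.gz",
    "_2015": "merge2015_2018.tar.gz", "_2016": "merge2015_2018.tar.gz",
    "_2017": "merge2015_2018.tar.gz", "_2018": "merge2015_2018.tar.gz",
    "_2019": "merge2019_2021.tar.gz", "_2020": "merge2019_2021.tar.gz",
    "_2021": "merge2019_2021.tar.gz",
    "_noise1": "noise_chunk1.tar.gz", "_noise2": "noise_chunk2.tar.gz",
}


def chunks_by_archive(mapping):
    """Group chunk names by the archive file that contains them.

    Hypothetical helper for illustration, not part of SeisBench.
    """
    grouped = defaultdict(list)
    for chunk, archive in mapping.items():
        grouped[archive].append(chunk)
    return dict(grouped)


archives = chunks_by_archive(chunk2file)
for archive, chunks in archives.items():
    print(archive, chunks)
```

Grouping by archive makes it easy to see that the event data is distributed as three multi-year archives, with the two noise chunks in separate files.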
ETHZ dataset
- class ETHZ(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Regional benchmark dataset of publicly available waveform data and corresponding metadata in the Swiss Seismological Service (SED) archive. Contains data from 2013-2020. A pre-compiled version of the benchmark dataset in SeisBench-compatible format is available for download from the remote root. In case of download issues, the benchmark dataset is downloaded directly from the source via an FDSN client and converted to SeisBench format.
- property client
GEOFON dataset
- class GEOFON(**kwargs)[source]
Bases: WaveformBenchmarkDataset
GEOFON dataset consisting of both regional and teleseismic picks. Mostly contains P arrivals, but a few S arrivals are annotated as well. Contains data from 2010-2013. The dataset will be downloaded from the SeisBench repository on first usage.
INSTANCE dataset
- class InstanceCounts(**kwargs)[source]
Bases: InstanceTypeDataset
INSTANCE dataset - Events with waveforms in counts
- class InstanceCountsCombined(**kwargs)[source]
Bases: MultiWaveformDataset
Convenience class to jointly load InstanceCounts and InstanceNoise.
Parameters:
kwargs – Passed to the constructors of both InstanceCounts and InstanceNoise
Iquique dataset
- class Iquique(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Iquique benchmark dataset of local events used for training in the Woollam (2019) study (see citation).
Splits are set using standard random sampling of seisbench.data.base.BenchmarkDataset.
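To illustrate what a random-sampling split means in general, the sketch below assigns each trace independently to a train, dev, or test split. It is not the SeisBench implementation: the function name, the ratios (0.6, 0.1, 0.3), and the seed are illustrative assumptions.

```python
import random


def assign_random_splits(n_traces, ratios=(0.6, 0.1, 0.3), seed=42):
    """Assign each trace independently to 'train', 'dev' or 'test'.

    Hypothetical helper for illustration. The ratios and the seed are
    assumptions and are not taken from SeisBench.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return rng.choices(["train", "dev", "test"], weights=ratios, k=n_traces)


splits = assign_random_splits(1000)
counts = {name: splits.count(name) for name in ("train", "dev", "test")}
print(counts)
```

Because each trace is drawn independently, the split sizes only approximate the requested ratios, and traces from the same event can end up in different splits.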
ISC-EHB dataset
- class ISC_EHB_DepthPhases(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Dataset of depth phase picks from the ISC-EHB bulletin.
LenDB dataset
- class LenDB(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Len-DB dataset from Magrini et al.
LFE stack datasets
- class LFEStacksCascadiaBostock2015(component_order='Z12', **kwargs)[source]
Bases: WaveformBenchmarkDataset
Low-frequency earthquake stacks underneath Vancouver Island, Cascadia, Canada/USA based on the catalog by Bostock et al. (2015). Compiled to SeisBench format by Münchmeyer et al. (2024).
- class LFEStacksMexicoFrank2014(component_order='Z12', **kwargs)[source]
Bases: WaveformBenchmarkDataset
Low-frequency earthquake stacks underneath Guerrero, Mexico based on the catalog by Frank et al. (2014). Compiled to SeisBench format by Münchmeyer et al. (2024).
- class LFEStacksSanAndreasShelly2017(component_order='Z12', **kwargs)[source]
Bases: WaveformBenchmarkDataset
Low-frequency earthquake stacks on the San Andreas Fault, California, USA based on the catalog by Shelly (2017). Compiled to SeisBench format by Münchmeyer et al. (2024).
NEIC datasets
- class MLAAPDE(**kwargs)[source]
Bases: WaveformBenchmarkDataset
MLAAPDE dataset from Cole et al. (2023)
Note that the SeisBench version is not identical to the precompiled version distributed directly through USGS but uses a different data selection. In addition, custom versions of MLAAPDE can be compiled with the software provided by the original authors. These datasets can be exported in SeisBench format.
- class NEIC(**kwargs)[source]
Bases: WaveformBenchmarkDataset
NEIC dataset from Yeck and Patton
OBS datasets
- class OBS(component_order='Z12H', **kwargs)[source]
Bases: WaveformBenchmarkDataset
OBS Benchmark Dataset of local events
Default component order is 'Z12H'. To omit a component, for example the hydrophone, pass component_order="Z12" explicitly. The dataset can then be used as input to land-station pickers that use only three components.
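The effect of restricting the component order can be pictured with a small stand-alone sketch. It selects rows from a four-component record (Z, 1, 2, H) according to a requested order string. The function select_components and the toy data are hypothetical. They only mimic the idea behind component_order and do not reproduce SeisBench internals.

```python
def select_components(traces, available_order="Z12H", component_order="Z12"):
    """Return the rows of `traces` in the requested component order.

    `traces` holds one list of samples per component, ordered as in
    `available_order`. Hypothetical helper, not part of SeisBench.
    """
    index = {comp: i for i, comp in enumerate(available_order)}
    missing = [comp for comp in component_order if comp not in index]
    if missing:
        raise ValueError(f"Components not available: {missing}")
    return [traces[index[comp]] for comp in component_order]


# Toy record with three samples per component, ordered Z, 1, 2, H.
four_component = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# Dropping the hydrophone leaves a three-component record.
three_component = select_components(four_component)
print(len(three_component))
```

Selecting by an order string keeps the mapping between rows and physical channels explicit, which matters when a model expects exactly three input channels.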
- class OBST2024(**kwargs)[source]
Bases: WaveformBenchmarkDataset
The OBS dataset from Niksejel & Zhang (2024)
PiSDL dataset
- class PiSDL(**kwargs)[source]
Bases: WaveformBenchmarkDataset
A dataset for induced seismicity from different regions in Canada, Switzerland, Germany, and France. The induced seismic events are caused by fluid injection for hydraulic fracturing, geothermal power plants, and coal mine flooding. In addition, the dataset contains all available low-magnitude events (M_L <= 2) from the Swiss Seismological Service (SED) between 2009 and 2023.
PNW datasets
- class PNW(**kwargs)[source]
Bases: WaveformBenchmarkDataset
PNW ComCat dataset from Ni et al. (2023)
- class PNWAccelerometers(**kwargs)[source]
Bases: WaveformBenchmarkDataset
PNW Accelerometers dataset from Ni et al. (2023)
- class PNWExotic(**kwargs)[source]
Bases: WaveformBenchmarkDataset
PNW Exotic dataset from Ni et al. (2023)
- class PNWNoise(**kwargs)[source]
Bases: WaveformBenchmarkDataset
PNW Noise dataset from Ni et al. (2023)
Southern California datasets
- class Meier2019JGR(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Southern Californian part of the dataset from Meier et al. (2019). Note that due to the missing Japanese data, there is a massive overrepresentation of noise samples.
Meier, M.-A., Ross, Z. E., Ramachandran, A., Balakrishna, A., Nair, S., Kundzicz, P., et al. (2019). Reliable real‐time seismic signal/noise discrimination with machine learning. Journal of Geophysical Research: Solid Earth, 124. https://doi.org/10.1029/2018JB016661
- class Ross2018GPD(**kwargs)[source]
Bases: WaveformBenchmarkDataset
Pick dataset belonging to the publication: Zachary E. Ross, Men‐Andrin Meier, Egill Hauksson, Thomas H. Heaton; Generalized Seismic Phase Detection with Deep Learning. Bulletin of the Seismological Society of America 2018; 108 (5A): 2894–2901. https://doi.org/10.1785/0120180080
- class Ross2018JGRFM(component_order='Z', **kwargs)[source]
Bases: WaveformBenchmarkDataset
First motion polarity dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120–5129. https://doi.org/10.1029/2017JB015251
Note that this dataset contains picks as well.
Warning
This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.
- class Ross2018JGRPick(component_order='Z', **kwargs)[source]
Bases: WaveformBenchmarkDataset
Pick dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120–5129. https://doi.org/10.1029/2017JB015251
Note that this dataset contains polarities as well.
Warning
This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.
- class SCEDC(**kwargs)[source]
Bases: WaveformBenchmarkDataset
SCEDC waveform archive (2000-2020).
Splits are set using standard random sampling of BenchmarkDataset.
STEAD dataset
- class STEAD(**kwargs)[source]
Bases: WaveformBenchmarkDataset
STEAD dataset from Mousavi et al.
Uses the train/test split from the EQTransformer GitHub repository. The train/dev split is defined in SeisBench.
TXED dataset
- class TXED(**kwargs)[source]
Bases: WaveformBenchmarkDataset
TXED dataset from Chen et al.
The train/dev/test split is defined in SeisBench.
VCSEIS dataset
- class VCSEIS(**kwargs)[source]
Bases: WaveformBenchmarkDataset
A dataset of seismic waveforms from various volcanic regions: Alaska, Hawaii, Northern California, Cascade volcanoes.