Waveform Datasets

Dummy datasets

The dummy datasets mostly exist for testing purposes. They are very small datasets that can be used, for example, to demonstrate certain functionality.

class ChunkedDummyDataset(**kwargs)[source]

Bases: WaveformBenchmarkDataset

A chunked dummy dataset illustrating how to implement a custom dataset with chunking

class DummyDataset(**kwargs)[source]

Bases: WaveformBenchmarkDataset

A dummy dataset illustrating how to implement a custom dataset

AQ2009 dataset

class AQ2009Counts(**kwargs)[source]

Bases: WaveformBenchmarkDataset

AQ2009 aftershocks digital units dataset from Bagagli et al. (2023)

class AQ2009GM(**kwargs)[source]

Bases: WaveformBenchmarkDataset

AQ2009 aftershocks ground motion dataset from Bagagli et al. (2023)

Bohemia dataset

class BohemiaSaxony(eida_token=None, **kwargs)[source]

Bases: BenchmarkDataset

Regional benchmark dataset of waveform data and metadata for the North-West Bohemia and Saxony region in Germany/Czech Republic.

Warning

This dataset contains restricted data from the West Bohemia Local Seismic Network (WEBNET). To compile the full dataset, you will need to provide an EIDA token. Please see the WEBNET site for more information.

get_catalog()[source]
Return type:

Catalog

get_inventory(catalog, force_download=False)[source]
Return type:

Inventory

async get_station_waveform_data(event, picks, inventory, sampling_rate=100.0, time_before=60.0, time_after=60.0)[source]
Return type:

tuple[EventParameters, TraceParameters, ndarray]
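The default arguments suggest a two-minute window around each pick. The sketch below is only an illustration: `pick_window` is a hypothetical helper, not part of SeisBench. It assumes the window runs from `time_before` seconds before the pick to `time_after` seconds after it, which the signature implies but this page does not confirm.

```python
from datetime import datetime, timedelta


def pick_window(pick_time, sampling_rate=100.0, time_before=60.0, time_after=60.0):
    """Hypothetical helper: window bounds and nominal sample count around a pick."""
    start = pick_time - timedelta(seconds=time_before)
    end = pick_time + timedelta(seconds=time_after)
    # Nominal count only; whether the real method includes the final
    # sample is not documented on this page.
    n_samples = int(round((time_before + time_after) * sampling_rate))
    return start, end, n_samples


start, end, n_samples = pick_window(datetime(2020, 1, 1, 12, 0, 0))
print(start, end, n_samples)
```

With the defaults, the nominal trace length is the window duration in seconds multiplied by the sampling rate.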

CEED dataset

class CEED(**kwargs)[source]

Bases: WaveformBenchmarkDataset

The CEED dataset for California from Zhu et al. (2025)

classmethod available_chunks(force=False, wait_for_file=False)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

CREW dataset

class CREW(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Curated Regional Earthquake Waveforms (CREW dataset)

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

CWA dataset

class CWA(**kwargs)[source]

Bases: CWABase

CWA dataset - Events and traces.

src_repo_name = 'NLPLabNTUST/Merged-CWA'

class CWABase(**kwargs)[source]

Bases: WaveformBenchmarkDataset, ABC

An abstract class for downloading datasets. The CWA dataset comprises data from two seismographic networks: CWASN and TSMIP. The dataset spans from 2011 to 2021 and primarily includes P and S wave arrivals. Additionally, a subset of noise data is provided.

classmethod available_chunks(force=False, wait_for_file=False)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

chunk2file = {'_2011': 'merge2011_2014.tar.gz', '_2012': 'merge2011_2014.tar.gz', '_2013': 'merge2011_2014.tar.gz', '_2014': 'merge2011_2014.tar.gz', '_2015': 'merge2015_2018.tar.gz', '_2016': 'merge2015_2018.tar.gz', '_2017': 'merge2015_2018.tar.gz', '_2018': 'merge2015_2018.tar.gz', '_2019': 'merge2019_2021.tar.gz', '_2020': 'merge2019_2021.tar.gz', '_2021': 'merge2019_2021.tar.gz', '_noise1': 'noise_chunk1.tar.gz', '_noise2': 'noise_chunk2.tar.gz'}

citation = 'Kuan-Wei Tang, Kuan-Yu Chen, Da-Yi Chen, Tai-Lin Chin, and Ting-Yu Hsu. (2024)The CWA Benchmark: A Seismic Dataset from Taiwan for Seismic Research.Seismological Research Letters 2024.doi: https://doi.org/10.1785/0220230393'

src_repo_name = None

tar_file(filepath, savepath)[source]

class CWANoise(**kwargs)[source]

Bases: CWABase

CWA dataset - Noise samples.

src_repo_name = 'NLPLabNTUST/Merged-CWA-Noise'
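
The `chunk2file` attribute of `CWABase` maps several yearly chunks to the same archive. As a minimal sketch of how that mapping can be read, the snippet below rebuilds it and groups requested chunks by archive. The helper `archives_for` is hypothetical and not part of SeisBench; how the library actually schedules downloads is not described on this page.

```python
from collections import defaultdict

# Rebuild the chunk2file mapping shown for CWABase.
CHUNK2FILE = {}
for first, last in [(2011, 2014), (2015, 2018), (2019, 2021)]:
    for year in range(first, last + 1):
        CHUNK2FILE[f"_{year}"] = f"merge{first}_{last}.tar.gz"
CHUNK2FILE["_noise1"] = "noise_chunk1.tar.gz"
CHUNK2FILE["_noise2"] = "noise_chunk2.tar.gz"


def archives_for(chunks):
    """Hypothetical helper: group chunk names by the archive that holds them."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[CHUNK2FILE[chunk]].append(chunk)
    return dict(grouped)


print(archives_for(["_2011", "_2013", "_2019"]))
```

Grouping this way makes it visible that requesting `_2011` and `_2013` touches a single archive, while `_2019` needs a second one.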

ETHZ dataset

class ETHZ(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Regional benchmark dataset of publicly available waveform data and corresponding metadata in the Swiss Seismological Service (SED) archive. Contains data from 2013 to 2020. A pre-compiled version of the benchmark dataset in SeisBench-compatible format is available for download from the remote root. In case of download issues, the benchmark dataset is downloaded directly from the source via an FDSN client and converted to SeisBench format.

property client

GEOFON dataset

class GEOFON(**kwargs)[source]

Bases: WaveformBenchmarkDataset

GEOFON dataset consisting of both regional and teleseismic picks. Mostly contains P arrivals, but a few S arrivals are annotated as well. Contains data from 2010-2013. The dataset will be downloaded from the SeisBench repository on first usage.

INSTANCE dataset

class InstanceCounts(**kwargs)[source]

Bases: InstanceTypeDataset

INSTANCE dataset - Events with waveforms in counts

class InstanceCountsCombined(**kwargs)[source]

Bases: MultiWaveformDataset

Convenience class to jointly load InstanceCounts and InstanceNoise.

Parameters:

kwargs – Passed to the constructors of both InstanceCounts and InstanceNoise

class InstanceGM(**kwargs)[source]

Bases: InstanceTypeDataset

INSTANCE dataset - Events with waveforms in ground motion units

class InstanceNoise(**kwargs)[source]

Bases: InstanceTypeDataset

INSTANCE dataset - Noise samples

Iquique dataset

class Iquique(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Iquique benchmark dataset of local events used for training in the Woollam (2019) study (see citation).

Splits are set using standard random sampling of seisbench.data.base.BenchmarkDataset.
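
Random split assignment can be pictured as drawing a label for each trace. The sketch below is an illustration only: `random_split`, the split ratios, and the seed are assumptions and are not taken from the SeisBench implementation.

```python
import random


def random_split(n_traces, ratios=(0.6, 0.1, 0.3), seed=42):
    """Hypothetical helper: draw a train/dev/test label for each trace."""
    labels = ["train", "dev", "test"]
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    return rng.choices(labels, weights=ratios, k=n_traces)


splits = random_split(1000)
print({label: splits.count(label) for label in ("train", "dev", "test")})
```

Because each label is drawn independently, the realized proportions only approximate the requested ratios.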

ISC-EHB dataset

class ISC_EHB_DepthPhases(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Dataset of depth phase picks from the ISC-EHB bulletin.

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

LenDB dataset

class LenDB(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Len-DB dataset from Magrini et al.

LFE stack datasets

class LFEStacksCascadiaBostock2015(component_order='Z12', **kwargs)[source]

Bases: WaveformBenchmarkDataset

Low-frequency earthquake stacks underneath Vancouver Island, Cascadia, Canada/USA, based on the catalog by Bostock et al. (2015). Compiled to SeisBench format by Münchmeyer et al. (2024).

class LFEStacksMexicoFrank2014(component_order='Z12', **kwargs)[source]

Bases: WaveformBenchmarkDataset

Low-frequency earthquake stacks underneath Guerrero, Mexico, based on the catalog by Frank et al. (2014). Compiled to SeisBench format by Münchmeyer et al. (2024).

class LFEStacksSanAndreasShelly2017(component_order='Z12', **kwargs)[source]

Bases: WaveformBenchmarkDataset

Low-frequency earthquake stacks on the San Andreas Fault, California, USA, based on the catalog by Shelly (2017). Compiled to SeisBench format by Münchmeyer et al. (2024).

NEIC datasets

class MLAAPDE(**kwargs)[source]

Bases: WaveformBenchmarkDataset

MLAAPDE dataset from Cole et al. (2023)

Note that the SeisBench version is not identical to the precompiled version distributed directly through USGS but uses a different data selection. In addition, custom versions of MLAAPDE can be compiled with the software provided by the original authors. These datasets can be exported in SeisBench format.

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

class NEIC(**kwargs)[source]

Bases: WaveformBenchmarkDataset

NEIC dataset from Yeck and Patton

OBS datasets

class OBS(component_order='Z12H', **kwargs)[source]

Bases: WaveformBenchmarkDataset

OBS Benchmark Dataset of local events

The default component order is 'Z12H'. You can omit a component, e.g. the hydrophone, by explicitly passing component_order="Z12". This way, the dataset can be used as input to land station pickers that use only three components.
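
Conceptually, a component order string selects and arranges rows of the waveform array by letter. The sketch below shows this with plain lists. `select_components` is a hypothetical helper, not the SeisBench implementation; in practice you only pass `component_order="Z12"` to the dataset constructor.

```python
def select_components(traces, source_order, target_order):
    """Hypothetical helper: pick and reorder component rows by letter."""
    index = {component: i for i, component in enumerate(source_order)}
    missing = [c for c in target_order if c not in index]
    if missing:
        raise ValueError(f"Components not available: {missing}")
    return [traces[index[c]] for c in target_order]


# Four toy traces in Z, 1, 2, H order (two samples each).
traces = [[0.1, 0.2], [1.1, 1.2], [2.1, 2.2], [9.1, 9.2]]
land_style = select_components(traces, "Z12H", "Z12")
print(len(land_style))
```

Dropping the `H` letter removes the hydrophone row, which leaves a three-component array of the shape land station pickers expect.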

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

class OBST2024(**kwargs)[source]

Bases: WaveformBenchmarkDataset

The OBS dataset from Niksejel & Zhang (2024)

PiSDL dataset

class PiSDL(**kwargs)[source]

Bases: WaveformBenchmarkDataset

A dataset for induced seismicity from different regions in Canada, Switzerland, Germany, and France. Induced seismic events are caused by hydraulic-fracturing based fluid injection, geothermal power plants, and coal mine flooding. In addition, the dataset contains all available low magnitude events (M_L <= 2) from the Swiss Seismological Service (SED) between 2009 and 2023.

get_dawson_septimus_subset()[source]
get_floodrisk_subset()[source]
get_insheim_subset()[source]
get_st_gallen_subset()[source]
get_switzerland_subset()[source]
get_vendenheim_subset()[source]

PNW datasets

class PNW(**kwargs)[source]

Bases: WaveformBenchmarkDataset

PNW ComCat dataset from Ni et al. (2023)

class PNWAccelerometers(**kwargs)[source]

Bases: WaveformBenchmarkDataset

PNW Accelerometers dataset from Ni et al. (2023)

class PNWExotic(**kwargs)[source]

Bases: WaveformBenchmarkDataset

PNW Exotic dataset from Ni et al. (2023)

class PNWNoise(**kwargs)[source]

Bases: WaveformBenchmarkDataset

PNW Noise dataset from Ni et al. (2023)

Southern California datasets

class Meier2019JGR(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Southern Californian part of the dataset from Meier et al. (2019). Note that due to the missing Japanese data, there is a massive overrepresentation of noise samples.

Meier, M.-A., Ross, Z. E., Ramachandran, A., Balakrishna, A., Nair, S., Kundzicz, P., et al. (2019). Reliable real‐time seismic signal/noise discrimination with machine learning. Journal of Geophysical Research: Solid Earth, 124. https://doi.org/10.1029/2018JB016661

class Ross2018GPD(**kwargs)[source]

Bases: WaveformBenchmarkDataset

Pick dataset belonging to the publication: Zachary E. Ross, Men‐Andrin Meier, Egill Hauksson, Thomas H. Heaton; Generalized Seismic Phase Detection with Deep Learning. Bulletin of the Seismological Society of America 2018; 108 (5A): 2894–2901. https://doi.org/10.1785/0120180080

class Ross2018JGRFM(component_order='Z', **kwargs)[source]

Bases: WaveformBenchmarkDataset

First motion polarity dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120–5129. https://doi.org/10.1029/2017JB015251

Note that this dataset contains picks as well.

Warning

This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.

class Ross2018JGRPick(component_order='Z', **kwargs)[source]

Bases: WaveformBenchmarkDataset

Pick dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120–5129. https://doi.org/10.1029/2017JB015251

Note that this dataset contains polarities as well.

Warning

This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.

class SCEDC(**kwargs)[source]

Bases: WaveformBenchmarkDataset

SCEDC waveform archive (2000-2020).

Splits are set using standard random sampling of BenchmarkDataset.

STEAD dataset

class STEAD(**kwargs)[source]

Bases: WaveformBenchmarkDataset

STEAD dataset from Mousavi et al.

Uses the train/test split from the EQTransformer GitHub repository. The train/dev split is defined in SeisBench.

TXED dataset

class TXED(**kwargs)[source]

Bases: WaveformBenchmarkDataset

TXED dataset from Chen et al.

The train/dev/test split is defined in SeisBench.

VCSEIS dataset

class VCSEIS(**kwargs)[source]

Bases: WaveformBenchmarkDataset

A dataset of seismic waveforms from various volcanic regions: Alaska, Hawaii, Northern California, and the Cascade volcanoes.

get_alaska_subset()[source]

Select and return the data from Alaska

get_cascade_subset()[source]

Select and return the data from Cascade volcanoes

get_hawaii_subset()[source]

Select and return the data from Hawaii

get_long_period_earthquakes()[source]

Return the subset with only long-period earthquakes

get_noise_traces()[source]

Return the subset with only noise traces

get_northern_california_subset()[source]

Select and return the data from Northern California

get_regular_earthquakes()[source]

Return the subset with only regular earthquakes
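
Each getter above returns a subset selected by region or by trace type. Conceptually, this is a filter over per-trace metadata. The sketch below uses plain dictionaries; the field names `region` and `source_type`, their values, and the helper `select_subset` are hypothetical and do not reflect the actual VCSEIS metadata columns.

```python
def select_subset(metadata, **criteria):
    """Hypothetical helper: keep rows whose fields match all given criteria."""
    return [
        row for row in metadata
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# Toy metadata rows with made-up field names and values.
metadata = [
    {"trace": "a", "region": "alaska", "source_type": "regular"},
    {"trace": "b", "region": "hawaii", "source_type": "long_period"},
    {"trace": "c", "region": "alaska", "source_type": "noise"},
]

alaska = select_subset(metadata, region="alaska")
alaska_noise = select_subset(metadata, region="alaska", source_type="noise")
print([row["trace"] for row in alaska], [row["trace"] for row in alaska_noise])
```

Combining criteria in one call corresponds to chaining a region getter with a trace-type getter, assuming the returned subsets support the same getters.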