seisbench.data

Base classes

class BenchmarkDataset(chunks=None, citation=None, license=None, force=False, wait_for_file=False, repository_lookup=False, download_kwargs=None, **kwargs)[source]

Bases: WaveformDataset, ABC

This class is the base class for benchmark waveform datasets. It adds functionality to automatically download the dataset to the SeisBench cache. Downloads can either be from the SeisBench repository, if the dataset is available there and in the right format, or from another source, which will usually require some form of conversion. Furthermore, it adds annotations for citation and license.

Parameters:
  • chunks – List of chunks to download

  • citation – Citation for the dataset. Should be set in the inheriting class.

  • license – License associated with the dataset. Should be set in the inheriting class.

  • force – Passed to callback_if_uncached()

  • wait_for_file – Passed to callback_if_uncached()

  • repository_lookup – Whether the dataset should be searched for in the remote repository or the download function should be used directly. Should be set in the inheriting class. Only needs to be set to true if the dataset is available in a repository, e.g., the SeisBench repository, for direct download.

  • download_kwargs – Dict of arguments passed to the download_dataset function, in case the dataset is loaded from scratch.

  • kwargs – Keyword arguments passed to WaveformDataset

classmethod available_chunks(force=False, wait_for_file=False)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

property citation

The suggested citation for this dataset

property license

The license attached to this dataset

property name

Name of the dataset. For BenchmarkDatasets, always matches the class name.

property path

Path to the dataset location in the SeisBench cache

class Bucketer[source]

Bases: ABC

This is the abstract bucketer class that needs to be provided to the WaveformDataWriter. It offers one public function, get_bucket(), to assign a bucket to each trace.

abstract get_bucket(metadata, waveform)[source]

Calculates the bucket for the trace given its metadata and waveforms

Parameters:
  • metadata – Metadata as given to the WaveformDataWriter.

  • waveform – Waveforms as given to the WaveformDataWriter.

Returns:

A hashable object denoting the bucket this sample belongs to.

class GeometricBucketer(minbucket=100, factor=1.2, splits=True, track_channels=True, axis=-1)[source]

Bases: Bucketer

A simple bucketer that uses the length of the traces and optionally the assigned split to determine buckets. Only takes into account the length along one fixed axis. Bucket edges are created with a geometric spacing above a minimum bucket. The first bucket is [0, minbucket), the second one [minbucket, minbucket * factor), and so on. There is no maximum bucket. This bucketer ensures that the overhead from padding is at most factor - 1, as long as only a few traces with length < minbucket exist. Note that the overhead can be reduced even further by passing the input traces ordered by their length.

Parameters:
  • minbucket (int) – Upper limit of the lowest bucket and start of the geometric spacing.

  • factor (float) – Factor for the geometric spacing.

  • splits (bool) – If true, returns separate buckets for each split. Defaults to true. If no split is defined in the metadata, this parameter is ignored.

  • track_channels (bool) – If true, uses the shape of the input waveform along all axis except the one defined in axis, to determine the bucket. Only traces agreeing in all dimensions except the given axis will be assigned to the same bucket.

  • axis (int) – Axis to take into account for determining the length of the trace.
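The geometric spacing rule above can be illustrated with a small sketch. This is an illustrative re-implementation of the bucket-edge rule only, not SeisBench's actual code:

```python
import math

def geometric_bucket(length, minbucket=100, factor=1.2):
    """Toy version of the spacing rule: bucket 0 is [0, minbucket);
    bucket k (k >= 1) covers [minbucket * factor**(k - 1), minbucket * factor**k)."""
    if length < minbucket:
        return 0
    return math.floor(math.log(length / minbucket, factor)) + 1

print(geometric_bucket(50))   # bucket 0: below minbucket
print(geometric_bucket(100))  # bucket 1: [100, 120)
print(geometric_bucket(500))  # bucket 9: [100 * 1.2**8, 100 * 1.2**9)
```

Within one bucket, the longest trace is at most factor times the shortest, which bounds the padding overhead at factor - 1.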

get_bucket(metadata, waveform)[source]

Calculates the bucket for the trace given its metadata and waveforms

Parameters:
  • metadata – Metadata as given to the WaveformDataWriter.

  • waveform – Waveforms as given to the WaveformDataWriter.

Returns:

A hashable object denoting the bucket this sample belongs to.

class LoadingContext(chunks, waveform_paths)[source]

Bases: object

The LoadingContext is a dict of pointers to the hdf5 files for the chunks. It is an easy way to manage opening and closing of file pointers when required.

class MultiWaveformDataset(datasets)[source]

Bases: object

A MultiWaveformDataset is an ordered collection of WaveformDataset. It exposes mostly the same API as a single WaveformDataset.

The constructor checks for compatibility of dimension_order, component_order and sampling_rate. The caching strategy of each contained dataset is left unmodified, but a warning is issued if different caching schemes are found.

Parameters:

datasets – List of WaveformDataset. The constructor will create a copy of each dataset using the WaveformDataset.copy() method.

property cache

Get or set cache strategy

property component_order

Get or set component order

property datasets

Datasets contained in MultiWaveformDataset.

dev()

Convenience method for get_split(“dev”).

Returns:

Development dataset

property dimension_order

Get or set dimension order for output

filter(mask, inplace=True)[source]

Filters dataset, similar to WaveformDataset.filter().

Parameters:
  • mask (boolean array) – Boolean mask to apply to metadata.

  • inplace (bool) – If true, filter inplace.

Returns:

None if inplace=True, otherwise the filtered dataset.

get_group_idx_from_params(params)

Returns the index of the group identified by the params.

Parameters:

params – The parameters identifying the group. For a single grouping parameter, this argument will be a single value. Otherwise this argument needs to be a tuple of keys.

Returns:

Index of the group

Return type:

int

get_group_samples(idx, **kwargs)

Returns the waveforms and metadata for each member of a group. For details see get_sample().

Parameters:
  • idx (int) – Group index

  • kwargs – Kwargs passed to get_sample()

Returns:

List of waveforms, list of metadata dicts

get_group_size(idx)

Returns the number of samples in a group

Parameters:

idx (int) – Group index

Returns:

Size of the group

Return type:

int

get_group_waveforms(idx, **kwargs)

Returns the waveforms for each member of a group. For details see get_sample().

Parameters:
  • idx (int) – Group index

  • kwargs – Kwargs passed to get_sample()

Returns:

List of waveforms

get_idx_from_trace_name(trace_name, chunk=None, dataset=None)

Returns the index of a trace with given trace_name, chunk and dataset. Chunk and dataset parameters are optional, but might be necessary to uniquely identify traces for chunked datasets or for MultiWaveformDataset. The method will issue a warning the first time a non-uniquely identifiable trace is requested. If no matching key is found, a KeyError is raised.

Parameters:
  • trace_name (str) – Trace name as in metadata[“trace_name”]

  • chunk (None) – Trace chunk as in metadata[“trace_chunk”]. If None this key will be ignored.

  • dataset (None) – Trace dataset as in metadata[“trace_dataset”]. Only for MultiWaveformDataset. If None this key will be ignored.

Returns:

Index of the sample

get_sample(idx, *args, **kwargs)[source]

Wraps WaveformDataset.get_sample()

Parameters:
  • idx – Index of the sample

  • args – passed to parent function

  • kwargs – passed to parent function

Returns:

Return value of parent function

get_split(split)

Returns a dataset with the requested split.

Parameters:

split – Split name to return. Usually one of “train”, “dev”, “test”

Returns:

Dataset filtered to the requested split.

get_waveforms(idx=None, mask=None, **kwargs)[source]

Collects waveforms and returns them as an array.

Parameters:
  • idx (int, list[int]) – Idx or list of idx to obtain waveforms for

  • mask (np.ndarray[bool]) – Binary mask on the metadata, indicating which traces should be returned. Can not be used jointly with idx.

  • kwargs – Passed to WaveformDataset.get_waveforms()

Returns:

Waveform array with dimensions ordered according to dimension_order, e.g., the default ‘NCW’ (number of traces, number of components, record samples). If the number of record samples varies between different entries, all entries are padded to the maximum length.

Return type:

np.ndarray
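The padding described above can be sketched with numpy. The shapes are invented and this mimics the documented behaviour rather than SeisBench's internal code:

```python
import numpy as np

# Two hypothetical traces with 3 components but different lengths.
traces = [np.ones((3, 400)), np.ones((3, 350))]

# Stack variable-length traces by zero-padding each one to the
# maximum length along the sample axis ('NCW' order).
max_len = max(t.shape[-1] for t in traces)
batch = np.stack(
    [np.pad(t, ((0, 0), (0, max_len - t.shape[-1]))) for t in traces]
)
print(batch.shape)  # (2, 3, 400)
```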

property grouping

The grouping parameters for the dataset. Grouping allows accessing metadata and waveforms jointly for a set of traces sharing a common metadata parameter. This can, for example, be used to access all waveforms belonging to one event when building event-based models. Setting the grouping parameter defines the output of groups and the associated methods. grouping can be either a single string or a list of strings. Each string must be a column in the metadata. By default, the grouping is None.
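Conceptually, grouping partitions the metadata rows by the values of the chosen column(s), much like a pandas groupby. The sketch below uses an invented metadata table and a hypothetical "source_id" grouping column; it only illustrates the idea, not SeisBench's implementation:

```python
import pandas as pd

# Toy metadata table with a made-up event identifier per trace.
metadata = pd.DataFrame({
    "trace_name": ["tr_a", "tr_b", "tr_c", "tr_d"],
    "source_id": ["ev1", "ev1", "ev2", "ev2"],
})

# Setting dataset.grouping = "source_id" would partition the rows like this:
groups = metadata.groupby("source_id").indices
print(sorted(groups))          # group labels
print(groups["ev1"].tolist())  # row indices belonging to the first group
```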

property groups

The list of groups as defined by the grouping or None if grouping is None.

property metadata

Metadata of the dataset as pandas DataFrame.

property metadata_cache
property missing_components

Get or set strategy for missing components

plot_map(res='110m', connections=False, **kwargs)

Plots the dataset onto a map using the Mercator projection. Requires a cartopy installation.

Parameters:
  • res (str, optional) – Resolution for cartopy features, defaults to 110m.

  • connections (bool, optional) – If true, plots lines connecting sources and stations. Defaults to false.

  • kwargs – Plotting kwargs that will be passed to matplotlib plot. Args need to be prefixed with sta_, ev_ and conn_ to address stations, events or connections.

Returns:

A figure handle for the created figure.

preload_waveforms(*args, **kwargs)[source]

Calls WaveformDataset.preload_waveforms() for all member datasets with the provided arguments.

region_filter(domain, lat_col, lon_col, inplace=True)

Filtering of dataset based on predefined region or geometry. See also convenience functions region_filter_[source|receiver].

Parameters:
  • domain (obspy.clients.fdsn.mass_downloader.domain) – The domain filter

  • lat_col (str) – Name of latitude coordinate column

  • lon_col (str) – Name of longitude coordinate column

  • inplace (bool) – Inplace filtering, defaults to true. See also filter().

Returns:

None if inplace=True, otherwise the filtered dataset.

region_filter_receiver(domain, inplace=True)

Convenience method for region filtering by receiver location.

region_filter_source(domain, inplace=True)

Convenience method for region filtering by source location.

property sampling_rate

Get or set sampling rate for output

test()

Convenience method for get_split(“test”).

Returns:

Test dataset

train()

Convenience method for get_split(“train”).

Returns:

Training dataset

train_dev_test()

Convenience method for returning training, development and test set. Equal to:

>>> self.train(), self.dev(), self.test()
Returns:

Training dataset, development dataset, test dataset

class WaveformDataWriter(metadata_path, waveforms_path)[source]

Bases: object

The WaveformDataWriter for writing datasets in SeisBench format.

To improve reading performance when using the datasets, the writer groups traces into blocks and writes them into joint arrays in the hdf5 file. The exact behaviour is controlled by the bucketer and the bucket_size; for details see their documentation. This packing is necessary due to limitations in hdf5 performance: reading many small arrays from an hdf5 file causes the overhead of the hdf5 structure to dominate the read times.

Parameters:
  • metadata_path (str or Path) – Path to write the metadata file to

  • waveforms_path (str or Path) – Path to write the waveforms file to

Returns:

None

add_trace(metadata, waveform)[source]

Adds a trace to the writer. This does not imply that the trace is immediately written to disk, as the writer might wait to fill a bucket. The writer ensures that the order of traces in the metadata is identical to the order of calls to add_trace.

Parameters:
  • metadata (dict[str, any]) – Metadata of the trace

  • waveform (np.ndarray) – Waveform of the trace

Returns:

None

property bucket_size

The maximum size of a bucket. Once adding another trace would overload the bucket, the bucket is written to disk. Defaults to 1024.

Returns:

Bucket size

property bucketer

The currently used bucketer, which sorts traces into buckets. If the bucketer is None, no buckets are used and all traces are written separately. By default uses the GeometricBucketer with default parameters. Please check that this suits your needs. In particular, make sure that the default axis matches your sample axis.

Returns:

Returns the current bucketer.

flush_hdf5()[source]

Writes out all traces currently in the cache to the hdf5 file. Should be called if no more traces for the existing buckets will be added, e.g., after finishing a split. Does not write the metadata to csv.

set_total(n)[source]

Set the total number of traces to write. Only used for correct progress calculation.

Parameters:

n (int) – Number of traces

Returns:

None

class WaveformDataset(path, name=None, dimension_order=None, component_order=None, sampling_rate=None, cache=None, chunks=None, missing_components='pad', metadata_cache=False, **kwargs)[source]

Bases: object

This class is the base class for waveform datasets.

A key consideration should be how the cache is used. If sufficient memory is available to keep the full data set in memory, activating the cache will yield strong performance gains. For details on the cache strategies, see the documentation of the cache parameter.

Parameters:
  • path (pathlib.Path, str) – Path to dataset.

  • name (str, optional) – Dataset name, default is None.

  • dimension_order (str, optional) – Dimension order e.g. ‘CHW’, if not specified will be assumed from config file, defaults to None.

  • component_order (str, optional) – Component order e.g. ‘ZNE’, if not specified will be assumed from config file, defaults to None.

  • sampling_rate (int, optional) – Common sampling rate of waveforms in dataset, sampling rate can also be specified as a metadata column if not common across dataset.

  • cache (str, optional) –

    Defines the behaviour of the waveform cache. Provides three options:

    • ”full”: When a trace is queried, the full block containing the trace is loaded into the cache and stored in memory. This causes the highest memory consumption, but also best performance when using large parts of the dataset.

    • ”trace”: When a trace is queried, only the trace itself is loaded and stored in memory. This is particularly useful when only a subset of traces is queried, but these traces are queried multiple times. In this case, this strategy might outperform “full”.

    • None: When a trace is queried, it is always loaded from disk. This mode leads to low memory consumption but high IO load. It is most likely not usable for model training.

    Note that for datasets without blocks, i.e., with each trace stored in a separate array in the hdf5 file, the strategies “full” and “trace” are identical. The default cache strategy is None.

    Use preload_waveforms() to populate the cache. Preloading the waveforms is often much faster than loading them during later application, as preloading can use sequential access. Note that it is recommended to always first filter a dataset and then preload to reduce unnecessary reads and memory consumption.

  • chunks (list, optional) – Specify particular chunks to load. If None, loads all chunks. Defaults to None.

  • missing_components (str) –

    Strategy to deal with missing components. Options are:

    • ”pad”: Fill with zeros.

    • ”copy”: Fill with values from the first existing trace.

    • ”ignore”: Order all existing components in the requested order, but ignore missing ones. This will raise an error if traces with different numbers of components are requested together.

  • metadata_cache – If true, metadata is cached in a lookup table. This significantly speeds up access to metadata and thereby access to samples. On the downside, this requires storing two copies of the metadata in memory. The second copy usually consumes more memory due to the less space-efficient format. Runtime differences are particularly big for large datasets.

  • kwargs
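The three missing_components strategies can be illustrated with a toy numpy sketch. The component names and values are invented, and this mimics the documented behaviour rather than SeisBench's internal code:

```python
import numpy as np

# Suppose only Z and N exist for a 5-sample trace, but 'ZNE' is requested.
present = {"Z": np.full(5, 1.0), "N": np.full(5, 2.0)}
requested = "ZNE"

# "pad": missing components become zeros.
pad = np.stack([present.get(c, np.zeros(5)) for c in requested])
# "copy": missing components are filled from the first existing trace (Z here).
copy = np.stack([present.get(c, present["Z"]) for c in requested])
# "ignore": missing components are simply dropped from the output.
ignore = np.stack([present[c] for c in requested if c in present])

print(pad.shape, copy.shape, ignore.shape)  # (3, 5) (3, 5) (2, 5)
```

Because "ignore" changes the number of output components per trace, requesting traces with different numbers of components together raises an error.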

static available_chunks(path)[source]

Determines the chunks of the dataset in the given path.

Parameters:

path – Dataset path

Returns:

List of chunks

property cache

Get or set the cache strategy of the dataset. For possible strategies see the constructor. Note that changing cache strategies will not cause a cache eviction.

property chunks

Returns a list of chunks. If dataset is not chunked, returns an empty list.

property component_order

Get or set order of components in the output.

copy()[source]

Create a copy of the data set. All attributes are copied by value, except waveform cache entries. The cache entries are copied by reference, as the waveforms will take up most of the memory. This should be fine for most use cases, because the cache entries should anyhow never be modified. Note that the cache dict itself is not shared, such that cache evictions and inserts in one of the data sets do not affect the other one.

Returns:

Copy of the dataset

property data_format

Data format dictionary, describing the data format of the stored dataset. Note that this does not necessarily equal the output data format of get_waveforms(). To query the output format, use the relevant class properties.

dev()[source]

Convenience method for get_split(“dev”).

Returns:

Development dataset

property dimension_order

Get or set the order of the dimension in the output.

filter(mask, inplace=True)[source]

Filters dataset, e.g. by distance/magnitude/…, using a binary mask. Default behaviour is to perform inplace filtering, directly changing the metadata and waveforms to only keep the results of the masking query. Setting inplace equal to false will return a filtered copy of the data set. For details on the copy operation see copy().

Parameters:
  • mask (boolean array) – Boolean mask to apply to metadata.

  • inplace (bool) – If true, filter inplace.

Example usage:

dataset.filter(dataset["p_status"] == "manual")
Returns:

None if inplace=True, otherwise the filtered dataset.

get_group_idx_from_params(params)[source]

Returns the index of the group identified by the params.

Parameters:

params – The parameters identifying the group. For a single grouping parameter, this argument will be a single value. Otherwise this argument needs to be a tuple of keys.

Returns:

Index of the group

Return type:

int

get_group_samples(idx, **kwargs)[source]

Returns the waveforms and metadata for each member of a group. For details see get_sample().

Parameters:
  • idx (int) – Group index

  • kwargs – Kwargs passed to get_sample()

Returns:

List of waveforms, list of metadata dicts

get_group_size(idx)[source]

Returns the number of samples in a group

Parameters:

idx (int) – Group index

Returns:

Size of the group

Return type:

int

get_group_waveforms(idx, **kwargs)[source]

Returns the waveforms for each member of a group. For details see get_sample().

Parameters:
  • idx (int) – Group index

  • kwargs – Kwargs passed to get_sample()

Returns:

List of waveforms

get_idx_from_trace_name(trace_name, chunk=None, dataset=None)[source]

Returns the index of a trace with given trace_name, chunk and dataset. Chunk and dataset parameters are optional, but might be necessary to uniquely identify traces for chunked datasets or for MultiWaveformDataset. The method will issue a warning the first time a non-uniquely identifiable trace is requested. If no matching key is found, a KeyError is raised.

Parameters:
  • trace_name (str) – Trace name as in metadata[“trace_name”]

  • chunk (None) – Trace chunk as in metadata[“trace_chunk”]. If None this key will be ignored.

  • dataset (None) – Trace dataset as in metadata[“trace_dataset”]. Only for MultiWaveformDataset. If None this key will be ignored.

Returns:

Index of the sample

get_sample(idx, sampling_rate=None)[source]

Returns both waveforms and metadata of a trace. Adjusts all sampling-rate-dependent metadata values to the correct sampling rate, e.g., p_pick_samples will still point to the right sample after this operation, even if the trace was resampled.

Parameters:
  • idx – Idx of sample to return

  • sampling_rate – Target sampling rate, overwrites sampling rate for dataset.

Returns:

Tuple with the waveforms and the metadata of the sample.
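The sampling-rate adjustment amounts to rescaling sample-based indices. The rates and pick position below are made-up values and the variable names are hypothetical; this only sketches the arithmetic:

```python
source_rate = 100.0      # Hz, rate the trace was stored at
target_rate = 50.0       # Hz, requested via the sampling_rate argument
p_arrival_sample = 3000  # pick position in samples at the source rate

# After resampling, the pick index scales with the rate ratio,
# so it still points at the same instant in time.
adjusted = p_arrival_sample * target_rate / source_rate
print(adjusted)  # 1500.0
```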

get_split(split)[source]

Returns a dataset with the requested split.

Parameters:

split – Split name to return. Usually one of “train”, “dev”, “test”

Returns:

Dataset filtered to the requested split.

get_waveforms(idx=None, mask=None, sampling_rate=None)[source]

Collects waveforms and returns them as an array.

Parameters:
  • idx (int, list[int]) – Idx or list of idx to obtain waveforms for

  • mask (np.ndarray[bool]) – Binary mask on the metadata, indicating which traces should be returned. Can not be used jointly with idx.

  • sampling_rate (float) – Target sampling rate, overwrites sampling rate for dataset

Returns:

Waveform array with dimensions ordered according to dimension_order e.g. default ‘NCW’ (number of traces, number of components, record samples). If the number of record samples varies between different entries, all entries are padded to the maximum length.

Return type:

np.ndarray

property grouping

The grouping parameters for the dataset. These parameters are used to determine the groups and for the associated methods. grouping can be either a single string or a list of strings. Each string must be a column in the metadata. By default, the grouping is None.

property groups

The list of groups as defined by the grouping or None if grouping is None.

property metadata

Metadata of the dataset as pandas DataFrame.

property metadata_cache
property missing_components

Get or set strategy to handle missing components. For options, see the constructor.

property name

Name of the dataset (immutable)

property path

Path of the dataset (immutable)

plot_map(res='110m', connections=False, **kwargs)[source]

Plots the dataset onto a map using the Mercator projection. Requires a cartopy installation.

Parameters:
  • res (str, optional) – Resolution for cartopy features, defaults to 110m.

  • connections (bool, optional) – If true, plots lines connecting sources and stations. Defaults to false.

  • kwargs – Plotting kwargs that will be passed to matplotlib plot. Args need to be prefixed with sta_, ev_ and conn_ to address stations, events or connections.

Returns:

A figure handle for the created figure.

preload_waveforms(pbar=False)[source]

Loads waveform data from hdf5 file into cache. Fails if caching strategy is None.

Parameters:

pbar – If true, shows progress bar. Defaults to False.

region_filter(domain, lat_col, lon_col, inplace=True)[source]

Filtering of dataset based on predefined region or geometry. See also convenience functions region_filter_[source|receiver].

Parameters:
  • domain (obspy.clients.fdsn.mass_downloader.domain) – The domain filter

  • lat_col (str) – Name of latitude coordinate column

  • lon_col (str) – Name of longitude coordinate column

  • inplace (bool) – Inplace filtering, defaults to true. See also filter().

Returns:

None if inplace=True, otherwise the filtered dataset.

region_filter_receiver(domain, inplace=True)[source]

Convenience method for region filtering by receiver location.

region_filter_source(domain, inplace=True)[source]

Convenience method for region filtering by source location.

test()[source]

Convenience method for get_split(“test”).

Returns:

Test dataset

train()[source]

Convenience method for get_split(“train”).

Returns:

Training dataset

train_dev_test()[source]

Convenience method for returning training, development and test set. Equal to:

>>> self.train(), self.dev(), self.test()
Returns:

Training dataset, development dataset, test dataset

Dummy datasets

The dummy datasets mostly exist for testing purposes. They are very small datasets that can be used, for example, to demonstrate certain functionality.

class ChunkedDummyDataset(**kwargs)[source]

Bases: BenchmarkDataset

A chunked dummy dataset visualizing the implementation of custom datasets with chunking

class DummyDataset(**kwargs)[source]

Bases: BenchmarkDataset

A dummy dataset visualizing the implementation of custom datasets

ETHZ dataset

class ETHZ(**kwargs)[source]

Bases: BenchmarkDataset

Regional benchmark dataset of publicly available waveform data and corresponding metadata from the Swiss Seismological Service (SED) archive. Contains data from 2013-2020. A pre-compiled version of the benchmark dataset in compatible SeisBench format is available for download from the remote root. In case of download issues, the benchmark dataset is downloaded directly from the source via an FDSN client and converted to SeisBench format.

property client

GEOFON dataset

class GEOFON(**kwargs)[source]

Bases: BenchmarkDataset

GEOFON dataset consisting of both regional and teleseismic picks. Mostly contains P arrivals, but a few S arrivals are annotated as well. Contains data from 2010-2013. The dataset will be downloaded from the SeisBench repository on first usage.

INSTANCE dataset

class InstanceCounts(**kwargs)[source]

Bases: InstanceTypeDataset

INSTANCE dataset - Events with waveforms in counts

class InstanceCountsCombined(**kwargs)[source]

Bases: MultiWaveformDataset

Convenience class to jointly load InstanceCounts and InstanceNoise.

Parameters:

kwargs – Passed to the constructors of both InstanceCounts and InstanceNoise

class InstanceGM(**kwargs)[source]

Bases: InstanceTypeDataset

INSTANCE dataset - Events with waveforms in ground motion units

class InstanceNoise(**kwargs)[source]

Bases: InstanceTypeDataset

INSTANCE dataset - Noise samples

Iquique dataset

class Iquique(**kwargs)[source]

Bases: BenchmarkDataset

Iquique Benchmark Dataset of local events used for training in Woollam (2019) study (see citation).

Splits are set using standard random sampling of seisbench.data.base.BenchmarkDataset.

ISC-EHB dataset

class ISC_EHB_DepthPhases(**kwargs)[source]

Bases: BenchmarkDataset

Dataset of depth phase picks from the ISC-EHB bulletin.

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

LenDB dataset

class LenDB(**kwargs)[source]

Bases: BenchmarkDataset

Len-DB dataset from Magrini et al.

LFE stack datasets

class LFEStacksCascadiaBostock2015(component_order='Z12', **kwargs)[source]

Bases: BenchmarkDataset

Low-frequency earthquake stacks underneath Vancouver Island, Cascadia, Canada/USA based on the catalog by Bostock et al (2015). Compiled to SeisBench format by Münchmeyer et al (2024).

class LFEStacksMexicoFrank2014(component_order='Z12', **kwargs)[source]

Bases: BenchmarkDataset

Low-frequency earthquake stacks underneath Guerrero, Mexico based on the catalog by Frank et al (2014). Compiled to SeisBench format by Münchmeyer et al (2024).

class LFEStacksSanAndreasShelly2017(component_order='Z12', **kwargs)[source]

Bases: BenchmarkDataset

Low-frequency earthquake stacks on the San Andreas Fault, California, USA based on the catalog by Shelly (2014). Compiled to SeisBench format by Münchmeyer et al (2024).

NEIC datasets

class MLAAPDE(**kwargs)[source]

Bases: BenchmarkDataset

MLAAPDE dataset from Cole et al. (2023)

Note that the SeisBench version is not identical to the precompiled version distributed directly through USGS but uses a different data selection. In addition, custom versions of MLAAPDE can be compiled with the software provided by the original authors. These datasets can be exported in SeisBench format.

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

class NEIC(**kwargs)[source]

Bases: BenchmarkDataset

NEIC dataset from Yeck and Patton

OBS datasets

class OBS(component_order='Z12H', **kwargs)[source]

Bases: BenchmarkDataset

OBS Benchmark Dataset of local events

Default component order is ‘Z12H’. A single component, e.g., the hydrophone, can be omitted by explicitly passing component_order=”Z12”. This way, the dataset can be used as input for land-station pickers that use only three components.

static available_chunks(*args, **kwargs)[source]

Returns a list of available chunks. Queries both the local cache and the remote root.

class OBST2024(**kwargs)[source]

Bases: BenchmarkDataset

The OBS dataset from Niksejel & Zhang (2024)

PNW datasets

class PNW(**kwargs)[source]

Bases: BenchmarkDataset

PNW ComCat dataset from Ni et al. (2023)

class PNWAccelerometers(**kwargs)[source]

Bases: BenchmarkDataset

PNW Accelerometers dataset from Ni et al. (2023)

class PNWExotic(**kwargs)[source]

Bases: BenchmarkDataset

PNW Exotic dataset from Ni et al. (2023)

class PNWNoise(**kwargs)[source]

Bases: BenchmarkDataset

PNW Noise dataset from Ni et al. (2023)

Southern California datasets

class Meier2019JGR(**kwargs)[source]

Bases: BenchmarkDataset

Southern Californian part of the dataset from Meier et al. (2019). Note that, due to the missing Japanese data, noise samples are massively overrepresented.

Meier, M.-A., Ross, Z. E., Ramachandran, A., Balakrishna, A., Nair, S., Kundzicz, P., et al. (2019). Reliable real‐time seismic signal/noise discrimination with machine learning. Journal of Geophysical Research: Solid Earth, 124. https://doi.org/10.1029/2018JB016661

class Ross2018GPD(**kwargs)[source]

Bases: BenchmarkDataset

Pick dataset belonging to the publication: Zachary E. Ross, Men‐Andrin Meier, Egill Hauksson, Thomas H. Heaton; Generalized Seismic Phase Detection with Deep Learning. Bulletin of the Seismological Society of America 2018;; 108 (5A): 2894–2901. https://doi.org/10.1785/0120180080

class Ross2018JGRFM(component_order='Z', **kwargs)[source]

Bases: BenchmarkDataset

First motion polarity dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120– 5129. https://doi.org/10.1029/2017JB015251

Note that this dataset contains picks as well.

Warning

This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.

class Ross2018JGRPick(component_order='Z', **kwargs)[source]

Bases: BenchmarkDataset

Pick dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120– 5129. https://doi.org/10.1029/2017JB015251

Note that this dataset contains polarities as well.

Warning

This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.

class SCEDC(**kwargs)[source]

Bases: BenchmarkDataset

SCEDC waveform archive (2000-2020).

Splits are set using standard random sampling of BenchmarkDataset.

STEAD dataset

class STEAD(**kwargs)[source]

Bases: BenchmarkDataset

STEAD dataset from Mousavi et al.

Uses the train/test split from the EQTransformer GitHub repository; the train/dev split is defined in SeisBench.

TXED dataset

class TXED(**kwargs)[source]

Bases: BenchmarkDataset

TXED dataset from Chen et al.

Train/dev/test split defined in SeisBench.