seisbench.data
Base classes
- class BenchmarkDataset(chunks=None, citation=None, license=None, force=False, wait_for_file=False, repository_lookup=False, download_kwargs=None, **kwargs)[source]
Bases:
WaveformDataset, ABC
This class is the base class for benchmark waveform datasets. It adds functionality to automatically download the dataset to the SeisBench cache. Downloads can either be from the SeisBench repository, if the dataset is available there and in the right format, or from another source, which will usually require some form of conversion. Furthermore, it adds annotations for citation and license.
- Parameters:
chunks – List of chunks to download
citation – Citation for the dataset. Should be set in the inheriting class.
license – License associated with the dataset. Should be set in the inheriting class.
force – Passed to
callback_if_uncached()
wait_for_file – Passed to
callback_if_uncached()
repository_lookup – Whether the dataset should be searched for in the remote repository or the download function should be used directly. Should be set in the inheriting class. Only needs to be set to true if the dataset is available in a repository, e.g., the SeisBench repository, for direct download.
download_kwargs – Dict of arguments passed to the download_dataset function, in case the dataset is loaded from scratch.
kwargs – Keyword arguments passed to WaveformDataset
- classmethod available_chunks(force=False, wait_for_file=False)[source]
Returns a list of available chunks. Queries both the local cache and the remote root.
- property citation
The suggested citation for this dataset
- property license
The license attached to this dataset
- property name
Name of the dataset. For BenchmarkDatasets, always matches the class name.
- property path
Path to the dataset location in the SeisBench cache
- class Bucketer[source]
Bases:
ABC
This is the abstract bucketer class that needs to be provided to the WaveformDataWriter. It offers one public function, get_bucket(), to assign a bucket to each trace.
- abstract get_bucket(metadata, waveform)[source]
Calculates the bucket for the trace given its metadata and waveforms
- Parameters:
metadata – Metadata as given to the WaveformDataWriter.
waveform – Waveforms as given to the WaveformDataWriter.
- Returns:
A hashable object denoting the bucket this sample belongs to.
- class GeometricBucketer(minbucket=100, factor=1.2, splits=True, track_channels=True, axis=-1)[source]
Bases:
Bucketer
A simple bucketer that uses the length of the traces and optionally the assigned split to determine buckets. Only takes into account the length along one fixed axis. Bucket edges are created with a geometric spacing above a minimum bucket. The first bucket is [0, minbucket), the second one [minbucket, minbucket * factor) and so on. There is no maximum bucket. This bucketer ensures that the overhead from padding is at most factor - 1, as long as only a few traces with length < minbucket exist. The overhead can be reduced even further by passing the input traces ordered by their length.
- Parameters:
minbucket (int) – Upper limit of the lowest bucket and start of the geometric spacing.
factor (float) – Factor for the geometric spacing.
splits (bool) – If true, returns separate buckets for each split. Defaults to true. If no split is defined in the metadata, this parameter is ignored.
track_channels (bool) – If true, uses the shape of the input waveform along all axes except the one defined in axis to determine the bucket. Only traces agreeing in all dimensions except the given axis will be assigned to the same bucket.
axis (int) – Axis to take into account for determining the length of the trace.
- get_bucket(metadata, waveform)[source]
Calculates the bucket for the trace given its metadata and waveforms
- Parameters:
metadata – Metadata as given to the WaveformDataWriter.
waveform – Waveforms as given to the WaveformDataWriter.
- Returns:
A hashable object denoting the bucket this sample belongs to.
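The bucket edges described above follow a simple rule that can be sketched in pure Python. This is an illustrative simplification, not the library implementation, which additionally keys buckets by split and, with track_channels, by the remaining waveform shape:

```python
import math

def geometric_bucket(length, minbucket=100, factor=1.2):
    # Bucket 0 covers [0, minbucket); bucket k >= 1 covers
    # [minbucket * factor**(k - 1), minbucket * factor**k).
    if length < minbucket:
        return 0
    return 1 + int(math.log(length / minbucket) / math.log(factor))

print(geometric_bucket(50))   # 0 -> below minbucket
print(geometric_bucket(100))  # 1 -> [100, 120)
print(geometric_bucket(119))  # 1 -> [100, 120)
print(geometric_bucket(120))  # 2 -> [120, 144)
```

Because all traces in a bucket are padded to the bucket's upper edge, the relative padding overhead stays below factor - 1 for traces of at least minbucket samples.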
- class LoadingContext(chunks, waveform_paths)[source]
Bases:
object
The LoadingContext is a dict of pointers to the hdf5 files for the chunks. It is an easy way to manage opening and closing of file pointers when required.
- class MultiWaveformDataset(datasets)[source]
Bases:
object
A MultiWaveformDataset is an ordered collection of WaveformDataset. It exposes mostly the same API as a single WaveformDataset.
The constructor checks for compatibility of dimension_order, component_order and sampling_rate. The caching strategy of each contained dataset is left unmodified, but a warning is issued if different caching schemes are found.
- Parameters:
datasets – List of WaveformDataset. The constructor will create a copy of each dataset using the WaveformDataset.copy() method.
- property cache
Get or set cache strategy
- property component_order
Get or set component order
- property datasets
Datasets contained in MultiWaveformDataset.
- dev()
Convenience method for get_split(“dev”).
- Returns:
Development dataset
- property dimension_order
Get or set dimension order for output
- filter(mask, inplace=True)[source]
Filters dataset, similar to WaveformDataset.filter().
- Parameters:
mask (masked-array) – Boolean mask to apply to metadata.
inplace (bool) – If true, filter inplace.
- Returns:
None if inplace=True, otherwise the filtered dataset.
- get_group_idx_from_params(params)
Returns the index of the group identified by the params.
- Parameters:
params – The parameters identifying the group. For a single grouping parameter, this argument will be a single value. Otherwise this argument needs to be a tuple of keys.
- Returns:
Index of the group
- Return type:
int
- get_group_samples(idx, **kwargs)
Returns the waveforms and metadata for each member of a group. For details see get_sample().
- Parameters:
idx (int) – Group index
kwargs – Kwargs passed to
get_sample()
- Returns:
List of waveforms, list of metadata dicts
- get_group_size(idx)
Returns the number of samples in a group
- Parameters:
idx (int) – Group index
- Returns:
Size of the group
- Return type:
int
- get_group_waveforms(idx, **kwargs)
Returns the waveforms for each member of a group. For details see get_sample().
- Parameters:
idx (int) – Group index
kwargs – Kwargs passed to
get_sample()
- Returns:
List of waveforms
- get_idx_from_trace_name(trace_name, chunk=None, dataset=None)
Returns the index of a trace with given trace_name, chunk and dataset. Chunk and dataset parameters are optional, but might be necessary to uniquely identify traces for chunked datasets or for MultiWaveformDataset. The method will issue a warning the first time a non-uniquely identifiable trace is requested. If no matching key is found, a KeyError is raised.
- Parameters:
trace_name (str) – Trace name as in metadata[“trace_name”]
chunk (None) – Trace chunk as in metadata[“trace_chunk”]. If None this key will be ignored.
dataset (None) – Trace dataset as in metadata[“trace_dataset”]. Only for MultiWaveformDataset. If None this key will be ignored.
- Returns:
Index of the sample
- get_sample(idx, *args, **kwargs)[source]
Wraps
WaveformDataset.get_sample()
- Parameters:
idx – Index of the sample
args – passed to parent function
kwargs – passed to parent function
- Returns:
Return value of parent function
- get_split(split)
Returns a dataset with the requested split.
- Parameters:
split – Split name to return. Usually one of “train”, “dev”, “test”
- Returns:
Dataset filtered to the requested split.
- get_waveforms(idx=None, mask=None, **kwargs)[source]
Collects waveforms and returns them as an array.
- Parameters:
idx (int, list[int]) – Idx or list of idx to obtain waveforms for
mask (np.ndarray[bool]) – Binary mask on the metadata, indicating which traces should be returned. Can not be used jointly with idx.
kwargs – Passed to
WaveformDataset.get_waveforms()
- Returns:
Waveform array with dimensions ordered according to dimension_order e.g. default ‘NCW’ (number of traces, number of components, record samples). If the number of record samples varies between different entries, all entries are padded to the maximum length.
- Return type:
np.ndarray
- property grouping
The grouping parameters for the dataset. Grouping allows accessing metadata and waveforms jointly for a set of traces sharing a common metadata parameter. This can, for example, be used to access all waveforms belonging to one event when building event-based models. Setting the grouping parameter defines the output of groups and the associated methods. grouping can be either a single string or a list of strings. Each string must be a column in the metadata. By default, the grouping is None.
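Conceptually, setting a grouping builds an index from each distinct value of the grouping column(s) to the trace indices sharing it. A minimal sketch, using a hypothetical source_id metadata column (the actual property operates on the metadata DataFrame):

```python
from collections import defaultdict

# Hypothetical metadata rows; source_id stands in for any metadata column.
metadata = [
    {"trace_name": "t0", "source_id": "eq1"},
    {"trace_name": "t1", "source_id": "eq1"},
    {"trace_name": "t2", "source_id": "eq2"},
]

# Map each group key to the indices of its member traces.
groups = defaultdict(list)
for idx, row in enumerate(metadata):
    groups[row["source_id"]].append(idx)

print(dict(groups))  # {'eq1': [0, 1], 'eq2': [2]}
```

get_group_samples(idx) then returns the waveforms and metadata of all member traces of one such group.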
- property metadata
Metadata of the dataset as pandas DataFrame.
- property metadata_cache
- property missing_components
Get or set strategy for missing components
- plot_map(res='110m', connections=False, **kwargs)
Plots the dataset onto a map using the Mercator projection. Requires a cartopy installation.
- Parameters:
res (str, optional) – Resolution for cartopy features, defaults to 110m.
connections (bool, optional) – If true, plots lines connecting sources and stations. Defaults to false.
kwargs – Plotting kwargs that will be passed to matplotlib plot. Args need to be prefixed with sta_, ev_ and conn_ to address stations, events or connections.
- Returns:
A figure handle for the created figure.
- preload_waveforms(*args, **kwargs)[source]
Calls
WaveformDataset.preload_waveforms()
for all member datasets with the provided arguments.
- region_filter(domain, lat_col, lon_col, inplace=True)
Filtering of dataset based on predefined region or geometry. See also convenience functions region_filter_[source|receiver].
- Parameters:
domain (obspy.clients.fdsn.mass_downloader.domain) – The domain filter
lat_col (str) – Name of latitude coordinate column
lon_col (str) – Name of longitude coordinate column
inplace (bool) – Inplace filtering, defaults to true. See also filter().
- Returns:
None if inplace=True, otherwise the filtered dataset.
- region_filter_receiver(domain, inplace=True)
Convenience method for region filtering by receiver location.
- region_filter_source(domain, inplace=True)
Convenience method for region filtering by source location.
- property sampling_rate
Get or set sampling rate for output
- test()
Convenience method for get_split(“test”).
- Returns:
Test dataset
- train()
Convenience method for get_split(“train”).
- Returns:
Training dataset
- train_dev_test()
Convenience method for returning training, development and test set. Equal to:
>>> self.train(), self.dev(), self.test()
- Returns:
Training dataset, development dataset, test dataset
- class WaveformDataWriter(metadata_path, waveforms_path)[source]
Bases:
object
The WaveformDataWriter for writing datasets in SeisBench format.
To improve reading performance when using the datasets, the writer groups traces into blocks and writes them into joint arrays in the hdf5 file. The exact behaviour is controlled by the bucketer and the bucket_size. For details see their documentation. This packing is necessary due to limitations in hdf5 performance: reading many small datasets from an hdf5 file causes the overhead of the hdf5 structure to dominate the read times.
- Parameters:
metadata_path (str or Path) – Path to write the metadata file to
waveforms_path (str or Path) – Path to write the waveforms file to
- Returns:
None
- add_trace(metadata, waveform)[source]
Adds a trace to the writer. This does not imply that the trace is immediately written to disk, as the writer might wait to fill a bucket. The writer ensures that the order of traces in the metadata is identical to the order of calls to add_trace.
- Parameters:
metadata (dict[str, any]) – Metadata of the trace
waveform (np.ndarray) – Waveform of the trace
- Returns:
None
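The deferred-write behaviour can be sketched as a toy model. Here bucket keys stand in for the bucketer's output, and "flushing" just records the block instead of packing it into hdf5:

```python
class TinyWriter:
    def __init__(self, bucket_size=1024):
        self.bucket_size = bucket_size
        self.buckets = {}   # open buckets: key -> list of traces
        self.flushed = []   # blocks already "written to disk"

    def add_trace(self, bucket, trace):
        # Traces accumulate per bucket; a full bucket is written as one block.
        self.buckets.setdefault(bucket, []).append(trace)
        if len(self.buckets[bucket]) >= self.bucket_size:
            self.flushed.append(self.buckets.pop(bucket))

w = TinyWriter(bucket_size=2)
for bucket, trace in [("a", 1), ("b", 2), ("a", 3), ("a", 4)]:
    w.add_trace(bucket, trace)
print(w.flushed)  # [[1, 3]] -> bucket "a" filled up and was flushed
print(w.buckets)  # {'b': [2], 'a': [4]} -> still waiting to fill
```

Within each flushed block, the order of traces matches the order of add_trace calls, which is how the real writer keeps metadata and waveforms aligned.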
- property bucket_size
The maximum size of a bucket. Once adding another trace would overload the bucket, the bucket is written to disk. Defaults to 1024.
- Returns:
Bucket size
- property bucketer
The currently used bucketer, which sorts traces into buckets. If the bucketer is None, no buckets are used and all traces are written separately. By default uses the GeometricBucketer with default parameters. Please check that this suits your needs. In particular, make sure that the default axis matches your sample axis.
- Returns:
Returns the current bucketer.
- class WaveformDataset(path, name=None, dimension_order=None, component_order=None, sampling_rate=None, cache=None, chunks=None, missing_components='pad', metadata_cache=False, **kwargs)[source]
Bases:
object
This class is the base class for waveform datasets.
A key consideration should be how the cache is used. If sufficient memory is available to keep the full data set in memory, activating the cache will yield strong performance gains. For details on the cache strategies, see the documentation of the cache parameter.
- Parameters:
path (pathlib.Path, str) – Path to dataset.
name (str, optional) – Dataset name, default is None.
dimension_order (str, optional) – Dimension order e.g. ‘CHW’, if not specified will be assumed from config file, defaults to None.
component_order (str, optional) – Component order e.g. ‘ZNE’, if not specified will be assumed from config file, defaults to None.
sampling_rate (int, optional) – Common sampling rate of waveforms in dataset, sampling rate can also be specified as a metadata column if not common across dataset.
cache (str, optional) –
Defines the behaviour of the waveform cache. Provides three options:
”full”: When a trace is queried, the full block containing the trace is loaded into the cache and stored in memory. This causes the highest memory consumption, but also best performance when using large parts of the dataset.
”trace”: When a trace is queried, only the trace itself is loaded and stored in memory. This is particularly useful when only a subset of traces is queried, but these are queried multiple times. In this case the performance of this strategy might outperform “full”.
None: When a trace is queried, it is always loaded from disk. This mode leads to low memory consumption but high IO load. It is most likely not usable for model training.
Note that for datasets without blocks, i.e., each trace in a single array in the hdf5 file, the strategies “full” and “trace” are identical. The default cache strategy is None.
Use preload_waveforms() to populate the cache. Preloading the waveforms is often much faster than loading them during later application, as preloading can use sequential access. Note that it is recommended to always first filter a dataset and then preload to reduce unnecessary reads and memory consumption.
chunks (list, optional) – Specify particular chunks to load. If None, loads all chunks. Defaults to None.
missing_components (str) –
Strategy to deal with missing components. Options are:
”pad”: Fill with zeros.
”copy”: Fill with values from first existing traces.
”ignore”: Order all existing components in the requested order, but ignore missing ones. This will raise an error if traces with different numbers of components are requested together.
metadata_cache – If true, metadata is cached in a lookup table. This significantly speeds up access to metadata and thereby access to samples. On the downside, this requires storing two copies of the metadata in memory. The second copy usually consumes more memory due to the less space-efficient format. Runtime differences are particularly big for large datasets.
kwargs –
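The missing_components strategies can be illustrated with a small numpy sketch. fill_missing is a hypothetical helper, not part of the SeisBench API:

```python
import numpy as np

def fill_missing(waveform, present, requested, strategy="pad"):
    # Arrange a (C, W) array with components `present` (e.g. "ZN") into
    # the order `requested` (e.g. "ZNE").
    out = np.zeros((len(requested), waveform.shape[-1]), dtype=waveform.dtype)
    for i, comp in enumerate(requested):
        if comp in present:
            out[i] = waveform[present.index(comp)]
        elif strategy == "copy":
            # "copy": fill from the first existing trace; "pad" leaves zeros.
            out[i] = waveform[0]
    return out

wv = np.array([[1.0, 1.0], [2.0, 2.0]])      # components "ZN"
print(fill_missing(wv, "ZN", "ZNE", "pad"))   # E row zero-padded
print(fill_missing(wv, "ZN", "ZNE", "copy"))  # E row copied from Z
```

The "ignore" strategy would instead drop the missing row entirely, which is why requesting traces with different numbers of components together raises an error in that mode.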
- static available_chunks(path)[source]
Determines the chunks of the dataset in the given path.
- Parameters:
path – Dataset path
- Returns:
List of chunks
- property cache
Get or set the cache strategy of the dataset. For possible strategies see the constructor. Note that changing cache strategies will not cause a cache eviction.
- property chunks
Returns a list of chunks. If dataset is not chunked, returns an empty list.
- property component_order
Get or set order of components in the output.
- copy()[source]
Create a copy of the data set. All attributes are copied by value, except waveform cache entries. The cache entries are copied by reference, as the waveforms will take up most of the memory. This should be fine for most use cases, because the cache entries should anyhow never be modified. Note that the cache dict itself is not shared, such that cache evictions and inserts in one of the data sets do not affect the other one.
- Returns:
Copy of the dataset
- property data_format
Data format dictionary, describing the data format of the stored dataset. Note that this does not necessarily equal the output data format of get_waveforms. To query the output format, use the relevant class properties.
- property dimension_order
Get or set the order of the dimension in the output.
- filter(mask, inplace=True)[source]
Filters dataset, e.g. by distance/magnitude/…, using a binary mask. Default behaviour is to perform inplace filtering, directly changing the metadata and waveforms to only keep the results of the masking query. Setting inplace equal to false will return a filtered copy of the data set. For details on the copy operation see copy().
- Parameters:
mask (boolean array) – Boolean mask to apply to metadata.
inplace (bool) – If true, filter inplace.
Example usage:
dataset.filter(dataset["p_status"] == "manual")
- Returns:
None if inplace=True, otherwise the filtered dataset.
- get_group_idx_from_params(params)[source]
Returns the index of the group identified by the params.
- Parameters:
params – The parameters identifying the group. For a single grouping parameter, this argument will be a single value. Otherwise this argument needs to be a tuple of keys.
- Returns:
Index of the group
- Return type:
int
- get_group_samples(idx, **kwargs)[source]
Returns the waveforms and metadata for each member of a group. For details see get_sample().
- Parameters:
idx (int) – Group index
kwargs – Kwargs passed to
get_sample()
- Returns:
List of waveforms, list of metadata dicts
- get_group_size(idx)[source]
Returns the number of samples in a group
- Parameters:
idx (int) – Group index
- Returns:
Size of the group
- Return type:
int
- get_group_waveforms(idx, **kwargs)[source]
Returns the waveforms for each member of a group. For details see get_sample().
- Parameters:
idx (int) – Group index
kwargs – Kwargs passed to
get_sample()
- Returns:
List of waveforms
- get_idx_from_trace_name(trace_name, chunk=None, dataset=None)[source]
Returns the index of a trace with given trace_name, chunk and dataset. Chunk and dataset parameters are optional, but might be necessary to uniquely identify traces for chunked datasets or for MultiWaveformDataset. The method will issue a warning the first time a non-uniquely identifiable trace is requested. If no matching key is found, a KeyError is raised.
- Parameters:
trace_name (str) – Trace name as in metadata[“trace_name”]
chunk (None) – Trace chunk as in metadata[“trace_chunk”]. If None this key will be ignored.
dataset (None) – Trace dataset as in metadata[“trace_dataset”]. Only for MultiWaveformDataset. If None this key will be ignored.
- Returns:
Index of the sample
- get_sample(idx, sampling_rate=None)[source]
Returns both waveforms and metadata of a trace. Adjusts all sampling-rate-dependent values in the metadata to the correct sampling rate, e.g., p_pick_samples will still point to the right sample after this operation, even if the trace was resampled.
- Parameters:
idx – Idx of sample to return
sampling_rate – Target sampling rate, overwrites sampling rate for dataset.
- Returns:
Tuple with the waveforms and the metadata of the sample.
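The sample-index adjustment amounts to scaling indices by the ratio of sampling rates; a one-line sketch of the idea:

```python
def resample_pick(pick_sample, source_rate, target_rate):
    # A pick at sample i of a trace sampled at source_rate lands at
    # i * target_rate / source_rate after resampling.
    return pick_sample * target_rate / source_rate

# A P pick at sample 500 of a 100 Hz trace is at sample 250 after
# downsampling to 50 Hz.
print(resample_pick(500, 100, 50))  # 250.0
```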
- get_split(split)[source]
Returns a dataset with the requested split.
- Parameters:
split – Split name to return. Usually one of “train”, “dev”, “test”
- Returns:
Dataset filtered to the requested split.
- get_waveforms(idx=None, mask=None, sampling_rate=None)[source]
Collects waveforms and returns them as an array.
- Parameters:
idx (int, list[int]) – Idx or list of idx to obtain waveforms for
mask (np.ndarray[bool]) – Binary mask on the metadata, indicating which traces should be returned. Can not be used jointly with idx.
sampling_rate (float) – Target sampling rate, overwrites sampling rate for dataset
- Returns:
Waveform array with dimensions ordered according to dimension_order e.g. default ‘NCW’ (number of traces, number of components, record samples). If the number of record samples varies between different entries, all entries are padded to the maximum length.
- Return type:
np.ndarray
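The padding behaviour of the returned array can be sketched with numpy. This is an illustration of the output shape, not the library code:

```python
import numpy as np

def stack_padded(waveforms):
    # Stack per-trace (C, W_i) arrays into one (N, C, W_max) array,
    # zero-padding shorter traces on the right ('NCW' order).
    n, c = len(waveforms), waveforms[0].shape[0]
    w_max = max(w.shape[-1] for w in waveforms)
    out = np.zeros((n, c, w_max), dtype=waveforms[0].dtype)
    for i, w in enumerate(waveforms):
        out[i, :, : w.shape[-1]] = w
    return out

batch = stack_padded([np.ones((3, 500)), np.ones((3, 400))])
print(batch.shape)  # (2, 3, 500) -> second trace padded to 500 samples
```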
- property grouping
The grouping parameters for the dataset. These parameters are used to determine the groups and the associated methods. grouping can be either a single string or a list of strings. Each string must be a column in the metadata. By default, the grouping is None.
- property metadata
Metadata of the dataset as pandas DataFrame.
- property metadata_cache
- property missing_components
Get or set strategy to handle missing components. For options, see the constructor.
- property name
Name of the dataset (immutable)
- property path
Path of the dataset (immutable)
- plot_map(res='110m', connections=False, **kwargs)[source]
Plots the dataset onto a map using the Mercator projection. Requires a cartopy installation.
- Parameters:
res (str, optional) – Resolution for cartopy features, defaults to 110m.
connections (bool, optional) – If true, plots lines connecting sources and stations. Defaults to false.
kwargs – Plotting kwargs that will be passed to matplotlib plot. Args need to be prefixed with sta_, ev_ and conn_ to address stations, events or connections.
- Returns:
A figure handle for the created figure.
- preload_waveforms(pbar=False)[source]
Loads waveform data from hdf5 file into cache. Fails if caching strategy is None.
- Parameters:
pbar – If true, shows progress bar. Defaults to False.
- region_filter(domain, lat_col, lon_col, inplace=True)[source]
Filtering of dataset based on predefined region or geometry. See also convenience functions region_filter_[source|receiver].
- Parameters:
domain (obspy.clients.fdsn.mass_downloader.domain) – The domain filter
lat_col (str) – Name of latitude coordinate column
lon_col (str) – Name of longitude coordinate column
inplace (bool) – Inplace filtering, defaults to true. See also filter().
- Returns:
None if inplace=True, otherwise the filtered dataset.
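Conceptually, a rectangular domain filter reduces to a boolean mask over the latitude/longitude columns. The numpy sketch below mimics what an obspy RectangularDomain expresses, without using obspy:

```python
import numpy as np

def rectangular_mask(lat, lon, minlat, maxlat, minlon, maxlon):
    # True where a row's coordinates fall inside the rectangle.
    lat, lon = np.asarray(lat), np.asarray(lon)
    return (lat >= minlat) & (lat <= maxlat) & (lon >= minlon) & (lon <= maxlon)

# One station inside Switzerland-like bounds, one far outside.
mask = rectangular_mask([46.5, 10.0], [8.0, 8.0], 45, 48, 5, 11)
print(mask)  # [ True False]
```

The resulting mask plays the same role as the boolean mask passed to filter().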
- region_filter_receiver(domain, inplace=True)[source]
Convenience method for region filtering by receiver location.
Dummy datasets
The dummy datasets mostly exist for testing purposes. They are very small datasets that can be used, for example, to demonstrate certain functionality.
- class ChunkedDummyDataset(**kwargs)[source]
Bases:
BenchmarkDataset
A chunked dummy dataset visualizing the implementation of custom datasets with chunking
- class DummyDataset(**kwargs)[source]
Bases:
BenchmarkDataset
A dummy dataset visualizing the implementation of custom datasets
ETHZ dataset
- class ETHZ(**kwargs)[source]
Bases:
BenchmarkDataset
Regional benchmark dataset of publicly available waveform data and corresponding metadata from the Swiss Seismological Service (SED) archive. Contains data from 2013-2020. A pre-compiled version of the benchmark dataset in compatible SeisBench format is available for download from the remote root. In case of download issues, the benchmark dataset is downloaded directly from source via FDSN client and converted to SeisBench format.
- property client
GEOFON dataset
- class GEOFON(**kwargs)[source]
Bases:
BenchmarkDataset
GEOFON dataset consisting of both regional and teleseismic picks. Mostly contains P arrivals, but a few S arrivals are annotated as well. Contains data from 2010-2013. The dataset will be downloaded from the SeisBench repository on first usage.
INSTANCE dataset
- class InstanceCounts(**kwargs)[source]
Bases:
InstanceTypeDataset
INSTANCE dataset - Events with waveforms in counts
- class InstanceCountsCombined(**kwargs)[source]
Bases:
MultiWaveformDataset
Convenience class to jointly load InstanceCounts and InstanceNoise.
- Parameters:
kwargs – Passed to the constructors of both InstanceCounts and InstanceNoise
Iquique dataset
- class Iquique(**kwargs)[source]
Bases:
BenchmarkDataset
Iquique Benchmark Dataset of local events used for training in Woollam (2019) study (see citation).
Splits are set using standard random sampling of seisbench.data.base.BenchmarkDataset.
ISC-EHB dataset
- class ISC_EHB_DepthPhases(**kwargs)[source]
Bases:
BenchmarkDataset
Dataset of depth phase picks from the ISC-EHB bulletin.
LenDB dataset
- class LenDB(**kwargs)[source]
Bases:
BenchmarkDataset
Len-DB dataset from Magrini et al.
LFE stack datasets
- class LFEStacksCascadiaBostock2015(component_order='Z12', **kwargs)[source]
Bases:
BenchmarkDataset
Low-frequency earthquake stacks underneath Vancouver Island, Cascadia, Canada/USA based on the catalog by Bostock et al (2015). Compiled to SeisBench format by Münchmeyer et al (2024).
- class LFEStacksMexicoFrank2014(component_order='Z12', **kwargs)[source]
Bases:
BenchmarkDataset
Low-frequency earthquake stacks underneath Guerrero, Mexico based on the catalog by Frank et al (2014). Compiled to SeisBench format by Münchmeyer et al (2024).
- class LFEStacksSanAndreasShelly2017(component_order='Z12', **kwargs)[source]
Bases:
BenchmarkDataset
Low-frequency earthquake stacks on the San Andreas Fault, California, USA based on the catalog by Shelly (2014). Compiled to SeisBench format by Münchmeyer et al (2024).
NEIC datasets
- class MLAAPDE(**kwargs)[source]
Bases:
BenchmarkDataset
MLAAPDE dataset from Cole et al. (2023)
Note that the SeisBench version is not identical to the precompiled version distributed directly through USGS but uses a different data selection. In addition, custom versions of MLAAPDE can be compiled with the software provided by the original authors. These datasets can be exported in SeisBench format.
- class NEIC(**kwargs)[source]
Bases:
BenchmarkDataset
NEIC dataset from Yeck and Patton
OBS datasets
- class OBS(component_order='Z12H', **kwargs)[source]
Bases:
BenchmarkDataset
OBS Benchmark Dataset of local events
Default component order is ‘Z12H’. You can omit one component, e.g., the hydrophone, by explicitly passing component_order="Z12". This way, the dataset can be input to land-station pickers that use only three components.
- class OBST2024(**kwargs)[source]
Bases:
BenchmarkDataset
The OBS dataset from Niksejel & Zhang (2024)
PNW datasets
- class PNW(**kwargs)[source]
Bases:
BenchmarkDataset
PNW ComCat dataset from Ni et al. (2023)
- class PNWAccelerometers(**kwargs)[source]
Bases:
BenchmarkDataset
PNW Accelerometers dataset from Ni et al. (2023)
- class PNWExotic(**kwargs)[source]
Bases:
BenchmarkDataset
PNW Exotic dataset from Ni et al. (2023)
- class PNWNoise(**kwargs)[source]
Bases:
BenchmarkDataset
PNW Noise dataset from Ni et al. (2023)
Southern California datasets
- class Meier2019JGR(**kwargs)[source]
Bases:
BenchmarkDataset
Southern Californian part of the dataset from Meier et al. (2019). Note that due to the missing Japanese data, there is a massive overrepresentation of noise samples.
Meier, M.-A., Ross, Z. E., Ramachandran, A., Balakrishna, A., Nair, S., Kundzicz, P., et al. (2019). Reliable real‐time seismic signal/noise discrimination with machine learning. Journal of Geophysical Research: Solid Earth, 124. https://doi.org/10.1029/2018JB016661
- class Ross2018GPD(**kwargs)[source]
Bases:
BenchmarkDataset
Pick dataset belonging to the publication: Zachary E. Ross, Men‐Andrin Meier, Egill Hauksson, Thomas H. Heaton; Generalized Seismic Phase Detection with Deep Learning. Bulletin of the Seismological Society of America 2018;; 108 (5A): 2894–2901. https://doi.org/10.1785/0120180080
- class Ross2018JGRFM(component_order='Z', **kwargs)[source]
Bases:
BenchmarkDataset
First motion polarity dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120– 5129. https://doi.org/10.1029/2017JB015251
Note that this dataset contains picks as well.
Warning
This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.
- class Ross2018JGRPick(component_order='Z', **kwargs)[source]
Bases:
BenchmarkDataset
Pick dataset belonging to the publication: Ross, Z. E., Meier, M.‐A., & Hauksson, E. (2018). P wave arrival picking and first‐motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123, 5120– 5129. https://doi.org/10.1029/2017JB015251
Note that this dataset contains polarities as well.
Warning
This dataset only contains traces for the Z component. It therefore ignores the default SeisBench component_order.
- class SCEDC(**kwargs)[source]
Bases:
BenchmarkDataset
SCEDC waveform archive (2000-2020).
Splits are set using standard random sampling of BenchmarkDataset.
STEAD dataset
- class STEAD(**kwargs)[source]
Bases:
BenchmarkDataset
STEAD dataset from Mousavi et al.
Uses the train/test split from the EQTransformer GitHub repository; the train/dev split is defined in SeisBench.
TXED dataset
- class TXED(**kwargs)[source]
Bases:
BenchmarkDataset
TXED dataset from Chen et al.
train/dev/test split defined in SeisBench.