seisbench.generate

Generators

class GenericGenerator(dataset)[source]

Bases: Dataset

add_augmentations(augmentations)[source]

Adds a list of augmentations to the generator. Cannot be used as a decorator.

Parameters:

augmentations (list[callable]) – List of augmentations

augmentation(f)[source]

Decorator for augmentations.
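The two registration styles can be illustrated with a minimal stand-in class (hypothetical, not the SeisBench implementation) that mirrors the documented interface, where state dict entries are (data, metadata) pairs:

```python
import numpy as np

class MiniGenerator:
    """Stand-in mirroring the documented generator interface (illustrative only)."""

    def __init__(self):
        self._augmentations = []

    def add_augmentations(self, augmentations):
        # Registers a list of callables; per the docs, not usable as a decorator.
        self._augmentations.extend(augmentations)

    def augmentation(self, f):
        # Decorator variant: registers a single callable and returns it unchanged.
        self._augmentations.append(f)
        return f

    def apply(self, state_dict):
        # Run all registered augmentations in order on the state dict.
        for aug in self._augmentations:
            aug(state_dict)
        return state_dict

gen = MiniGenerator()

@gen.augmentation  # decorator style
def demean(state_dict):
    x, meta = state_dict["X"]
    state_dict["X"] = (x - x.mean(), meta)

def normalize(state_dict):
    x, meta = state_dict["X"]
    state_dict["X"] = (x / np.abs(x).max(), meta)

gen.add_augmentations([normalize])  # list style
```

Both styles append to the same internal augmentation list; the order of registration is the order of application.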

class GroupGenerator(dataset)[source]

Bases: GenericGenerator

This data generator follows the same principle as the GenericGenerator but instead of single traces always loads groups into the state dict. The grouping parameter of the underlying dataset needs to be set.

class SteeredGenerator(dataset, metadata)[source]

Bases: GenericGenerator

This data generator follows the same principles as the GenericGenerator(). However, in contrast to the GenericGenerator() the generator is controlled by a dataframe with control information. Each row in the control dataframe corresponds to one example output by the generator. The dataframe holds two types of information. First, information identifying the traces, provided using the trace_name (required), trace_chunk (optional), and trace_dataset (optional). See the description of get_idx_from_trace_name() for details. Second, additional information for the augmentations. This additional information is stored in state_dict[“_control_”] as a dict. This generator is particularly useful for evaluation, e.g., when extracting predefined windows from a trace.

Note that the “_control_” group will usually not be modified by augmentations. This means that, for example, after a window selection, sample positions might be off. To handle this automatically, you must explicitly put the relevant control information into the state_dict metadata of the relevant key.

Warning

This generator should in most cases not be used for changing label distributions by resampling the dataset. For this application, we recommend using a pytorch Sampler.

Parameters:
  • dataset (seisbench.data.WaveformDataset or seisbench.data.MultiWaveformDataset) – The underlying SeisBench data set

  • metadata (pandas.DataFrame) – The additional information as pandas dataframe. Each row corresponds to one sample from the generator.
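A sketch of a plausible control dataframe, using the documented `trace_name` column plus hypothetical `start_sample`/`end_sample` control columns (these match the defaults read by SteeredWindow, but the trace names here are made up):

```python
import pandas as pd

# Each row steers one generator output: trace identification plus extra
# control information that augmentations read from state_dict["_control_"].
control = pd.DataFrame({
    "trace_name": ["trace_a", "trace_b"],   # required; example names
    "start_sample": [1000, 2500],           # control info for SteeredWindow
    "end_sample": [2000, 3500],
})
# generator = sbg.SteeredGenerator(dataset, control)  # requires a dataset
```

The optional `trace_chunk` and `trace_dataset` columns can be added in the same way when the dataset requires them.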

Window selection

class AlignGroupsOnKey(alignment_key, completeness=0.0, fill_value=0, sample_axis=-1, key='X')[source]

Bases: object

Aligns all waveforms according to a metadata key. After alignment, the metadata key will be at the same sample in all examples. All traces with a NaN value in the alignment key will be dropped.

To align traces according to wall time, you have to write the sample offset into the metadata first. This can be done using the UTCOffsets augmentation.

Warning

Assumes identical sampling rate and shape (except number of samples) for all traces.

Parameters:
  • alignment_key (str) – Metadata key to align traces on.

  • completeness (float) – Required fraction of traces (between 0 and 1) that need to exist to keep the sample. Samples at the start and end of the trace will be truncated if not enough input traces have waveforms available for the samples. This function can be used to avoid sparse output.

  • fill_value (float) – Value used in the output for samples without input data.

  • sample_axis (int) – sample axis in the input

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

class FixedWindow(p0=None, windowlen=None, strategy='fail', axis=-1, key='X')[source]

Bases: object

A simple windower that returns fixed windows. In addition, the windower rewrites all metadata ending in “_sample” to point to the correct sample after window selection. Window start and length can be set either at initialization or separately in each call. The latter is primarily intended for more complicated windowers inheriting from FixedWindow.

Parameters:
  • p0 (None or int) – Start position of the trace. If p0 is negative, this will be treated as identifying a sample before the start of the trace. This is in contrast to standard list indexing with negative indices in Python, which counts items from the end of the list. Negative p0 is not possible with the strategy “fail”.

  • windowlen (None or int) – Window length

  • strategy (str) –

    Strategy to mitigate insufficient data. Options are:

    • ”fail”: Raises a ValueError

    • ”pad”: Adds zero padding to the window

    • ”move”: Moves the start to the closest possible position to achieve sufficient trace length. The resulting trace will be aligned to the input trace on one of the ends, depending if parts before (left aligned) or after the trace (right aligned) were requested. Will fail if total trace length is shorter than requested window length.

    • ”variable”: Returns shorter length window, resulting in possibly varying window size. Might return empty window if requested window is completely outside target range.

  • axis (int) – Axis along which the window selection should be performed

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.
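The four strategies can be illustrated with a standalone numpy sketch (a simplified 1D stand-in, not the SeisBench implementation):

```python
import numpy as np

def fixed_window(x, p0, windowlen, strategy="fail"):
    """Simplified 1D version of the documented window strategies."""
    n = x.shape[-1]
    if strategy == "fail":
        if p0 < 0 or p0 + windowlen > n:
            raise ValueError("Insufficient data for window")
        return x[p0:p0 + windowlen]
    if strategy == "pad":
        # Zero-pad parts of the window outside the trace.
        out = np.zeros(windowlen, dtype=x.dtype)
        lo, hi = max(p0, 0), min(p0 + windowlen, n)
        if lo < hi:
            out[lo - p0:hi - p0] = x[lo:hi]
        return out
    if strategy == "move":
        if windowlen > n:
            raise ValueError("Trace shorter than window")
        # Clip the start to the closest valid position.
        p0 = min(max(p0, 0), n - windowlen)
        return x[p0:p0 + windowlen]
    if strategy == "variable":
        # Return only the overlap; may be empty.
        lo, hi = max(p0, 0), min(p0 + windowlen, n)
        return x[lo:hi] if lo < hi else x[:0]
    raise ValueError(f"Unknown strategy: {strategy}")
```

Note that the sketch omits the metadata rewriting of “_sample” entries that the real augmentation performs.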

class RandomWindow(low=None, high=None, windowlen=None, **kwargs)[source]

Bases: FixedWindow

Selects a window within the range [low, high) randomly with a uniform distribution. If there are no candidates fulfilling the criteria, the window selection depends on the strategy. For “fail” or “move” a ValueError is raised. For “variable” only the part between low and high is returned. If high < low, this part will be empty. For “pad” the same as for “variable” will be returned, but padded to the correct length. The padding will be added randomly to both sides.

Parameters:
  • low (None or int) – The lowest allowed index for the start sample. The sample at this position can be included in the output.

  • high (None or int) – The highest allowed index for the end. The sample at position high cannot be included in the output.

  • kwargs – Parameters passed to the init method of FixedWindow.

class SelectOrPadAlongAxis(n, adjust_metadata=True, repeat=True, axis=0, key='X')[source]

Bases: object

Changes the length of an axis from m to n by:

  • padding/repeating data if m < n

  • random selection if m > n

In addition, can adjust the length of the metadata arrays accordingly. This augmentation is primarily intended to apply to grouped data. The input data must be an array.

Data is padded with zeros, metadata with values depending on the dtype (NaN for float, 0 for int, empty string for str).

Parameters:
  • n (int) – Length of output

  • adjust_metadata (bool) – If true, adjusts metadata. Otherwise, leaves metadata unaltered.

  • repeat (bool) – If true, repeat data instead of padding

  • axis (int) – Axis along which reshaping should be applied

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

class SlidingWindow(timestep, windowlen, **kwargs)[source]

Bases: FixedWindow

Generates sliding windows and adds a new axis for windows as first axis. All metadata entries are converted to arrays of entries. Only complete windows are returned and a possible remainder is truncated. In particular, if the available data is shorter than one window, an empty array is returned.

Parameters:
  • timestep (int) – Difference between two consecutive window starts in samples

  • windowlen (int) – Length of the output window

  • kwargs – All kwargs are passed directly to FixedWindow
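The windowing arithmetic can be sketched in plain numpy (an illustrative stand-in that ignores metadata handling):

```python
import numpy as np

def sliding_windows(x, timestep, windowlen):
    """Stack all complete windows along a new first axis; remainder truncated."""
    n_windows = (x.shape[-1] - windowlen) // timestep + 1
    if n_windows <= 0:
        # Data shorter than one window: return an empty array.
        return np.empty((0, windowlen), dtype=x.dtype)
    return np.stack(
        [x[..., i * timestep:i * timestep + windowlen] for i in range(n_windows)]
    )
```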

class SteeredWindow(windowlen, start_key='start_sample', end_key='end_sample', window_output_key='window_borders', **kwargs)[source]

Bases: FixedWindow

Window selection that relies on the “_control_” dict from the SteeredGenerator. Selects a window of the given length, with the window defined by the start and end samples centered inside it. If no length is given, the window between start and end sample is returned. If there are insufficient samples on one side of the target window, the window will be moved. If the total number of samples is insufficient, the window will start at the earliest possible sample. The behavior in this case depends on the chosen strategy for FixedWindow.

Parameters:
  • windowlen (int or None) – Length of the window to be returned. If None, will be determined using the start and end samples from the “_control_” dict.

  • start_key (str) – Key of the start sample in the “_control_” dict

  • end_key (str) – Key of the end sample in the “_control_” dict

  • window_output_key (str) – The sample start and end will be written as numpy array to this key in the state_dict

  • kwargs – Parameters passed to the init method of FixedWindow.

class UTCOffsets(time_key='trace_start_time', offset_key='trace_offset_sample', key='X')[source]

Bases: object

Write the offset in samples between the different traces into the metadata. The offset of the trace with the earliest start time is set to 0. In combination with AlignGroupsOnKey, this can be used to align traces based on wall time.

Parameters:
  • time_key (str) – Metadata key to read the start times from.

  • offset_key (str) – Metadata key to write offset samples to.

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.
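The real augmentation reads trace start times from the metadata; the underlying arithmetic can be sketched with numpy, assuming start times are already available in seconds:

```python
import numpy as np

def utc_offsets(start_times, sampling_rate):
    """Offsets in samples relative to the earliest trace start time."""
    t = np.asarray(start_times, dtype=float)  # start times in seconds
    return np.round((t - t.min()) * sampling_rate).astype(int)
```

The trace with the earliest start time receives offset 0, as documented; AlignGroupsOnKey can then align on the offset key.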

class WindowAroundSample(metadata_keys, samples_before=0, selection='first', **kwargs)[source]

Bases: FixedWindow

Creates a window around a sample defined in the metadata. If all relevant metadata entries are NaN, the generator returns a window starting at the first sample.

Parameters:
  • metadata_keys (str, list[str]) – Metadata key or list of metadata keys to use for window selection. The corresponding metadata entries are assumed to be in sample units.

  • samples_before (int) – The number of samples to include before the target sample.

  • selection (str) –

    Selection strategy in case multiple metadata keys are provided and have non-NaN values.

    Options are:

    • ”first”: use the first available key

    • ”random”: use uniform random selection among the keys

  • kwargs – Parameters passed to the init method of FixedWindow.
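The key-selection logic can be sketched as a small helper (illustrative only, not the SeisBench implementation):

```python
import numpy as np

def window_start(metadata, keys, samples_before=0, selection="first"):
    """Choose a target sample from the available (non-NaN) metadata keys."""
    values = [metadata[k] for k in keys if not np.isnan(metadata[k])]
    if not values:
        return 0  # all entries NaN: window starts at the first sample
    if selection == "random":
        target = np.random.choice(values)  # uniform among available keys
    else:  # "first"
        target = values[0]
    return int(target) - samples_before
```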

Labeling

class DetectionLabeller(p_phases, s_phases=None, factor=1.4, fixed_window=None, **kwargs)[source]

Bases: SupervisedLabeller

Create detection labels from picks. The labeller can either use a fixed detection length or determine the length from the P to S time as in Mousavi et al. (2020, Nature Communications). In the latter case, detections range from P to S + factor * (S - P) and are only annotated if both P and S phases are present. All detections are represented through a boxcar time series with the same length as the input waveforms. For both P and S, lists of phases can be passed, of which the sequentially first one will be used. All picks with NaN sample are treated as not present.

Parameters:
  • p_phases (str, list[str]) – (List of) P phase metadata columns

  • s_phases (str, list[str]) – (List of) S phase metadata columns

  • factor (float) – Factor for length of window after S onset

  • fixed_window (int) – Number of samples for fixed window detections. If None, the length is determined from the P to S time.

label(X, metadata)[source]

to be overwritten in subclasses

Parameters:
  • X (numpy.ndarray) – trace from state dict

  • metadata (dict) – metadata from state dict

Returns:

Label

Return type:

numpy.ndarray
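The P-to-S detection box described above can be sketched in numpy (a simplified stand-in for a single trace):

```python
import numpy as np

def detection_label(p_sample, s_sample, n_samples, factor=1.4):
    """Boxcar from P to S + factor * (S - P); empty if either pick is NaN."""
    y = np.zeros(n_samples)
    if np.isnan(p_sample) or np.isnan(s_sample):
        return y  # detection only annotated if both phases are present
    start = int(p_sample)
    end = int(round(s_sample + factor * (s_sample - p_sample)))
    y[max(start, 0):min(end, n_samples)] = 1
    return y
```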

class PickLabeller(label_columns=None, noise_column=True, model_labels=None, **kwargs)[source]

Bases: SupervisedLabeller, ABC

Abstract class for PickLabellers implementing common functionality

Parameters:
  • label_columns (Union[list[str], dict[str, str], None]) – Specify the columns to use for pick labelling, defaults to None and columns are inferred from metadata. Columns can either be specified as list or dict. For a list, each entry is treated as its own pick type. The dict should contain a mapping from column name to pick label, e.g., {“trace_Pg_arrival_sample”: “P”}. This allows to group phases, e.g., Pg, Pn, pP all being labeled as P phase. Multiple phases present within a window can lead to the labeller annotating multiple picks for the same label.

  • noise_column (bool) – If False, disables normalization of phases and noise label, default is True

  • model_labels (Union[str, list[str], None]) – Order of the labels, defaults to None. If None, the labels will be in alphabetical order with Noise as the last label. To get the labels from a WaveformModel, use model.labels.

  • kwargs – Kwargs are passed to the SupervisedLabeller superclass

class ProbabilisticLabeller(shape='gaussian', sigma=10, **kwargs)[source]

Bases: PickLabeller

Create supervised labels from picks. The picks in example are represented probabilistically with shapes of:

(Figure: the gaussian, triangle, and box label shapes; see label_width.png in the rendered documentation.)

Note that the parameter sigma has a different meaning depending on the chosen shape. In particular, the total probability mass of the picks will not be constant.

All picks with NaN sample are treated as not present. The noise class is automatically created as \(\max \left(0, 1 - \sum_{j=1}^{c} y_{j} \right)\).

Parameters:
  • shape (str) – Shape of the label. One of gaussian, triangle, or box.

  • sigma (int) – Variance of Gaussian (gaussian), half-width of triangle (triangle) or box function (box) label representation in samples, defaults to 10.

label(X, metadata)[source]

to be overwritten in subclasses

Parameters:
  • X (numpy.ndarray) – trace from state dict

  • metadata (dict) – metadata from state dict

Returns:

Label

Return type:

numpy.ndarray
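The construction above can be sketched in numpy for the gaussian shape. This is a simplified stand-in, not the SeisBench implementation; in particular, sigma is used here directly as the Gaussian width parameter:

```python
import numpy as np

def gaussian_label(onset, length, sigma):
    # Probability curve peaking at 1 at the pick onset.
    t = np.arange(length)
    return np.exp(-0.5 * (t - onset) ** 2 / sigma ** 2)

def probabilistic_labels(picks, length, sigma=10):
    """One curve per phase plus a noise channel max(0, 1 - sum)."""
    curves = [gaussian_label(p, length, sigma) if not np.isnan(p)
              else np.zeros(length)                 # NaN pick: not present
              for p in picks.values()]
    y = np.stack(curves)
    noise = np.clip(1 - y.sum(axis=0), 0, None)    # noise class
    return np.concatenate([y, noise[None]])
```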

class ProbabilisticPointLabeller(position=0.5, **kwargs)[source]

Bases: ProbabilisticLabeller

This labeller annotates windows through their pick probabilities at a certain point in the window. The output is a probability vector; e.g., [0.5, 0.2, 0.3] could indicate 50 % P wave, 20 % S wave, 30 % noise. This labelling scheme is more flexible than the StandardLabeller and can encode, for example, the centrality of a pick within a window or multiple picks within the same window.

This class relies on the ProbabilisticLabeller, but instead of probability curves only returns the probabilities at one point.

Parameters:
  • position (float) – Position to label as fraction of the total trace length. Defaults to 0.5, i.e., the center of the window.

  • kwargs – Passed to ProbabilisticLabeller

label(X, metadata)[source]

to be overwritten in subclasses

Parameters:
  • X (numpy.ndarray) – trace from state dict

  • metadata (dict) – metadata from state dict

Returns:

Label

Return type:

numpy.ndarray

class StandardLabeller(on_overlap='label-first', low=None, high=None, **kwargs)[source]

Bases: PickLabeller

Create supervised labels from picks. The entire example is labelled as a single class/pick. For cases where multiple picks overlap in the input window, a number of options can be specified:

  • ‘label-first’: Only use the first pick as the example label.

  • ‘fixed-relevance’: Use the pick closest to the centre point of the window.

  • ‘random’: Use a uniformly random pick as the example label.

In general, it is recommended to set low and high, as it is very difficult for models to spot if a pick is just inside or outside the window. This can lead to noisy predictions that strongly depend on the marginal label distribution in the training set.

Parameters:
  • low (int, None) – Only take into account picks after or at this sample. If None, uses low=0. If negative, counts from the end of the trace.

  • high (int, None) – Only take into account picks before this sample. If None, uses high=num_samples. If negative, counts from the end of the trace.

  • on_overlap (str, optional) – Method used to label when multiple picks present in window, defaults to “label-first”

label(X, metadata)[source]

to be overwritten in subclasses

Parameters:
  • X (numpy.ndarray) – trace from state dict

  • metadata (dict) – metadata from state dict

Returns:

Label

Return type:

numpy.ndarray

class StepLabeller(**kwargs)[source]

Bases: PickLabeller

Create supervised labels from picks. The picks are represented by probability curves with value 0 before the pick and 1 afterwards. The output contains one channel per pick type and no noise channel.

All picks with NaN sample are treated as not present.

label(X, metadata)[source]

to be overwritten in subclasses

Parameters:
  • X (numpy.ndarray) – trace from state dict

  • metadata (dict) – metadata from state dict

Returns:

Label

Return type:

numpy.ndarray

class SupervisedLabeller(label_type, dim, key=('X', 'y'))[source]

Bases: ABC

Supervised classification labels. Performs simple checks for standard supervised classification labels.

Parameters:
  • label_type (str) – The type of label either: ‘multi_label’, ‘multi_class’, ‘binary’.

  • dim (int) – Dimension over which labelling will be applied.

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

abstract label(X, metadata)[source]

to be overwritten in subclasses

Parameters:
  • X (numpy.ndarray) – trace from state dict

  • metadata (dict) – metadata from state dict

Returns:

Label

Return type:

numpy.ndarray

box_pick(onset, length, sigma)[source]

Create box representation of pick in time series.

Parameters:
  • onset (float) – The nearest sample to pick onset

  • length (int) – The length of the trace time series in samples

  • sigma (float) – The half width of the box distribution in samples

Return y:

1D time series with box representation of pick

Return type:

np.ndarray

gaussian_pick(onset, length, sigma)[source]

Create probabilistic representation of pick in time series. PDF function given by:

\[\mathcal{N}(\mu,\,\sigma^{2})\]
Parameters:
  • onset (float) – The nearest sample to pick onset

  • length (int) – The length of the trace time series in samples

  • sigma (float) – The variance of the Gaussian distribution in samples

Return prob_pick:

1D time series with probabilistic representation of pick

Return type:

np.ndarray

triangle_pick(onset, length, sigma)[source]

Create triangle representation of pick in time series.

Parameters:
  • onset (float) – The nearest sample to pick onset

  • length (int) – The length of the trace time series in samples

  • sigma (float) – The half width of the triangle distribution in samples

Return y:

1D time series with triangle representation of pick

Return type:

np.ndarray
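Plausible numpy sketches of the three pick-shape helpers, consistent with the documented shapes (each peaks at 1 at the onset; these are illustrative, not necessarily identical to the library's implementations):

```python
import numpy as np

def gaussian_pick(onset, length, sigma):
    # Gaussian bump centered on the onset, width controlled by sigma.
    t = np.arange(length)
    return np.exp(-0.5 * (t - onset) ** 2 / sigma ** 2)

def triangle_pick(onset, length, sigma):
    # Linear ramp up/down; zero beyond half-width sigma.
    t = np.arange(length)
    return np.clip(1 - np.abs(t - onset) / sigma, 0, None)

def box_pick(onset, length, sigma):
    # Constant 1 within half-width sigma of the onset.
    t = np.arange(length)
    return ((t >= onset - sigma) & (t <= onset + sigma)).astype(float)
```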

Miscellaneous

class AddGap(axis=-1, key='X', metadata_picks_in_gap_threshold=None, label_keys=None, noise_id={'y': -1})[source]

Bases: object

Adds a gap into the data by zeroing out entries.

Parameters:
  • axis (int) – Sample dimension, defaults to -1.

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

  • metadata_picks_in_gap_threshold (int, None) – If not None, checks whether picks in the metadata fall within the gap. If a pick lies within the gap and its distance to the gap border is larger than metadata_picks_in_gap_threshold (in samples), the corresponding arrival sample in the metadata is set to NaN. If None, the metadata is not modified.

  • label_keys (str, tuple[str,str], None) – Specify the labels to which the gap is applied

  • noise_id (dict) – {key of labels containing noise -> index of the noise column}. For example, noise_id={“y”: -1} indicates that state_dict[“y”][0][…, -1, …] is the noise label.

class ChangeDtype(dtype, key='X')[source]

Bases: object

Copies the data while changing the data type to the provided one

Parameters:
  • dtype (numpy.dtype) – Target data type

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

class ChannelDropout(axis=-2, key='X', check_meta_picks_in_gap=False, label_keys=None, noise_id={'y': -1})[source]

Bases: object

Similar to Dropout, zeros out between 0 and c - 1 channels randomly. Outputs are multiplied by the inverse of the fraction of remaining channels. As for regular Dropout, this ensures that the output “energy” is unchanged.

Parameters:
  • axis (int) – Channel dimension, defaults to -2.

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

  • check_meta_picks_in_gap (bool) – If true, check whether all channels are zero after channel dropping and if so, set phase arrivals to NaN.

  • label_keys (str, tuple[str,str], None) – Specify the labels to which the gap is applied

  • noise_id (dict) – {key of labels containing noise -> index of the noise column}. For example, noise_id={“y”: -1} indicates that state_dict[“y”][0][…, -1, …] is the noise label.

class Copy(key=('X', 'Xc'))[source]

Bases: object

A copy augmentation. Maps data from a given key in the state_dict to a new key.

Parameters:

key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. A 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.

class Filter(N, Wn, btype='low', analog=False, forward_backward=False, axis=-1, key='X')[source]

Bases: object

Implements a filter augmentation, closely based on scipy.signal.butter. Please refer to the scipy documentation for more detailed description of the parameters.

The filter can also be applied to unaligned groups, i.e., lists of numpy arrays. The list is not taken into account for the enumeration of the axis, i.e., axis=0 will refer to the first axis in every array within the list.

Parameters:
  • N (int) – Filter order

  • Wn (list/array of float) – The critical frequency or frequencies

  • btype (str) – The filter type: ‘lowpass’, ‘highpass’, ‘bandpass’, ‘bandstop’

  • analog (bool) – When True, return an analog filter, otherwise a digital filter is returned.

  • forward_backward (bool) – If true, filters once forward and once backward. This doubles the order of the filter and makes the filter zero-phase.

  • axis (int) – Axis along which the filter is applied.

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.
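A sketch of the equivalent scipy calls (illustrative, not the SeisBench code; `fs` is assumed here since the augmentation works on sampled traces):

```python
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

def apply_filter(x, N, Wn, btype="low", forward_backward=False, axis=-1, fs=100.0):
    """Butterworth filter along `axis`; filtfilt makes it zero-phase."""
    b, a = butter(N, Wn, btype=btype, fs=fs)
    if forward_backward:
        # Forward-backward filtering: doubled order, zero phase.
        return filtfilt(b, a, x, axis=axis)
    return lfilter(b, a, x, axis=axis)
```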

class FilterKeys(include=None, exclude=None)[source]

Bases: object

Filters keys in the state dict. Can be used to remove keys from the output that cannot be collated by pytorch or are no longer required. Either included or excluded keys can be defined.

Parameters:
  • include (list[str] or None) – Only these keys will be present in the output.

  • exclude (list[str] or None) – All keys except these keys will be present in the output.

class GaussianNoise(scale=(0, 0.15), key='X')[source]

Bases: object

Adds point-wise independent Gaussian noise to an array.

Parameters:
  • scale (tuple[float, float]) – Tuple of minimum and maximum relative amplitude of the noise. Relative amplitude is defined as the quotient of the standard deviation of the noise and the absolute maximum of the input array.

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.
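The relative-amplitude definition can be sketched in numpy (an illustrative stand-in for the augmentation):

```python
import numpy as np

def add_gaussian_noise(x, scale=(0, 0.15), rng=None):
    """Noise std is a random fraction (drawn from `scale`) of max |x|."""
    rng = rng or np.random.default_rng()
    rel = rng.uniform(*scale)               # relative amplitude
    return x + rng.normal(0, rel * np.abs(x).max(), x.shape)
```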

class Normalize(demean_axis=None, detrend_axis=None, amp_norm_axis=None, amp_norm_type='peak', eps=1e-10, key='X')[source]

Bases: object

A normalization augmentation that allows demeaning, detrending and amplitude normalization (in this order).

The normalization can also be applied to unaligned groups, i.e., lists of numpy arrays. The list is not taken into account for the enumeration of the axis, i.e., demean_axis=0 will refer to the first axis in every array within the list.

Parameters:
  • demean_axis (int, None) – The axis (single axis or tuple) which should be jointly demeaned. None indicates no demeaning.

  • detrend_axis (int, None) – The axis along which detrending should be applied. None indicates no detrending.

  • amp_norm_axis (int, None) – The axis (single axis or tuple) which should be jointly amplitude normalized. None indicates no normalization.

  • amp_norm_type (str) –

    Type of amplitude normalization. Supported types:

    • ”peak”: division by the absolute peak of the trace

    • ”std”: division by the standard deviation of the trace

  • eps (float) – Epsilon value added in amplitude normalization to avoid division by zero

  • key (str, tuple[str, str]) – The keys for reading from and writing to the state dict. If key is a single string, the corresponding entry in state dict is modified. Otherwise, a 2-tuple is expected, with the first string indicating the key to read from and the second one the key to write to.
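The demean and amplitude-normalization steps (in the documented order) can be sketched in numpy; detrending is omitted here for brevity, and this is an illustrative stand-in rather than the library code:

```python
import numpy as np

def normalize(x, demean_axis=None, amp_norm_axis=None,
              amp_norm_type="peak", eps=1e-10):
    """Demean, then amplitude-normalize (order as documented)."""
    if demean_axis is not None:
        x = x - x.mean(axis=demean_axis, keepdims=True)
    if amp_norm_axis is not None:
        if amp_norm_type == "peak":
            x = x / (np.abs(x).max(axis=amp_norm_axis, keepdims=True) + eps)
        elif amp_norm_type == "std":
            x = x / (x.std(axis=amp_norm_axis, keepdims=True) + eps)
    return x
```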

class NullAugmentation[source]

Bases: object

This augmentation does not perform any modification on the state dict. It is primarily intended to be used as a no-op for OneOf.

class OneOf(augmentations, probabilities=None)[source]

Bases: object

Runs one of the augmentations provided, choosing randomly each time called.

Parameters:
  • augmentations (list[callable]) – A list of augmentations

  • probabilities (list/array/tuple of scalar) – Probability for each augmentation to be used. Probabilities will automatically be normed to sum to 1. If None, equal probability is assigned to each augmentation.

property probabilities

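The selection logic, including the probability renormalization, can be sketched as a small helper (illustrative only, not the SeisBench implementation):

```python
import numpy as np

def one_of(augmentations, probabilities=None, rng=None):
    """Pick one augmentation at random; probabilities are renormalized."""
    rng = rng or np.random.default_rng()
    if probabilities is None:
        # Equal probability for each augmentation.
        p = np.full(len(augmentations), 1 / len(augmentations))
    else:
        p = np.asarray(probabilities, dtype=float)
        p = p / p.sum()  # normalize to sum to 1
    return augmentations[rng.choice(len(augmentations), p=p)]
```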
class RandomArrayRotation(keys='X', axis=-1)[source]

Bases: object

Randomly rotates a set of arrays, i.e., shifts samples along an axis and puts the end samples to the start. The same rotation will be applied to each array. All arrays need to have the same length along the target axis.

Warning

This augmentation does not modify the metadata, as positional entries anyhow become non-unique after rotation. Workflows should therefore always first generate labels from metadata and then jointly rotate data and labels.

Parameters:
  • keys (Union[list[Union[str, tuple[str, str]]], str, tuple[str,str]]) – Single key specification or list of key specifications. Each key specification is either a string, for identical input and output keys, or as a tuple of two strings, input and output keys. Defaults to “X”.

  • axis (Union[int, list[int]]) – Sample axis. Either a single integer or a list of integers for multiple keys. If a single integer but multiple keys are provided, the same axis will be used for each key. Defaults to -1.
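The joint rotation can be sketched with np.roll (an illustrative stand-in, assuming a single shared axis):

```python
import numpy as np

def random_rotation(arrays, axis=-1, rng=None):
    """Apply the same random circular shift to every array."""
    rng = rng or np.random.default_rng()
    n = arrays[0].shape[axis]
    shift = int(rng.integers(n))  # one shift, shared by all arrays
    return [np.roll(a, shift, axis=axis) for a in arrays]
```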