Base classes

General and WaveformModels

class ActivationLSTMCell(input_size, hidden_size, gate_activation=<function hard_sigmoid>, recurrent_dropout=0)[source]

Bases: Module

LSTM Cell using variable gating activation, by default hard sigmoid

If gate_activation=torch.sigmoid this is the standard LSTM cell

Uses recurrent dropout strategy from https://arxiv.org/abs/1603.05118 to match Keras implementation.
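The difference between the two gate activations can be illustrated with a minimal sketch on plain floats. It assumes the Keras-style definition of the hard sigmoid, clip(0.2 * x + 0.5, 0, 1); the formula is not stated in this documentation, so treat it as an assumption. The function names are illustrative and not part of SeisBench.

```python
import math


def hard_sigmoid_sketch(x: float) -> float:
    # Assumed Keras-style hard sigmoid: a piecewise-linear stand-in for the sigmoid
    return min(1.0, max(0.0, 0.2 * x + 0.5))


def sigmoid(x: float) -> float:
    # Standard logistic sigmoid, as used by the regular LSTM cell
    return 1.0 / (1.0 + math.exp(-x))


# Compare both gate activations on a few inputs
for x in (-3.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  hard={hard_sigmoid_sketch(x):.3f}  sigmoid={sigmoid(x):.3f}")
```

The hard variant saturates exactly at 0 and 1, while the standard sigmoid only approaches these values.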

forward(input, state)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

class CustomLSTM(cell, *cell_args, bidirectional=True, **cell_kwargs)[source]

Bases: Module

LSTM to be used with custom cells

forward(input, state=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GroupingHelper(grouping)[source]

Bases: object

A helper class for grouping streams for the annotate function. In most cases, no direct interaction with this class is required. However, when implementing new models, subclassing this helper allows for more flexibility.

group_stream(stream, strict, min_length_s, comp_dict)[source]

Groups the input stream. In addition, enforces strict mode, i.e., if strict=True, only segments where all components are available are kept. Segments that are too short are discarded. For grouping=channel, no checks are performed.

Parameters:

stream (Stream) – Input stream
strict (bool) – Whether to apply strict mode, as in WaveformModel. Only applied if grouping is “full”.
min_length_s (float) – Minimum length of a segment in seconds. Only applied if grouping is “full”.
comp_dict (dict[str, int]) – Mapping of component characters to int. Only used if grouping is “full”.

Return type:

list[list[Trace]]

Returns:

Grouped traces as a list of lists.
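The grouping and the strict check can be sketched on plain trace id strings. This is a simplified illustration, not the SeisBench implementation: it assumes that the component is the last character of the channel code, and it ignores time segments and min_length_s. All function names are hypothetical.

```python
from collections import defaultdict


def id_without_component(trace_id: str) -> str:
    # Assumption: the component is the last character of the channel code
    return trace_id[:-1]


def group_ids(trace_ids, comp_dict, strict):
    # Collect all ids that only differ in their component character
    groups = defaultdict(list)
    for trace_id in trace_ids:
        groups[id_without_component(trace_id)].append(trace_id)
    if strict:
        # Strict mode: keep only groups in which every component is present
        required = set(comp_dict)
        groups = {
            key: ids
            for key, ids in groups.items()
            if required <= {trace_id[-1] for trace_id in ids}
        }
    return [sorted(ids) for _, ids in sorted(groups.items())]


ids = ["XX.AAA..HHZ", "XX.AAA..HHN", "XX.AAA..HHE", "XX.BBB..HHZ"]
comp_dict = {"Z": 0, "N": 1, "E": 2}
print(group_ids(ids, comp_dict, strict=False))
print(group_ids(ids, comp_dict, strict=True))
```

With strict=True, the single-component station is dropped because its N and E components are missing.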

property grouping

static trace_id_without_component(trace)[source]

class SeisBenchModel(citation=None)[source]

Bases: Module

Base SeisBench model interface for processing waveforms.

Parameters:: citation (str, optional) – Citation reference, defaults to None.

property citation

property device: Returns the device of the model parameters. Assumes all parameters are on the same device.

property dtype: Returns the dtype of the model parameters. Assumes all parameters are of the same dtype.

classmethod from_pretrained(name, version_str='latest', update=False, force=False, wait_for_file=False)[source]

Load pretrained model with weights.

A set of pretrained model weights consists of two files: a weights file [name].pt and a config file [name].json. The config file can (and should) contain the following entries, even though all of them are optional:

“docstring”: A string documenting the pipeline. Usually also contains information on the author.
“model_args”: Argument dictionary passed to the init function of the pipeline.
“seisbench_requirement”: The minimal version of SeisBench required to use the weights file.
“default_args”: Default args for the annotate()/classify() functions. These arguments will supersede any potential constructor settings.
“version”: The version string of the model. For all but the latest version, version names should furthermore be denoted in the file names, i.e., the files should end with the suffix “.v[VERSION]”. If no version is specified in the json, the assumed version string is “1”.
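As an illustration, a config file with these entries might look as follows. All values are hypothetical placeholders, including the keys inside "model_args" and "default_args". The helper read_version is not part of SeisBench; it only mirrors the rule that a missing version entry is read as "1".

```python
import json

# All values below are hypothetical placeholders
config_text = json.dumps(
    {
        "docstring": "Example weights, trained by A. Author on an example dataset.",
        "model_args": {"in_samples": 3001},
        "seisbench_requirement": "0.3.0",
        "default_args": {"overlap": 1500},
        "version": "2",
    },
    indent=2,
)


def read_version(text: str) -> str:
    # A config without a "version" entry is treated as version "1"
    return json.loads(text).get("version", "1")


print(config_text)
print(read_version(config_text))
print(read_version('{"docstring": "no version entry"}'))
```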

Warning

Even though the version is set to “latest” by default, this will only use the latest version available locally. The remote repository will only be queried if no weights are available locally. This behaviour is implemented for privacy reasons, as it avoids contacting the remote repository for every call of the function. To explicitly update to the latest version from the remote repository, set update=True.

Parameters:

name (str) – Model name prefix.
version_str (str) – Version of the weights to load. Either a version string or “latest”. The “latest” model is the model with the highest version number.
force (bool, optional) – Force execution of download callback, defaults to False
update (bool) – If true, downloads potential new weights file and config from the remote repository. The old files are retained with their version suffix.
wait_for_file (bool, optional) – Whether to wait on partially downloaded files, defaults to False

Returns:

Model instance

Return type:

SeisBenchModel

abstractmethod get_model_args()[source]

Obtain all model parameters for saving.

Returns:: Dictionary of all parameters for a model to store during saving.
Return type:: Dict

classmethod list_pretrained(details=False, remote=True)[source]

Returns list of available pretrained weights and optionally their docstrings.

Parameters:

details (bool) – If true, instead of returning only a list, also return their docstrings. By default, returns the docstring of the “latest” version for each weight. Note that this requires downloading the json files for each model in the background and is therefore slower. Defaults to false.
remote (bool) – If true, reports both locally available weights and versions in the remote repository. Otherwise only reports local versions.

Returns:

List of available weights or dict of weights and their docstrings

Return type:

list or dict

classmethod list_versions(name, remote=True)[source]

Returns list of available versions for a given weight name.

Parameters:

name (str) – Name of the queried weight
remote (bool) – If true, reports both locally available versions and versions in the remote repository. Otherwise only reports local versions.

Returns:

List of available versions

Return type:

list[str]

classmethod load(path, version_str=None, **kwargs)[source]

Load a SeisBench model from local path.

For more information on the SeisBench model format see save().

Parameters:

path (pathlib.Path or str) – Define the path to the SeisBench model.
version_str (str, None) – Version string of the model. If none, no version string is appended.

Returns:

Model instance

Return type:

SeisBenchModel

property name

save(path, weights_docstring='', version_str=None)[source]

Save a SeisBench model locally.

SeisBench models are stored inside the directory ‘path’. SeisBench models are saved in two parts: the model configuration is stored in JSON format in [path][.json], and the underlying model weights are stored in PyTorch format in [path][.pt], where ‘path’ is the output path. The suffixes are appended to the path parameter automatically.

In addition, the models can have a version string which is appended to the json and the pt path. For example, setting version_str="1" will append .v1 to the file names.

The model config should contain the following information, which is automatically created from the model instance state:

“weights_docstring”: A string documenting the pipeline. Usually also contains information on the author.

“model_args”: Argument dictionary passed to the init function of the pipeline.

“seisbench_requirement”: The minimal version of SeisBench required to use the weights file.

“default_args”: Default args for the annotate()/classify() functions.

Non-serializable arguments (e.g. functions) cannot be saved to JSON, so are not converted.

Parameters:

path (pathlib.Path or str) – Define the path to the output model.
weights_docstring (str, defaults to '') – Documentation for the model weights (training details, author etc.)
version_str (str, None) – Version string of the model. If none, no version string is appended.
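The naming scheme can be sketched with pathlib. The helper output_paths is hypothetical, and the exact order of file extension and version suffix is inferred from the description above rather than taken from the implementation.

```python
from pathlib import Path


def output_paths(path, version_str=None):
    # Hypothetical helper: derive the config and weights file names from 'path'
    path = Path(path)
    suffix = "" if version_str is None else f".v{version_str}"
    json_path = path.with_name(path.name + ".json" + suffix)
    pt_path = path.with_name(path.name + ".pt" + suffix)
    return json_path, pt_path


print(output_paths("models/my_model"))
print(output_paths("models/my_model", version_str="1"))
```

Without a version string only the .json and .pt suffixes are appended; with version_str="1" both file names additionally end in .v1.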

to_preferred_device(verbose=False)[source]

Move the model to an accelerator if available. Currently, this function checks for CUDA, MPS and XPU accelerators (in this order).

The function does not automatically move models to TPU. Check out torch_xla to see how to move models to TPU.

Parameters:: verbose (bool) – If true, prints the new device of the model.
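The selection order can be sketched without PyTorch. The function pick_device and its availability mapping are hypothetical; in practice, availability would come from the PyTorch backend checks. Returning None here stands for leaving the model on its current device.

```python
def pick_device(available):
    # Check accelerators in the documented order: CUDA, then MPS, then XPU
    for device in ("cuda", "mps", "xpu"):
        if available.get(device, False):
            return device
    # No accelerator found: signal that the model should stay where it is
    return None


print(pick_device({"cuda": False, "mps": True, "xpu": True}))
print(pick_device({}))
```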

property weights_docstring

property weights_version

class WaveformModel(component_order=None, sampling_rate=None, output_type=None, default_args=None, in_samples=None, pred_sample=0, labels=None, filter_args=None, filter_kwargs=None, grouping='instrument', allow_padding=False, **kwargs)[source]

Bases: SeisBenchModel, ABC

Abstract interface for models processing waveforms. Based on the properties specified by inheriting models, WaveformModel automatically provides the respective annotate()/classify() functions. Both functions take obspy streams as input. The annotate() function has a rather strictly defined output, i.e., it always outputs obspy streams with the annotations. These can for example be functions of pick probability over time. In contrast, the classify() function can tailor its output to the model type. For example, a picking model might output picks, while a magnitude estimation model might only output a scalar magnitude. Internally, classify() will usually rely on annotate() and simply add steps to its output.

For details see the documentation of these functions.

The following parameters are available for the annotate/classify functions:

Argument	Description	Default value
batch_size	Batch size for the model	256
overlap	Overlap between prediction windows. Values between 0 and 1 are treated as fractions of the window length. Values above 1 are treated as sample counts. (only for window prediction models)	0
stacking	Stacking method for overlapping windows (only for window prediction models). Options are ‘max’ and ‘avg’.	avg
stride	Stride in samples (only for point prediction models)	1
strict	If true, only annotate if recordings for all components are available, otherwise impute missing data with zeros.	False
flexible_horizontal_components	If true, accepts traces with Z12 components as ZNE and vice versa. This is usually acceptable for rotationally invariant models, e.g., most picking models.	True
zerophase_resample	If true, the filter applied before resampling for anti-aliasing is zero-phase. Otherwise, uses causal filter. Note that using a different filter in application than in training might cause small out of distribution issues	True

Hint

Please note that the default parameters can be superseded by the pretrained model weights. Check model.default_args to see which parameters are overwritten.
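How overlap and stacking interact can be sketched on plain lists. This is an illustration under assumptions, not the SeisBench implementation: values strictly between 0 and 1 are read as fractions, everything else as a sample count, and all function names are hypothetical.

```python
def overlap_to_samples(overlap, in_samples):
    # Fractions of the window length versus absolute sample counts
    if 0 < overlap < 1:
        return int(overlap * in_samples)
    return int(overlap)


def window_starts(n_samples, in_samples, overlap):
    step = in_samples - overlap_to_samples(overlap, in_samples)
    if step <= 0:
        raise ValueError("overlap must be smaller than the window length")
    return list(range(0, n_samples - in_samples + 1, step))


def stack(predictions, starts, n_samples, method="avg"):
    # Collect all window predictions covering each sample, then combine them
    collected = [[] for _ in range(n_samples)]
    for start, window in zip(starts, predictions):
        for offset, value in enumerate(window):
            collected[start + offset].append(value)
    if method == "max":
        return [max(v) if v else float("nan") for v in collected]
    return [sum(v) / len(v) if v else float("nan") for v in collected]


starts = window_starts(n_samples=8, in_samples=4, overlap=0.5)
predictions = [[0.1] * 4, [0.5] * 4, [0.9] * 4]  # one constant curve per window
print(starts)
print(stack(predictions, starts, 8, method="avg"))
print(stack(predictions, starts, 8, method="max"))
```

With an overlap of 0.5, consecutive windows share half their samples, so the shared samples receive the average or the maximum of two predictions.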

Parameters:

component_order (list | str | None) – Specify component order (e.g. ‘ZNE’), defaults to None.
sampling_rate (float | None) – Sampling rate of the model, defaults to None. If sampling rate is not None, the annotate and classify functions will automatically resample incoming traces and validate correct sampling rate if the model overwrites annotate_stream_pre().
output_type (str | None) –
The type of output from the model. Current options are:
- “point” for a point prediction, i.e., the probability of containing a pick in the window or of a pick at a certain location. This will provide an annotate() function. If a classify_aggregate() function is provided by the inheriting model, this will also provide a classify() function.
- “array” for prediction curves, i.e., probabilities over time for the arrival of certain wave types. This will provide an annotate() function. If a classify_aggregate() function is provided by the inheriting model, this will also provide a classify() function.
- “regression” for a regression value, i.e., the sample of the arrival within a window. This will only provide a classify() function.
default_args (dict[str, Any] | None) – Default arguments to use in annotate and classify functions
in_samples (int | None) – Number of input samples in time
pred_sample (int | tuple[int, int] | None) – For a “point” prediction: sample number of the sample in a window for which the prediction is valid. For an “array” prediction: a tuple of first and last sample defining the prediction range. Note that the number of output samples and input samples within the given range are not required to agree.
labels (str | list[str] | None) – Labels for the different predictions in the output, e.g., Noise, P, S. If a function is passed, it will be called for every label generation and be provided with the stats of the trace that was annotated.
filter_args (tuple | None) – Arguments to be passed to obspy.filter() in annotate_stream_pre()
filter_kwargs (dict[str, Any] | None) – Keyword arguments to be passed to obspy.filter() in annotate_stream_pre()
grouping (str | GroupingHelper) – Level of grouping for annotating streams. Supports “instrument”, “channel” and “full”. Alternatively, a custom GroupingHelper can be passed.
allow_padding (bool) – If True, annotate will pad different windows if they have different sizes. This is useful, for example, for multi-station methods.
kwargs – Kwargs are passed to the superclass

annotate(stream, copy=True, **kwargs)[source]

Annotates an obspy stream using the model based on the configuration of the WaveformModel superclass. For example, for a picking model, annotate will give a characteristic function/probability function for picks over time. The annotate function contains multiple subfunctions, which can be overwritten individually by inheriting models to accommodate their requirements. These functions are:

annotate_stream_pre()
annotate_stream_validate()
annotate_batch_pre()
annotate_batch_post()

Please see the respective documentation for details on their functionality, inputs and outputs.

Hint

If your machine is equipped with an accelerator, e.g., a GPU, this function will usually run faster when making use of the accelerator. Just call model.to("cuda")/model.to("mps")/model.to("xpu") or use the function to_preferred_device() to automatically select the best device. In addition, you might want to increase the batch size by passing the batch_size argument to the function. Possible values might be 2048 or 4096 (or larger if your GPU permits).

Hint

All calls to annotate and classify will automatically resample the input data to the sampling rate of the model, if defined. When data is downsampled, this might involve an anti-alias filter. To control whether this filter is zero-phase, use the argument zerophase_resample. For more fine-grained control of the resampling process, manually resample the data before passing it to annotate.

Warning

Even though the asyncio implementation itself is not parallel, this does not guarantee that only a single CPU core will be used, as the underlying libraries (pytorch, numpy, scipy, …) might be parallelised. If you need to limit the parallelism of these libraries, check their documentation. Bear in mind that a lower number of threads might occasionally improve runtime performance, as it limits overheads.

Parameters:

stream (obspy.core.Stream) – Obspy stream to annotate
copy (bool) – If true, copies the input stream. Otherwise, the input stream is modified in place.
kwargs – Additional arguments for annotation

Returns:

Obspy stream of annotations

async annotate_async(stream, copy=True, **kwargs)[source]

annotate implementation based on asyncio

Parameters:

stream – Obspy stream to annotate
copy – If true, copies the input stream. Otherwise, the input stream is modified in place.
kwargs – Additional arguments for annotation

Return type:

Stream

Returns:

Obspy stream of annotations

annotate_batch_post(batch, piggyback, argdict)[source]

Runs postprocessing on the predictions of a window for the annotate function, e.g., reformatting them. By default, returns the original prediction. Inheriting classes should overwrite this function if necessary.

Parameters:

batch (Tensor) – Predictions for the batch. The data type depends on the model.
argdict (dict[str, Any]) – Dictionary of arguments
piggyback (Any) – Piggyback information, by default None.

Return type:

Tensor

Returns:

Postprocessed predictions

annotate_batch_pre(batch, argdict)[source]

Runs preprocessing on batch level for the annotate function, e.g., normalization. By default, returns the input batch unmodified. Optionally, this can return a tuple of the preprocessed batch and piggyback information that is passed to annotate_batch_post(). This can for example be used to transfer normalization information. Inheriting classes should overwrite this function if necessary.

Parameters:

batch (Tensor) – Input batch
argdict (dict[str, Any]) – Dictionary of arguments

Return type:

Tensor

Returns:

Preprocessed batch and optionally piggyback information that is passed to annotate_batch_post()
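The piggyback mechanism can be sketched on plain lists instead of tensors. In this hypothetical example, the preprocessing normalizes each window by its peak amplitude and hands the scales on as piggyback information, so that the postprocessing can undo the scaling. The function names and the identity stand-in for the model are assumptions made for illustration.

```python
def annotate_batch_pre_sketch(batch, argdict):
    # Peak-normalize every window; all-zero windows keep a scale of 1.0
    scales = [max(abs(v) for v in window) or 1.0 for window in batch]
    normalized = [[v / s for v in window] for window, s in zip(batch, scales)]
    # Return the batch together with the piggyback information
    return normalized, scales


def annotate_batch_post_sketch(batch, piggyback, argdict):
    # Use the scales passed along as piggyback to restore the amplitudes
    return [[v * s for v in window] for window, s in zip(batch, piggyback)]


batch = [[2.0, -4.0], [0.0, 0.0]]
normalized, piggyback = annotate_batch_pre_sketch(batch, argdict={})
predictions = normalized  # identity stand-in for the model forward pass
print(normalized, piggyback)
print(annotate_batch_post_sketch(predictions, piggyback, argdict={}))
```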

annotate_stream_pre(stream, argdict)[source]

Runs preprocessing on stream level for the annotate function, e.g., filtering or resampling. By default, this function will resample all traces if a sampling rate for the model is provided. Furthermore, if a filter is specified in the class, the filter will be executed. As annotate creates a copy of the input stream, this function can safely modify the stream in place. Inheriting classes should overwrite this function if necessary. To keep the default functionality, a call to the overwritten method can be included.

Parameters:

stream (obspy.Stream) – Input stream
argdict – Dictionary of arguments

Returns:

Preprocessed stream

annotate_stream_validate(stream, argdict)[source]

Validates stream for the annotate function. This function should raise an exception if the stream is invalid. By default, this function will check if the sampling rate fits the provided one, unless it is None, and check for mismatching traces, i.e., traces covering the same time range on the same instrument with different values. Inheriting classes should overwrite this function if necessary. To keep the default functionality, a call to the overwritten method can be included.

Parameters:

stream (obspy.Stream) – Input stream
argdict – Dictionary of arguments

Returns:

None

classify(stream, parallelism=None, **kwargs)[source]

Classifies the stream. The classification can contain any information, but should be consistent with existing models.

Parameters:

stream (obspy.core.Stream) – Obspy stream to classify
kwargs

Return type:

ClassifyOutput

Returns:

A classification for the full stream, e.g., a list of picks or the source magnitude.

classify_aggregate(annotations, argdict)[source]

An aggregation function that converts the annotation streams returned by annotate() into a classification. A classification consists of a ClassifyOutput, essentially a namespace that can hold an arbitrary set of keys. However, when implementing a model which already exists in similar form, we recommend using the same output format. For example, all pick outputs should have the same format.

Parameters:

annotations – Annotations returned from annotate()
argdict – Dictionary of arguments

Return type:

ClassifyOutput

Returns:

Classification object

async classify_async(stream, **kwargs)[source]

Async interface to the classify() function. See details there.

Return type:: ClassifyOutput

classify_stream_pre(stream, argdict)[source]

Runs preprocessing on stream level for the classify function, e.g., subselecting traces. By default, this function will simply return the input stream. In contrast to annotate_stream_pre(), this function operates on the original input stream. The stream should therefore not be modified in place. Note that annotate_stream_pre() will be executed on the output of this function within the classify() function.

Parameters:

stream (obspy.Stream) – Input stream
argdict – Dictionary of arguments

Returns:

Preprocessed stream

property component_order

static detections_from_annotations(annotations, threshold)[source]

Converts the annotation streams for a single phase to discrete detections using a classical trigger on/off. The lower threshold is set to half the higher threshold. Detections are represented by Detection objects. The detection start_time and end_time are set to the trigger on and off times.

Parameters:

annotations – Stream of annotations
threshold (float) – Higher threshold for trigger

Return type:

DetectionList

Returns:

List of detections
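The trigger logic can be sketched on a plain list of values. This is a simplified illustration, not the SeisBench implementation: it returns sample indices instead of times, and both the strictness of the comparisons and the handling of a trigger that is still on at the end of the data are assumptions.

```python
def trigger_on_off(values, threshold):
    # The lower (off) threshold is half the higher (on) threshold
    lower = threshold / 2
    detections = []
    start = None
    for i, value in enumerate(values):
        if start is None and value > threshold:
            start = i  # trigger switches on
        elif start is not None and value < lower:
            detections.append((start, i))  # trigger switches off
            start = None
    if start is not None:
        # Assumption: a trigger still on at the end is closed at the last sample
        detections.append((start, len(values) - 1))
    return detections


curve = [0.0, 0.2, 0.6, 0.4, 0.3, 0.1, 0.0, 0.7, 0.8]
print(trigger_on_off(curve, threshold=0.5))
```

Because the off threshold is lower than the on threshold, the first detection stays active while the curve dips to 0.4 and 0.3 and only ends once it drops below 0.25.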

get_model_args()[source]

Obtain all model parameters for saving.

Returns:: Dictionary of all parameters for a model to store during saving.
Return type:: Dict

static picks_from_annotations(annotations, threshold, phase)[source]

Converts the annotation streams for a single phase to discrete picks using a classical trigger on/off. The lower threshold is set to half the higher threshold. Picks are represented by Pick objects. The pick start_time and end_time are set to the trigger on and off times.

Parameters:

annotations – Stream of annotations
threshold – Higher threshold for trigger
phase – Phase to label, only relevant for output phase labelling

Return type:

PickList

Returns:

List of picks

static resample(stream, sampling_rate, zerophase=True)[source]

Perform inplace resampling of stream to a given sampling rate.

Parameters:

stream (Stream) – Input stream
sampling_rate (float) – Sampling rate (sps) to resample to
zerophase (bool) – If true, use a zero-phase filter for antialiasing, otherwise a causal filter.

static sanitize_mismatching_overlapping_records(stream)[source]

Detects if for any id the stream contains overlapping traces that do not match. If yes, all mismatching parts are removed and a warning is issued.

Parameters:: stream (obspy.core.Stream) – Input stream
Returns:: The stream object without mismatching traces
Return type:: obspy.core.Stream

stream_to_array(traces, argdict)[source]

Converts streams into a start time and a numpy array. Assumes:

All traces within a group can be put into an array, i.e., the strict parameter is already enforced. Every remaining gap is intended to be filled with zeros. The selection/cutting of intervals has already been done by GroupingHelper.group_stream().
No overlapping traces of the same component exist
All traces have the same sampling rate

Parameters:

stream (obspy.core.Stream) – Input stream
argdict (dict) – Dictionary of arguments

Return type:

GroupedTraceData

Returns:

output_times: Start times for each array
output_data: Arrays with waveforms
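The zero filling under these assumptions can be sketched with plain lists. The function traces_to_array and its input format, tuples of component, start sample and data, are hypothetical simplifications; the actual method works on obspy traces and returns a GroupedTraceData object.

```python
def traces_to_array(traces, comp_dict):
    # traces: list of (component, start_sample, samples) with a common sampling rate
    start = min(s for _, s, _ in traces)
    end = max(s + len(d) for _, s, d in traces)
    # One row per component, initialised with zeros so gaps stay zero
    data = [[0.0] * (end - start) for _ in comp_dict]
    for comp, s, d in traces:
        row = data[comp_dict[comp]]
        row[s - start : s - start + len(d)] = d
    return start, data


traces = [
    ("Z", 10, [1.0, 2.0, 3.0]),
    ("N", 11, [4.0, 5.0]),
    ("Z", 15, [6.0]),
]
start, data = traces_to_array(traces, {"Z": 0, "N": 1, "E": 2})
print(start)
for row in data:
    print(row)
```

The gap in the Z component and the entirely missing E component both end up as zeros in the output.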

class WaveformPipeline(components, citation=None)[source]

Bases: ABC

A waveform pipeline is a collection of models that together expose an annotate() and a classify() function. Examples of waveform pipelines would be multi-step picking models, conducting first a detection with one model and then a pick identification with a second model. This could also easily be extended by adding further models, e.g., estimating magnitude for each detection.

In contrast to WaveformModel, a waveform pipeline is not a pytorch module and has no forward function. This also means that all components of a pipeline will usually be trained separately. As a rule of thumb, if the pipeline can be trained end to end, it should most likely rather be a WaveformModel. For a waveform pipeline, the annotate() and classify() functions are not automatically generated, but need to be implemented manually.

Waveform pipelines offer functionality for downloading pipeline configurations from the SeisBench repository. Similarly to SeisBenchModel, waveform pipelines expose a from_pretrained() function that will download the configuration for a pipeline and its components.

To implement a waveform pipeline, this class needs to be subclassed. This class will throw an exception when trying to instantiate.

Warning

In contrast to SeisBenchModel, this class does not yet feature versioning for weights. By default, all underlying models will use the latest locally available version. This functionality will eventually be added. Please raise an issue on GitHub if you require this functionality.

Parameters:

components (dict [str, SeisBenchModel]) – Dictionary of components contained in the model. This should contain all models used in the pipeline.
citation (str, optional) – Citation reference, defaults to None.

annotate(stream, **kwargs)[source]

property citation

classify(stream, **kwargs)[source]

abstractmethod classmethod component_classes()[source]

Returns a mapping of component names to their classes. This function needs to be defined in each pipeline, as it is required to load configurations.

Returns:: Dictionary mapping component names to their classes.
Return type:: Dict[str, SeisBenchModel classes]

property docstring

classmethod from_pretrained(name, force=False, wait_for_file=False)[source]

Load pipeline from configuration. Automatically loads all dependent pretrained models weights.

A pipeline configuration is a json file. On the top level, it has three entries:

“components”: A dictionary listing all contained models and the pretrained weight to use for each model. The instances of these classes will be created using the from_pretrained() method. The components need to match the components from the dictionary returned by component_classes().
“docstring”: A string documenting the pipeline. Usually also contains information on the author.
“model_args”: Argument dictionary passed to the init function of the pipeline. (optional)
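A pipeline configuration with these three entries might look as follows. All names and values are hypothetical, and the layout of "components" as a mapping from component name to weight name is an assumption based on the description above. The helper components_match is not part of SeisBench; it only illustrates the requirement that the configured components match those of component_classes().

```python
import json

# Hypothetical configuration; component and weight names are placeholders
pipeline_config = json.loads(
    """
    {
        "components": {"detector": "example_detector", "picker": "example_picker"},
        "docstring": "Example two-step pipeline, configured by A. Author.",
        "model_args": {}
    }
    """
)


def components_match(config, component_names):
    # The configured components need to match the pipeline's components
    return set(config["components"]) == set(component_names)


print(components_match(pipeline_config, ["detector", "picker"]))
print(components_match(pipeline_config, ["detector"]))
```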

Parameters:

name (str) – Configuration name
force (bool, optional) – Force execution of download callback, defaults to False
wait_for_file (bool, optional) – Whether to wait on partially downloaded files, defaults to False

Returns:

Pipeline instance

Return type:

WaveformPipeline

classmethod list_pretrained(details=False)[source]

Returns list of available configurations and optionally their docstrings.

Parameters:: details (bool) – If true, instead of returning only a list, also return their docstrings. Note that this requires downloading the json files for each model in the background and is therefore slower. Defaults to false.
Returns:: List of available weights or dict of weights and their docstrings
Return type:: list or dict

property name

hard_sigmoid(x)[source]

DASModels

class DASAnnotateCallback[source]

Bases: ABC

This abstract class describes the interface for callbacks used in the DAS annotate method. Callbacks will get streaming outputs from the annotate method, containing the different chunks after processing with the deep learning model. Different callbacks are available, e.g., for picking or for writing the full output. To implement a new callback, inherit from this class and implement the methods. Callbacks are stateful, allowing them, for example, to handle overlaps between adjacent chunks.

finalize()[source]

Finalize step for the callback. This is called after the last chunk is processed and can be used to generate the final results based on the intermediate results processed in each chunk.

The finalize step is optional.

Return type:: None

abstractmethod get_results_dict()[source]

This method returns a dictionary with the results of the callback. It is used to generate the ClassifyOutput when using the callback through classify.

Return type:: dict[str, Any]

abstractmethod handle_patch(annotations, in_coords, out_coords)[source]

This method is called for each patch of the output after processing it with the deep learning model. Results inferred from this step should be stored in class variables.

Return type:: None

setup(data, patching_structure, annotate_keys)[source]

Setup step for the callback. This is called before the first chunk is processed and can be used to initialize state variables, e.g., the shape of the output or arrays for intermediate results.

The setup step is optional, however, it is usually good practice to reset all state variables in the setup step.

Return type:: None
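The callback life cycle can be sketched as follows. To keep the example runnable without SeisBench, it defines a stand-in base class that mirrors the four methods documented above; a real callback would inherit from DASAnnotateCallback instead. The callback itself is hypothetical, as is the assumption that annotations arrive as a nested list of values.

```python
from abc import ABC, abstractmethod
from typing import Any


class CallbackInterface(ABC):
    """Stand-in mirroring the interface described above (not the SeisBench class)."""

    def setup(self, data, patching_structure, annotate_keys) -> None:
        pass

    @abstractmethod
    def handle_patch(self, annotations, in_coords, out_coords) -> None: ...

    @abstractmethod
    def get_results_dict(self) -> dict[str, Any]: ...

    def finalize(self) -> None:
        pass


class MaxValueCallback(CallbackInterface):
    """Hypothetical callback tracking the largest annotation value."""

    def setup(self, data, patching_structure, annotate_keys) -> None:
        # Reset all state so the callback can be reused across runs
        self.patch_maxima = []
        self.overall_max = None

    def handle_patch(self, annotations, in_coords, out_coords) -> None:
        # Store an intermediate result per patch
        self.patch_maxima.append(max(max(row) for row in annotations))

    def finalize(self) -> None:
        # Derive the final result from the intermediate results
        self.overall_max = max(self.patch_maxima) if self.patch_maxima else None

    def get_results_dict(self) -> dict[str, Any]:
        return {"max_value": self.overall_max, "n_patches": len(self.patch_maxima)}


callback = MaxValueCallback()
callback.setup(data=None, patching_structure=None, annotate_keys=None)
callback.handle_patch([[0.1, 0.4], [0.2, 0.3]], in_coords=None, out_coords=None)
callback.handle_patch([[0.7, 0.0], [0.5, 0.6]], in_coords=None, out_coords=None)
callback.finalize()
print(callback.get_results_dict())
```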

class DASModel(dt_range=None, dx_range=None, patching_structure=None, buffer_queue_size=8, annotate_forward_kwargs=None, annotate_keys=None, default_args=None, fk_filter_args=None, filter_samples=None, **kwargs)[source]

Bases: SeisBenchModel, ABC

This is the base class for all models processing DAS data.

Hint

If you are an end-user looking to apply pretrained models, you most likely won’t interact with this class directly. Instead, you will use classes inheriting from this class and their annotate() and classify() functions. If you aim to develop your own model, you should inherit from this class and have a look at the details below.

Hint

When calling annotate or classify, the model can perform automatic resampling along both axes. This ensures that the model can be flexibly applied to data of different sampling rates and channel spacings. However, as models are typically stable with respect to small changes in sampling rate and channel spacing, this class allows for a range of sampling rates and channel spacings to be specified. When called on data that does not fall into this range, the model will search for the smallest set of integers for upsampling and downsampling. The resampling is done using scipy.signal.resample_poly. To get the exact resampling ratios for a particular input array, check the function get_resample_ratios().
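The search for integer resampling factors can be sketched as a brute-force loop. This is an illustration under assumptions, not the SeisBench implementation: "smallest" is interpreted as the smallest sum of the two factors, and the function name and the max_int limit are hypothetical. Resampling by up/down changes a step dt to dt * down / up, which matches the up and down arguments of scipy.signal.resample_poly.

```python
from math import gcd


def find_resample_ratio(step, admissible, max_int=50):
    low, high = admissible
    # Try factor pairs in order of increasing sum, so small integers win
    for total in range(2, 2 * max_int + 1):
        for up in range(1, total):
            down = total - up
            if gcd(up, down) != 1:
                continue  # skip pairs that reduce to a smaller ratio
            if low <= step * down / up <= high:
                return up, down
    raise ValueError("no admissible ratio found within max_int")


# Already within the admissible range: no resampling needed
print(find_resample_ratio(0.01, (0.009, 0.011)))
# A 4 ms time step has to be brought to roughly 10 ms
print(find_resample_ratio(0.004, (0.009, 0.011)))
```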

Parameters:

patching_structure (PatchingStructure | None) – The structure of the patches to cut for annotation. If None, the function get_patching_structure() needs to be implemented, allowing to dynamically adjust the patching structure to the input data.
dt_range (tuple[float, float] | None) – Admissible range for the time step of data to be processed. This value is only taken into account for the execution of the annotate/classify functions. See the above hint on the resampling behavior. Values are in seconds.
dx_range (tuple[float, float] | None) – Same as dt_range but along the channel axis. Values are in meters.
buffer_queue_size (int) – Maximum number of chunks to keep in the intermediate buffers.
annotate_forward_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the forward method of the model when running annotate/classify.
annotate_keys (list[str] | None) – List of annotation keys to read from the output.
default_args (dict[str, Any] | None) – Default arguments for the optional keyword arguments of annotate/classify.
fk_filter_args (dict[str, Any] | None) – Arguments for the F-k filter. See FKFilter for details.
filter_samples (tuple[str, dict[str, Any]] | None) – Filter to apply along the sample axis. See VirtualTransformedDataArray for details.

annotate(*args, **kwargs)[source]

Return type:: None

async annotate_async(data, callback, **kwargs)[source]

Return type:: None

static calc_output_shape_and_coordinates(da, patching_structure)[source]

Calculates the shape and coordinate axis of the output array after processing with the given patching structure. In case the output shape would be fractional, an extra sample is added to the output array along the corresponding axis.

Return type:: tuple[tuple[int, int], dict[str, InterpCoordinate]]

classify(*args, **kwargs)[source]

Return type:: ClassifyOutput

async classify_async(data, **kwargs)[source]

The classify method is used to process the data and apply the default callback. The kwargs are split into two groups: those that are passed to the callback and those that are passed to the annotate method.

Return type:: ClassifyOutput

property classify_callback: Type[DASAnnotateCallback]: Return the default callback for this model. For example, for picking models, this would be a DASPickingCallback. The class will then be instantiated and used to process the output of the annotate method. Constructor arguments will be extracted from the kwargs passed to classify.

get_model_args()[source]

Obtain all model parameters for saving.

Returns:: Dictionary of all parameters for a model to store during saving.
Return type:: Dict

get_patching_structure(data_shape, argdict)[source]

To enable dynamic window sizes that depend on the shape of the input record, this function can be overridden. By default, it returns the predefined patching structure. In addition, this function allows overriding the overlap dynamically.

The data_shape is provided for adaptive models. Note that the data shape can contain non-integer values due to in-memory resampling of the data. The actual output shape can only be inferred once the patching structure has been defined, as the number of truncated samples depends on the patching structure. Therefore, models should tolerate data shapes that are slightly smaller than the theoretical one.

Return type:: PatchingStructure
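
A minimal sketch of such an override, assuming a model that shrinks its input window for short records and reads overlap overrides from argdict. PatchingSketch is a hypothetical stand-in that mirrors only a few fields of PatchingStructure so the example runs without the library; the default sizes and the argdict key names are assumptions as well.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PatchingSketch:
    """Hypothetical stand-in mirroring a subset of the PatchingStructure fields."""

    in_samples: int
    in_channels: int
    overlap_samples: Optional[int] = None
    overlap_channels: Optional[int] = None


# Assumed predefined structure of the model.
DEFAULT = PatchingSketch(in_samples=1024, in_channels=512)


def get_patching_structure(data_shape, argdict):
    """Shrink the window for short records and apply overlap overrides."""
    n_samples, n_channels = data_shape
    structure = replace(
        DEFAULT,
        # data_shape may contain floats, so truncate before comparing.
        in_samples=min(DEFAULT.in_samples, int(n_samples)),
        in_channels=min(DEFAULT.in_channels, int(n_channels)),
    )
    overrides = {
        key: argdict[key]
        for key in ("overlap_samples", "overlap_channels")
        if argdict.get(key) is not None
    }
    return replace(structure, **overrides)


structure = get_patching_structure((800.5, 2000.0), {"overlap_samples": 100})
```

Truncating the fractional shape with int keeps the window within the available data, in line with the note that the actual shape may be slightly smaller than the theoretical one.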

get_resample_ratios(data, channel_coord_name)[source]

Estimates integer ratios for resampling along the sample and channel axes to fall into the predefined ratios.

Return type:: tuple[tuple[int, int], tuple[int, int]]
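
As an illustration of how an (up, down) integer pair can be derived from two sampling rates, the sketch below uses fractions.Fraction from the standard library. It is not the library's estimation logic; integer_resample_ratio and max_denominator are hypothetical names, and the check against the model's predefined ratios is omitted.

```python
from fractions import Fraction


def integer_resample_ratio(
    current_rate: float, target_rate: float, max_denominator: int = 100
) -> tuple[int, int]:
    """Approximate target_rate / current_rate by an (up, down) integer pair."""
    ratio = Fraction(target_rate / current_rate).limit_denominator(max_denominator)
    return ratio.numerator, ratio.denominator


# Resampling 250 Hz data to 100 Hz: upsample by 2, downsample by 5.
up, down = integer_resample_ratio(current_rate=250.0, target_rate=100.0)
```

The resulting pair has the same (up, down) form as the resample_samples and resample_channels arguments of VirtualTransformedDataArray.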

class DASPickingCallback(thresholds=0.2, min_time_separation=1.0, blinding=(100, 100))[source]

Bases: DASAnnotateCallback

Pick arrivals from probability curves using scipy.signal.find_peaks. The picking is performed independently on each channel, i.e., no continuity is assumed between channels.

Parameters:

thresholds (float | dict[str, float]) – Confidence thresholds for picking. Can be a single value for all phases, or a dictionary with thresholds per phase.
min_time_separation (float) – Minimum time separation between two picks of the same phase in seconds.
blinding (tuple[int, int]) – Number of samples to ignore at the start and end of each window for picking. Useful to avoid boundary artifacts in picking.
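
The interaction of the three parameters can be sketched for a single channel as follows. Note that this is a simplified pure-Python stand-in and does not call scipy.signal.find_peaks. The function pick_peaks, the sampling rate, and the conversion of min_time_separation to samples are assumptions made for the illustration.

```python
def pick_peaks(curve, threshold, min_separation, blinding=(0, 0)):
    """Return sorted indices of local maxima above threshold.

    curve: probability values for one channel
    min_separation: minimum distance between picks in samples
    blinding: number of samples ignored at the start and end
    """
    start = max(blinding[0], 1)
    stop = min(len(curve) - blinding[1], len(curve) - 1)
    candidates = [
        i
        for i in range(start, stop)
        if curve[i] >= threshold and curve[i - 1] < curve[i] >= curve[i + 1]
    ]
    # Keep the strongest peaks first and drop weaker ones that are too close.
    kept = []
    for i in sorted(candidates, key=lambda j: curve[j], reverse=True):
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)
    return sorted(kept)


curve = [0.0, 0.1, 0.9, 0.1, 0.0, 0.3, 0.6, 0.3, 0.0, 0.8, 0.0]
sampling_rate = 5.0  # Hz, assumed for this example
min_separation = int(1.0 * sampling_rate)  # min_time_separation of 1.0 s

picks = pick_peaks(curve, threshold=0.2, min_separation=min_separation)
picks_blinded = pick_peaks(
    curve, threshold=0.2, min_separation=min_separation, blinding=(0, 3)
)
```

In this example the peak at index 6 is discarded because it lies within one second of the stronger peak at index 2, and blinding the last three samples additionally removes the peak at index 9.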

finalize()[source]

Finalize step for the callback. This is called after the last chunk is processed and can be used to generate the final results based on the intermediate results processed in each chunk.

The finalize step is optional.

Return type:: None

get_results_dataframe()[source]

Return type:: DataFrame

get_results_dict()[source]

This method returns a dictionary with the results of the callback. It is used to generate the ClassifyOutput when using the callback through classify.

Return type:: dict[str, Any]

handle_patch(annotations, in_coords, out_coords)[source]

This method is called for each patch of the output after processing it with the deep learning model. Results inferred from this step should be stored in class variables.

Return type:: None

setup(data, patching_structure, annotate_keys)[source]

Setup step for the callback. This is called before the first chunk is processed and can be used to initialize state variables, e.g., the shape of the output or arrays for intermediate results.

The setup step is optional. However, it is usually good practice to reset all state variables in the setup step.

Return type:: None
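
The order in which setup, handle_patch, finalize, and get_results_dict are called can be illustrated with a minimal sketch. MaxValueCallback is a hypothetical plain class that only mirrors the method names so the example runs without the library; a real callback would subclass DASAnnotateCallback. The structure of annotations (a dictionary mapping each key to a 2D nested list) is an assumption.

```python
class MaxValueCallback:
    """Hypothetical callback that tracks the maximum output value per key."""

    def setup(self, data, patching_structure, annotate_keys):
        # Reset all state so the callback can be reused across runs.
        self.max_values = {key: float("-inf") for key in annotate_keys}
        self.finalized = False

    def handle_patch(self, annotations, in_coords, out_coords):
        # Assumption: annotations maps each key to a 2D nested list.
        for key, patch in annotations.items():
            patch_max = max(max(row) for row in patch)
            self.max_values[key] = max(self.max_values[key], patch_max)

    def finalize(self):
        self.finalized = True

    def get_results_dict(self):
        return {"max_values": self.max_values}


callback = MaxValueCallback()
callback.setup(data=None, patching_structure=None, annotate_keys=["P", "S"])
patches = [
    {"P": [[0.1, 0.4], [0.2, 0.3]], "S": [[0.0, 0.1], [0.5, 0.2]]},
    {"P": [[0.7, 0.1], [0.0, 0.0]], "S": [[0.3, 0.3], [0.1, 0.0]]},
]
for patch in patches:
    callback.handle_patch(patch, in_coords=None, out_coords=None)
callback.finalize()
results = callback.get_results_dict()
```

Because all state is created in setup, calling setup again before a second run starts from a clean slate.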

class FKFilter(dt, dx, v_min=None, v_max=None, mode='pass', **kwargs)[source]

Bases: Module

An F-k filter implemented in PyTorch. The filter processes batched data, i.e., the input format should be (batch, samples, channels).

Parameters:

dx (float) – Channel spacing in space
dt (float) – Sample spacing in time
v_min (float | None) – Minimum velocity to be considered in the filter. If None, no filtering is applied.
v_max (float | None) – Maximum velocity to be considered in the filter. If None, no filtering is applied.
mode (str) – Either “pass” or “reject”. If “pass”, all velocities between v_min and v_max are retained. If “reject”, all velocities outside this band are retained.
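
The velocity band can be illustrated with a small sketch that decides, for a single frequency and wavenumber pair, whether the component is retained. It is not the PyTorch implementation. It assumes that the apparent velocity is |f / k| with f in Hz and k in cycles per metre, that a bound set to None leaves that side of the band open, and that “reject” keeps everything outside the band. The name fk_keep is hypothetical.

```python
import math


def fk_keep(f, k, v_min=None, v_max=None, mode="pass"):
    """Decide whether the (f, k) component is retained by the filter."""
    # A zero wavenumber corresponds to infinite apparent velocity.
    v = math.inf if k == 0 else abs(f / k)
    lower = 0.0 if v_min is None else v_min
    upper = math.inf if v_max is None else v_max
    in_band = lower <= v <= upper
    return in_band if mode == "pass" else not in_band


frequencies = [5.0, 10.0]  # Hz
wavenumbers = [0.001, 0.005, 0.05]  # cycles per metre

# Boolean mask in (frequency, wavenumber) order for a 1500 to 6000 m/s pass band.
mask = [
    [fk_keep(f, k, v_min=1500.0, v_max=6000.0) for k in wavenumbers]
    for f in frequencies
]
```

In an actual F-k filter such a mask would be multiplied with the two-dimensional Fourier transform of each (samples, channels) record before transforming back.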

forward(data)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type:: Tensor

class InMemoryCollectionCallback(stacking='weighted')[source]

Bases: DASAnnotateCallback

Collects the raw predictions of the model in memory and splices the DAS array back together from the individual patches. To avoid memory overflows, this callback should only be used for small datasets.
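
The splicing of overlapping patches can be sketched in one dimension as a weighted average. This is only an illustration: the exact weights used for stacking='weighted' are not documented here, so the triangular weights below are an assumption, and splice_patches is a hypothetical helper.

```python
def splice_patches(patches, starts, total_length, weights=None):
    """Splice overlapping 1D patches of equal width using a weighted average."""
    width = len(patches[0])
    if weights is None:
        weights = [1.0] * width  # plain average in the overlaps
    acc = [0.0] * total_length
    norm = [0.0] * total_length
    for patch, start in zip(patches, starts):
        for offset, (value, weight) in enumerate(zip(patch, weights)):
            acc[start + offset] += weight * value
            norm[start + offset] += weight
    # Positions not covered by any patch are marked as NaN.
    return [a / n if n > 0 else float("nan") for a, n in zip(acc, norm)]


# Triangular weights favour the patch centre over its edges (an assumption).
tri = [1.0, 2.0, 2.0, 1.0]
spliced = splice_patches(
    patches=[[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]],
    starts=[0, 2],
    total_length=6,
    weights=tri,
)
```

In the two overlapping positions the result moves gradually from the first patch to the second instead of jumping at the patch boundary.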

finalize()[source]

Finalize step for the callback. This is called after the last chunk is processed and can be used to generate the final results based on the intermediate results processed in each chunk.

The finalize step is optional.

Return type:: None

get_results_dict()[source]

This method returns a dictionary with the results of the callback. It is used to generate the ClassifyOutput when using the callback through classify.

Return type:: dict[str, Any]

handle_patch(annotations, in_coords, out_coords)[source]

This method is called for each patch of the output after processing it with the deep learning model. Results inferred from this step should be stored in class variables.

Return type:: None

setup(data, patching_structure, annotate_keys)[source]

Setup step for the callback. This is called before the first chunk is processed and can be used to initialize state variables, e.g., the shape of the output or arrays for intermediate results.

The setup step is optional. However, it is usually good practice to reset all state variables in the setup step.

Return type:: None

class MultiCallback(callbacks)[source]

Bases: DASAnnotateCallback

finalize()[source]

Finalize step for the callback. This is called after the last chunk is processed and can be used to generate the final results based on the intermediate results processed in each chunk.

The finalize step is optional.

Return type:: None

get_results_dict()[source]

This method returns a dictionary with the results of the callback. It is used to generate the ClassifyOutput when using the callback through classify.

Return type:: dict[str, Any]

handle_patch(annotations, in_coords, out_coords)[source]

This method is called for each patch of the output after processing it with the deep learning model. Results inferred from this step should be stored in class variables.

Return type:: None

setup(data, patching_structure, annotate_keys)[source]

Setup step for the callback. This is called before the first chunk is processed and can be used to initialize state variables, e.g., the shape of the output or arrays for intermediate results.

The setup step is optional. However, it is usually good practice to reset all state variables in the setup step.

Return type:: None

class PatchCoordinate(sample, channel, w_sample, w_channel)[source]

Bases: object

Coordinates of a patch in the input or output array. Denotes the upper-left corner of the patch and the dimensions along each axis. Note that coordinates can take non-integer values due to transformations. Callbacks should be able to handle this, e.g., by casting to int.

channel: float

property channel_int: int

sample: float

property sample_int: int

w_channel: int

w_sample: int

class PatchingStructure(in_samples, in_channels, out_samples, out_channels, range_samples, range_channels, overlap_samples=None, overlap_channels=None)[source]

Bases: object

in_channels: int

in_samples: int

out_channels: int

out_samples: int

overlap_channels: int | None = None

overlap_samples: int | None = None

range_channels: tuple[int, int]

range_samples: tuple[int, int]

property shift_channels

property shift_samples

class VirtualTransformedDataArray(data, patching_structure, resample_samples=(1, 1), resample_channels=(1, 1), filter_samples=None, force_dtype=None, channel_coord_name=None)[source]

Bases: object

This class wraps an xdas.DataArray and applies a transformation to it on the fly. It allows loading data from disk in chunks and applying the transformations only to the current chunk. This way, the total memory consumption is independent of the size of the underlying data.

For resampling, the class uses scipy.signal.resample_poly, which internally performs an upsampling, a zero-phase FIR filter and a downsampling step. To avoid boundary artifacts, extra samples are loaded for the filtering and truncated afterwards (at most 11 extra samples per side).

After resampling, the class can apply an IIR filter along the sample axis. Only causal filters are supported because acausal filters would require loading the whole data at once. Only IIR filters are supported due to their higher computational efficiency and because the number of filter states to cache between consecutive chunks is much lower. No filters along the channel axis are implemented. Instead, consider using an F-k filter.

Parameters:

data (DataArray) – xdas.DataArray to be transformed
patching_structure (PatchingStructure) – The structure of the patches to cut for annotation.
resample_samples (tuple[int, int]) – Tuple of integers (up, down) defining the resampling factors along the sample axis.
resample_channels (tuple[int, int]) – Tuple of integers (up, down) defining the resampling factors along the channel axis.
filter_samples (tuple[str, dict[str, Any]] | None) – Tuple of filter type and keyword arguments to pass to the filter. The filter type must be the name of a filter design function in scipy.signal, e.g., “butter” or “cheby1”. The filter must support the output keyword argument, as this implementation relies on second-order sections. The filter corners should be specified in Hz. The class will automatically pass the sampling rate to the fs argument of the filter creation. No filter is applied if this argument is None.
channel_coord_name (str | None) – Name of the coordinate in the data array that contains the channel coordinates.
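
The reason a causal IIR filter fits chunked processing can be shown with a minimal sketch. It uses a first-order low-pass written in plain Python rather than the second-order sections the class relies on, and iir_lowpass_chunk is a hypothetical helper. The point is only that carrying a small filter state between chunks reproduces the result of filtering the whole record at once.

```python
def iir_lowpass_chunk(chunk, alpha, state=0.0):
    """Filter one chunk with y[n] = alpha * y[n-1] + (1 - alpha) * x[n].

    Returns the filtered chunk and the state to pass to the next chunk.
    """
    out = []
    y = state
    for x in chunk:
        y = alpha * y + (1.0 - alpha) * x
        out.append(y)
    return out, y


signal = [0.0, 1.0, 0.0, -1.0, 0.5, 2.0, -0.5, 0.0]

# Reference: filter the whole signal at once.
full, _ = iir_lowpass_chunk(signal, alpha=0.8)

# Chunked: carry the single state value from one chunk to the next.
chunked = []
state = 0.0
for start in range(0, len(signal), 3):
    filtered, state = iir_lowpass_chunk(signal[start:start + 3], alpha=0.8, state=state)
    chunked.extend(filtered)
```

With second-order sections the cached state grows to a few values per section, which is still far smaller than the sample history an FIR filter of comparable sharpness would need.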

property coords: dict[str, xdas.Coordinate]

property dt: float

property dtype: dtype

property dx: float

static estimate_theoretical_output_shape(data, resample_samples, resample_channels)[source]

Return type:: tuple[float, float]

property filter_sos: ndarray | None

static guess_channel_coord_name(data)[source]

Return type:: str

property shape: tuple[int, int]

Shape of the transformed data. Always in (samples, channels) dimension order.

Truncates the right end of the output to ensure (output_samples - in_samples) is divisible by the upsampling. The same is done for the channel axis. This is necessary to avoid fractional window offsets.

class WriterBuffer(data, stacking, output_shape)[source]

Bases: object

A buffer to handle intersections between overlapping output data. The buffer expects data in patches of equal size. The patch order needs to be left to right (samples), top to bottom (channels), i.e., first all samples for a range of channels need to be processed before the next row can be processed.

The buffer keeps up to two rows in memory and writes slices along the sample axis once they are fully predicted.

add_data(data, out_coords)[source]

Return type:: tuple[ndarray, PatchCoordinate] | None

finalize()[source]

Return type:: None

property stacking: str

class WriterCallback(output_path, stacking='weighted')[source]

Bases: DASAnnotateCallback

Writes the raw predictions of the model to disk. The callback implements streaming processing to avoid excessive memory usage, while ensuring correct splicing at the overlaps between adjacent patches.

The output writing relies on the xdas DataArrayWriter. This means that the output will be written in multiple files using one output folder per annotation key. To load the files for key x use xdas.open_mfdataarray("output_path/x/*"). Note that the time coordinate will have minor discontinuities due to the chunked writing. These can be fixed by calling data.coords["time"] = data.coords["time"].simplify(tolerance=np.timedelta64(1, "us")).

finalize()[source]

Finalize step for the callback. This is called after the last chunk is processed and can be used to generate the final results based on the intermediate results processed in each chunk.

The finalize step is optional.

Return type:: None

get_results_dict()[source]

This method returns a dictionary with the results of the callback. It is used to generate the ClassifyOutput when using the callback through classify.

Return type:: dict[str, Any]

handle_patch(annotations, in_coords, out_coords)[source]

This method is called for each patch of the output after processing it with the deep learning model. Results inferred from this step should be stored in class variables.

Return type:: None

setup(data, patching_structure, annotate_keys)[source]

Setup step for the callback. This is called before the first chunk is processed and can be used to initialize state variables, e.g., the shape of the output or arrays for intermediate results.

The setup step is optional. However, it is usually good practice to reset all state variables in the setup step.

Return type:: None

torch_dtype_to_numpy(dtype)[source]

Return type:: dtype