seisbench.util

File handling

callback_if_uncached(files, callback, force=False, wait_for_file=False, test_interval=60)

Checks whether all files exist and executes the callback otherwise. Note that the callback is executed if at least one file is not cached. If one of the files does not exist but file.partial does, the behaviour depends on force and wait_for_file.

Warning

While this function makes concurrent callbacks unlikely, they can still happen if the function is called twice in quick succession, i.e., if the second call starts before the first one has created a .partial file.

Parameters:
  • files (list[union[Path, str]], Path, str) – A list of files or a single file to check.

  • callback (callable) – A callback taking one parameter, a list of target file names. It is called if a file is missing. The callback receives the same parameter as provided in files, with each file renamed to file.partial. The function moves the files to their final names afterwards, but ignores empty files.

  • force (bool) – If true and not all files exist, ignores and removes all partial files and executes the callback. Only use this parameter if no other instance of callback_if_uncached is currently requesting the same file.

  • wait_for_file (bool) – If true, not all files exist, but partial files exist, sleeps until all files exist or no partial files exist anymore.

  • test_interval (float) – Sleep interval for wait_for_file.
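The partial-file pattern described above can be illustrated with a simplified, self-contained sketch. The function name run_if_missing is hypothetical, and the sketch deliberately omits the force, wait_for_file and test_interval handling; it only shows the existence check, the renaming to file.partial, and the final move.

```python
from pathlib import Path
from typing import Callable, List, Union

PathLike = Union[str, Path]


def run_if_missing(files: Union[PathLike, List[PathLike]], callback: Callable) -> bool:
    """Hypothetical simplified analogue of callback_if_uncached.

    Returns True if the callback was executed, False if all files were cached.
    """
    single = isinstance(files, (str, Path))
    paths = [Path(files)] if single else [Path(f) for f in files]

    # Nothing to do if every target file already exists.
    if all(p.is_file() for p in paths):
        return False

    # The callback writes to "<name>.partial" instead of the final name.
    partials = [p.with_name(p.name + ".partial") for p in paths]
    callback(partials[0] if single else partials)

    # Move finished partial files into place; skip any the callback did not create.
    for partial, final in zip(partials, paths):
        if partial.is_file():
            partial.rename(final)
    return True
```

In this sketch the callback never writes to the final name, so an interrupted run leaves no half-written file that a later call would mistake for a cached one.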

download_ftp(host, file, target, user='anonymous', passwd='', blocksize=8192, progress_bar=True, desc='Downloading')

Downloads a file from an FTP source.

Parameters:
  • host (str) – Host URL

  • file (str) – File path on the FTP server

  • target (Path or str) – Path to save to

  • user (str) – Username for login

  • passwd (str) – Password for login

  • blocksize (int) – Size of download blocks in bytes

  • progress_bar (bool) – If true, shows a progress bar for the download

  • desc (str) – Description for the progress bar
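As a rough illustration of a block-wise FTP download, the sketch below uses Python's standard ftplib module. The function name ftp_download_sketch and its ftp_factory argument are hypothetical (the factory only exists so the logic can be exercised without a real server), and the progress bar is left out.

```python
import ftplib


def ftp_download_sketch(host, file, target, user="anonymous", passwd="",
                        blocksize=8192, ftp_factory=ftplib.FTP):
    """Hypothetical sketch: copy `file` from an FTP `host` to `target` block by block.

    Returns the number of bytes written.
    """
    written = 0
    with ftp_factory(host) as ftp:
        ftp.login(user, passwd)
        with open(target, "wb") as fout:

            def write_block(block: bytes) -> None:
                nonlocal written
                fout.write(block)
                written += len(block)  # a progress bar would be updated here

            # RETR streams the file; retrbinary calls write_block once per block.
            ftp.retrbinary(f"RETR {file}", write_block, blocksize=blocksize)
    return written
```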

download_http(url, target, progress_bar=True, desc='Downloading', precheck_timeout=3)

Downloads a file from an HTTP/HTTPS source. Raises a ValueError for non-200 status codes.

Parameters:
  • url (str) – Target url

  • target (Path or str) – Path to save to

  • progress_bar (bool) – If true, shows a progress bar for the download

  • desc (str) – Description for the progress bar

  • precheck_timeout (int) – Timeout passed to precheck_url()
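A minimal sketch of a streamed HTTP download with the status check, using only the standard library (urllib.request). The function name and the opener argument are hypothetical; the opener only allows substituting a fake response for testing. The HTTP client and progress bar handling of the real function are not reproduced here.

```python
import urllib.error
import urllib.request


def http_download_sketch(url, target, chunk_size=8192, opener=urllib.request.urlopen):
    """Hypothetical sketch: stream `url` into `target`, rejecting non-200 responses."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx codes; map it to ValueError.
        raise ValueError(f"Invalid HTTP response code {e.code} for {url}") from e

    with response:
        if response.status != 200:
            raise ValueError(f"Invalid HTTP response code {response.status} for {url}")
        with open(target, "wb") as fout:
            # Read fixed-size chunks so large files never sit fully in memory.
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
```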

ls_webdav(url, precheck_timeout=3)

Lists the files in a WebDAV directory

Parameters:
  • url (str) – URL of the directory to list

  • precheck_timeout (int) – Timeout passed to precheck_url()

Returns:

List of files
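WebDAV directory listings are typically obtained with an HTTP PROPFIND request (header Depth: 1), which returns a multistatus XML document in the DAV: namespace. The sketch below covers only the parsing step and is an assumption about how such a response can be turned into file names; the function name is hypothetical, and collections (the listed directory itself and any subdirectories) are skipped.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

DAV = "{DAV:}"


def parse_propfind_listing(xml_text: str) -> list:
    """Hypothetical sketch: extract file names from a PROPFIND multistatus body."""
    root = ET.fromstring(xml_text)
    names = []
    for response in root.iter(f"{DAV}response"):
        href = response.find(f"{DAV}href")
        if href is None or not href.text:
            continue
        # Directories carry <resourcetype><collection/></resourcetype>; skip them.
        if response.find(f".//{DAV}resourcetype/{DAV}collection") is not None:
            continue
        # hrefs are URL-encoded paths; keep only the last path component.
        names.append(unquote(href.text).rstrip("/").rsplit("/", 1)[-1])
    return names
```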

precheck_url(url, timeout)

Checks whether the URL is reachable and returns a 2xx or 3xx HTTP response code. If a timeout occurs or a response code >= 400 is returned, the precheck issues a warning.

Parameters:
  • url – URL to check

  • timeout – Timeout in seconds
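The warning logic described above might look roughly as follows with urllib.request. This is a sketch under assumptions: the function name and the injectable opener are hypothetical, and any connection error is treated like a timeout.

```python
import socket
import urllib.error
import urllib.request
import warnings


def precheck_url_sketch(url, timeout, opener=urllib.request.urlopen):
    """Hypothetical sketch: warn if `url` times out or answers with a code >= 400."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as e:
        status = e.code  # urlopen raises HTTPError for 4xx/5xx responses
    except (urllib.error.URLError, socket.timeout) as e:
        warnings.warn(f"Could not reach {url}: {e}")
        return
    if status >= 400:
        warnings.warn(f"{url} returned HTTP status code {status}")
```

Issuing a warning rather than raising lets the subsequent download attempt decide whether the failure is fatal.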

safe_extract_tar(tar, path='.', members=None, *, numeric_owner=False)

A safe extraction function for tar archives that avoids CVE-2007-4559 (path traversal through member names containing ".." sequences or absolute paths). See https://github.com/seisbench/seisbench/pull/134

Parameters are the same as for tarfile.TarFile.extractall.
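The usual mitigation for CVE-2007-4559 is to verify, before extracting anything, that every member's resolved path stays inside the destination directory. The sketch below shows that check; the function names are hypothetical, and it is not necessarily identical to the code in the linked pull request.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to a location inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract_sketch(tar: tarfile.TarFile, path=".", members=None, *, numeric_owner=False):
    """Hypothetical sketch: refuse to extract if any member escapes `path`."""
    for member in tar.getmembers():
        # os.path.join discards `path` for absolute names, and abspath resolves "..".
        if not is_within_directory(path, os.path.join(path, member.name)):
            raise ValueError(f"Attempted path traversal in tar file: {member.name}")
    tar.extractall(path, members, numeric_owner=numeric_owner)
```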

Annotation classes

class ClassifyOutput(creator, **kwargs)

Bases: SimpleNamespace

A general container for the outputs of the classify function of SeisBench models. This allows each model to provide a different set of outputs while keeping a consistent output type. For example, EQTransformer can output picks and detections, while PhaseNet only provides picks.

Parameters:
  • creator (str) – The model creating the output.

  • kwargs – All outputs of the model
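Because ClassifyOutput is based on types.SimpleNamespace, model outputs become plain attributes. The class below is a hypothetical stand-in that mirrors the documented constructor; the attribute names picks and detections are only illustrative.

```python
from types import SimpleNamespace


class ClassifyOutputSketch(SimpleNamespace):
    """Hypothetical stand-in: a namespace that also records which model created it."""

    def __init__(self, creator: str, **kwargs):
        # Every keyword argument becomes an attribute, e.g. output.picks.
        super().__init__(creator=creator, **kwargs)
```

A caller can then check hasattr(output, "detections") instead of relying on a fixed tuple layout that differs between models.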

class Detection(trace_id, start_time, end_time, peak_value=None)

Bases: object

This class serves as a container for storing detection information. It defines an ordering based on start time, end time and trace id.

Parameters:
  • trace_id (str) – Id of the trace the detection was generated from

  • start_time (UTCDateTime) – Onset time of the detection

  • end_time (UTCDateTime) – End time of the detection

  • peak_value (float) – Peak value of the characteristic function for the detection
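The ordering by start time, end time and trace id can be expressed by comparing tuples. In this hypothetical sketch, plain floats stand in for obspy.UTCDateTime so that it runs without ObsPy, and peak_value does not take part in comparisons, which is an assumption.

```python
import functools
from dataclasses import dataclass
from typing import Optional


@functools.total_ordering
@dataclass(eq=False)
class DetectionSketch:
    trace_id: str
    start_time: float  # stand-in for obspy.UTCDateTime
    end_time: float
    peak_value: Optional[float] = None

    def _key(self):
        # Sort by start time first, then end time, then trace id.
        return (self.start_time, self.end_time, self.trace_id)

    def __eq__(self, other):
        if not isinstance(other, DetectionSketch):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, DetectionSketch):
            return NotImplemented
        return self._key() < other._key()
```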

class DetectionList(iterable=(), /)

Bases: PickList

A list of Detection objects with convenience functions for selecting and printing

select(trace_id=None, min_confidence=None)

Selects specific detections. Only the arguments provided are used for filtering.

Parameters:
  • trace_id (Optional[str]) – A regular expression to match against the trace id. The string is passed directly to Python's re module, i.e., characters like dots need to be escaped and wildcards are represented using .*.

  • min_confidence (Optional[float]) – The minimum confidence value. Detections without a confidence value are discarded.

class Pick(trace_id, start_time, end_time=None, peak_time=None, peak_value=None, phase=None)

Bases: object

This class serves as a container for storing pick information. It defines an ordering based on start time, end time and trace id.

Parameters:
  • trace_id (str) – Id of the trace the pick was generated from

  • start_time (UTCDateTime) – Onset time of the pick

  • end_time (UTCDateTime) – End time of the pick

  • peak_time (UTCDateTime) – Peak time of the characteristic function for the pick

  • peak_value (float) – Peak value of the characteristic function for the pick

  • phase (str) – Phase hint

class PickList(iterable=(), /)

Bases: list

A list of Pick objects with convenience functions for selecting and printing

select(trace_id=None, min_confidence=None, phase=None)

Selects specific picks. Only the arguments provided are used for filtering.

Parameters:
  • trace_id (Optional[str]) – A regular expression to match against the trace id. The string is passed directly to Python's re module, i.e., characters like dots need to be escaped and wildcards are represented using .*.

  • min_confidence (Optional[float]) – The minimum confidence value. Picks without a confidence value are discarded.

  • phase (Optional[str]) – The phase of the pick. Only exact matches will be returned. Picks without phase information are discarded.
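Taken together, the three filters of select can be sketched as successive list comprehensions. PickSketch and select_picks are hypothetical names; the sketch assumes that the regular expression must match the whole trace id (re.fullmatch) and that min_confidence is compared inclusively against peak_value. It also shows why dots should be escaped, for instance with re.escape.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PickSketch:
    trace_id: str
    peak_value: Optional[float] = None
    phase: Optional[str] = None


def select_picks(picks, trace_id=None, min_confidence=None, phase=None) -> List[PickSketch]:
    """Hypothetical sketch of PickList.select; only given arguments filter."""
    selected = list(picks)
    if trace_id is not None:
        pattern = re.compile(trace_id)
        # Assumption: the whole trace id must match the pattern.
        selected = [p for p in selected if pattern.fullmatch(p.trace_id)]
    if min_confidence is not None:
        # Picks without a confidence value are dropped.
        selected = [p for p in selected
                    if p.peak_value is not None and p.peak_value >= min_confidence]
    if phase is not None:
        selected = [p for p in selected if p.phase == phase]  # exact match only
    return selected
```

With picks on "CX.PB01." and "CXXPB01.", the unescaped pattern "CX.PB01." matches both because "." matches any character, whereas re.escape("CX.PB01.") matches only the first.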

Region definitions

class CircleDomain(latitude, longitude, minradius, maxradius)

Bases: CircularDomain

Circular domain for selecting coordinates within a given radius range around a center point. The edges are not included in the domain.

Parameters:
  • latitude (float) – Latitude of the circle center

  • longitude (float) – Longitude of the circle center

  • minradius (float) – Minimum radius in degrees

  • maxradius (float) – Maximum radius in degrees

is_in_domain(latitude, longitude)

Checks whether a point is within the domain

Parameters:
  • latitude (float) – Latitude of query point

  • longitude (float) – Longitude of query point

Returns:

True if the point is within the domain, False otherwise

Return type:

bool
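Whether a point lies inside the ring between minradius and maxradius depends on its angular distance from the center. The sketch below computes that distance with the haversine formula on a sphere, which is an assumption and may differ slightly from the distance routine the real class uses. Both comparisons are strict because the edges are excluded; the function names are hypothetical.

```python
import math


def angular_distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_circle_domain(latitude, longitude, center_lat, center_lon, minradius, maxradius):
    """Hypothetical check: strictly between minradius and maxradius (edges excluded)."""
    d = angular_distance_deg(center_lat, center_lon, latitude, longitude)
    return minradius < d < maxradius
```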

class Germany

Bases: Domain

Example of how to create more complex region geometries. See https://docs.obspy.org/_modules/obspy/clients/fdsn/mass_downloader/domain.html

get_query_parameters()

Returns the domain-specific query parameters for the get_stations() method as a dictionary. Possible keys for rectangular queries are:

  • minlatitude

  • maxlatitude

  • minlongitude

  • maxlongitude

For circular queries:

  • latitude

  • longitude

  • minradius

  • maxradius

is_in_domain(latitude, longitude)

Checks whether a point is within the domain

Parameters:
  • latitude (float) – Latitude of query point

  • longitude (float) – Longitude of query point

Returns:

True if the point is within the domain, False otherwise

Return type:

bool
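Following the same idea as Germany, a custom domain needs a coarse query region for get_query_parameters and a precise is_in_domain test. The hypothetical PolygonDomainSketch below returns the polygon's bounding box as rectangular query keys and uses ray casting for the point-in-polygon check. It treats latitude and longitude as planar coordinates and ignores the antimeridian; in a real setup it would subclass ObsPy's Domain class from obspy.clients.fdsn.mass_downloader.

```python
class PolygonDomainSketch:
    """Hypothetical domain bounded by a polygon of (latitude, longitude) vertices."""

    def __init__(self, vertices):
        self.vertices = list(vertices)

    def get_query_parameters(self):
        # Coarse server-side filter: the polygon's bounding box.
        lats = [v[0] for v in self.vertices]
        lons = [v[1] for v in self.vertices]
        return {"minlatitude": min(lats), "maxlatitude": max(lats),
                "minlongitude": min(lons), "maxlongitude": max(lons)}

    def is_in_domain(self, latitude, longitude):
        # Ray casting: count edge crossings of a ray towards increasing longitude.
        inside = False
        n = len(self.vertices)
        for i in range(n):
            lat1, lon1 = self.vertices[i]
            lat2, lon2 = self.vertices[(i + 1) % n]
            if (lat1 > latitude) != (lat2 > latitude):
                lon_cross = lon1 + (latitude - lat1) * (lon2 - lon1) / (lat2 - lat1)
                if longitude < lon_cross:
                    inside = not inside
        return inside
```

The bounding box keeps the station query cheap, while is_in_domain discards stations that fall inside the box but outside the actual outline.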

class RectangleDomain(minlatitude, maxlatitude, minlongitude, maxlongitude)

Bases: RectangularDomain

A rectangular domain defined by latitude and longitude bounds. Edges are included in the domain.

Parameters:
  • minlatitude (float) – Minimum latitude

  • maxlatitude (float) – Maximum latitude

  • minlongitude (float) – Minimum longitude

  • maxlongitude (float) – Maximum longitude

is_in_domain(latitude, longitude)

Checks whether a point is within the domain

Parameters:
  • latitude (float) – Latitude of query point

  • longitude (float) – Longitude of query point

Returns:

True if the point is within the domain, False otherwise

Return type:

bool

Helper functions for pytorch

worker_seeding(wid)

When numpy random is used inside multiple data loader workers, all workers produce the same random numbers because they share the same seed. As a solution, the worker init function can be overridden. This function uses torch.initial_seed(), which is set separately for each worker. Keep this in mind, as SeisBench uses numpy random for augmentation.

To set the seed in each worker, use worker_init_fn=worker_seeding when creating the pytorch DataLoader.

Code from https://github.com/pytorch/pytorch/issues/5059

Parameters:

wid (int) – Worker id
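The core seeding step can be sketched with NumPy alone. Here the hypothetical seed_numpy_from takes the base seed as an argument; inside a DataLoader worker one would pass torch.initial_seed(). Routing the seed through numpy.random.SeedSequence is an assumption of this sketch; it has the benefit of accepting seeds larger than the 32-bit range that numpy.random.seed takes as a plain integer.

```python
import numpy as np


def seed_numpy_from(base_seed: int) -> None:
    """Hypothetical helper: seed NumPy's global RNG from a (possibly 64-bit) seed."""
    # SeedSequence mixes the seed into four well-distributed 32-bit words,
    # which np.random.seed accepts as a uint32 array.
    state = np.random.SeedSequence([base_seed]).generate_state(4)
    np.random.seed(state)
```

Since PyTorch assigns each worker a different initial seed, the NumPy streams diverge across workers while staying reproducible for a fixed base seed.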

Common functions for converting datasets to SeisBench format

fdsn_get_bulk_safe(client, bulk)

A wrapper around obspy’s get_waveforms_bulk that does error handling and tries to download as much data as possible.

Parameters:
  • client (Client) – An obspy FDSN client

  • bulk (list[tuple]) – A bulk request as for get_waveforms_bulk

Return type:

Stream
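One way to download as much data as possible is to retry the entries of a failing bulk request one by one, so that a single bad entry does not discard the rest. The sketch below shows that fallback strategy with a generic fetch callable in place of an ObsPy client; the function name, the fallback strategy and the broad exception handling are assumptions, not a description of the actual implementation.

```python
from typing import Callable, Sequence, Tuple

BulkEntry = Tuple  # (network, station, location, channel, starttime, endtime)


def get_bulk_safe_sketch(fetch: Callable[[Sequence[BulkEntry]], list],
                         bulk: Sequence[BulkEntry]) -> list:
    """Hypothetical fallback: try the full bulk request, then each entry on its own."""
    try:
        return list(fetch(bulk))
    except Exception:  # assumption: any failure triggers the fallback
        if len(bulk) <= 1:
            return []  # a single failing entry is skipped instead of aborting everything
        results = []
        for entry in bulk:
            results.extend(get_bulk_safe_sketch(fetch, [entry]))
        return results
```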

rotate_stream_to_zne(stream, inventory)

Tries to rotate the stream to ZNE in place. Several failures are possible; they are silently ignored.

Parameters:
  • stream (obspy.Stream) – Stream to rotate

  • inventory (obspy.Inventory) – Inventory object

stream_to_array(stream, component_order)

Converts a stream of single-station waveforms into a numpy array according to a given component order. If trace start and end times disagree between component traces, the remaining parts are filled with zeros. Also returns the completeness, i.e., the fraction of samples in the output that actually contain data. Assumes that all traces have the same sampling rate.

Parameters:
  • stream (obspy.Stream) – Stream to convert

  • component_order (str) – Component order

Returns:

starttime, data, completeness

Return type:

UTCDateTime, np.ndarray, float
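The alignment, zero filling and completeness computation can be sketched on plain data. In this hypothetical version, traces is a dict mapping a component letter to a (start time in seconds, 1-D array) pair instead of an ObsPy Stream, and all traces share one sampling rate, as the function above assumes.

```python
import numpy as np


def components_to_array(traces, component_order, sampling_rate):
    """Hypothetical sketch of stream_to_array on plain data.

    traces maps a component letter to (start_time_in_seconds, 1-D array).
    Returns (start_time, data with shape (len(component_order), n), completeness).
    """
    if not traces:
        raise ValueError("No traces given")
    t_start = min(t0 for t0, _ in traces.values())
    t_end = max(t0 + len(x) / sampling_rate for t0, x in traces.values())
    n_samples = int(round((t_end - t_start) * sampling_rate))

    data = np.zeros((len(component_order), n_samples))
    filled = 0
    for i, component in enumerate(component_order):
        if component not in traces:
            continue  # missing component: the row stays zero
        t0, x = traces[component]
        offset = int(round((t0 - t_start) * sampling_rate))
        data[i, offset:offset + len(x)] = x
        filled += len(x)

    # Completeness: fraction of output samples that carry real data.
    completeness = filled / data.size if data.size else 0.0
    return t_start, data, completeness
```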

trace_has_spikes(data, factor=25, quantile=0.975)

Checks for bit flip errors in the data using a simple quantile rule

Parameters:
  • data (np.ndarray) – Data array

  • factor (float) – Maximum allowed factor between peak and quantile

  • quantile (float) – Quantile to check. Must be between 0 and 1.
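A quantile rule of this kind can be sketched as follows: the data is flagged if its largest absolute amplitude exceeds factor times the given quantile of the absolute amplitudes. The function name is hypothetical, and whether the real function evaluates each channel separately is not stated above; the sketch uses the whole array.

```python
import numpy as np


def has_spikes_sketch(data, factor=25, quantile=0.975):
    """Hypothetical quantile rule: True if the peak exceeds `factor` times the quantile."""
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be between 0 and 1")
    amplitude = np.abs(np.asarray(data, dtype=float))
    reference = np.quantile(amplitude, quantile)
    # A bit flip typically yields a single sample far above the bulk of the data.
    return bool(amplitude.max() > factor * reference)
```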

waveform_id_to_network_station_location(waveform_id)

Takes a waveform_id as a string in the format Network.Station.Location.Channel and returns a string with the channel dropped. If the waveform_id does not conform to this format, the input string is returned.

Parameters:

waveform_id (str) – Waveform ID in format Network.Station.Location.Channel

Returns:

Waveform ID in format Network.Station.Location

Return type:

str
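The conversion amounts to splitting on dots and dropping the last field. The sketch below uses a hypothetical function name and assumes that exactly four dot-separated fields define a valid id; note that an empty location code is preserved as an empty field.

```python
def drop_channel(waveform_id: str) -> str:
    """Hypothetical sketch: 'NET.STA.LOC.CHA' -> 'NET.STA.LOC'; other input unchanged."""
    parts = waveform_id.split(".")
    if len(parts) != 4:
        return waveform_id  # not in Network.Station.Location.Channel format
    return ".".join(parts[:3])
```

For example, "GE.WLF.00.BHN" becomes "GE.WLF.00", and "CX.PB01..HHZ" becomes "CX.PB01." with the trailing dot marking the empty location.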

Auxiliary functions

in_notebook()

Checks whether the code is executed within a Jupyter notebook

Return type:

bool
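A common heuristic, sketched here as an assumption about how such a check can work, is to ask IPython for the active shell and test whether it is the ZMQ-based kernel shell used by Jupyter. The function name is hypothetical, and the heuristic may not recognize every notebook front end.

```python
def running_in_notebook() -> bool:
    """Heuristic check for a Jupyter kernel; IPython may not be installed."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # without IPython there is no notebook kernel
    shell = get_ipython()
    # Jupyter kernels run a ZMQInteractiveShell; the terminal uses TerminalInteractiveShell.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"
```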

Array tools

pad_packed_sequence(seq)

Packs a list of arrays into one array by adding a new first dimension and padding where necessary.

Parameters:

seq (list[ndarray]) – List of numpy arrays

Return type:

ndarray

Returns:

Combined arrays
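The packing can be sketched by allocating an array of the largest shape along every axis and copying each input into the leading corner of its slot. The function name is hypothetical, zero padding is an assumption, and all arrays are assumed to have the same number of dimensions.

```python
import numpy as np


def pad_and_stack(seq):
    """Hypothetical sketch: stack arrays of equal ndim into one array, zero-padding."""
    if not seq:
        raise ValueError("seq must not be empty")
    # Largest extent along every axis.
    max_shape = tuple(max(dims) for dims in zip(*(a.shape for a in seq)))
    out = np.zeros((len(seq),) + max_shape, dtype=np.result_type(*seq))
    for i, a in enumerate(seq):
        # Copy each array into the leading corner of its slot.
        out[(i,) + tuple(slice(0, s) for s in a.shape)] = a
    return out
```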

torch_detrend(x)

Detrends a tensor along the last axis using a linear fit.

Parameters:

x (Tensor) – Array to detrend

Return type:

Tensor

Returns:

Detrended array
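Linear detrending along the last axis amounts to subtracting the least-squares line from each trace. The sketch below writes this out with NumPy so that it runs without PyTorch; the same arithmetic maps directly onto tensor operations. The function name is hypothetical, and at least two samples are assumed.

```python
import numpy as np


def detrend_last_axis(x):
    """Hypothetical sketch: remove the least-squares linear fit along the last axis."""
    n = x.shape[-1]
    t = np.arange(n, dtype=float)
    t_c = t - t.mean()  # centered sample index
    x_mean = x.mean(axis=-1, keepdims=True)
    # Slope of the least-squares line: sum((x - mean) * t_c) / sum(t_c ** 2).
    slope = ((x - x_mean) * t_c).sum(axis=-1, keepdims=True) / (t_c ** 2).sum()
    return x - x_mean - slope * t_c
```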