seisbench.util
File handling
- callback_if_uncached(files, callback, force=False, wait_for_file=False, test_interval=60)[source]
Checks whether all files exist and executes the callback otherwise. Note that the callback is executed if at least one file is not cached. If one of the files does not exist but file.partial does, the behaviour depends on force and wait_for_file.
Warning
This function makes concurrent callbacks unlikely, but they can still happen if it is called twice in quick succession, i.e., if the second call starts before the first has created a .partial file.
- Parameters:
files (list[union[Path, str]], Path, str) – A list of files or single file to check.
callback (callable) – A callback taking one parameter, a list of target file names. It is called if a file is missing. The callback is given the same parameter as provided in files, but with each file renamed to file.partial. The function moves the files to their final names afterwards, ignoring empty files.
force (bool) – If true, and not all files exist, ignore and remove all partial files and execute callback. Only use this parameter if no other instance of callback_if_uncached is currently requesting the same file.
wait_for_file (bool) – If true, and not all files exist but partial files do, sleep until the files exist or no partial files remain.
test_interval (float) – Sleep interval for wait_for_file.
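The partial-file protocol described above can be illustrated with a minimal, self-contained sketch. Note that run_if_missing and write_dummy are hypothetical stand-ins, not SeisBench code: the sketch omits the force and wait_for_file handling and only shows how a callback writes to file.partial and how non-empty results are then moved into place.

```python
import tempfile
from pathlib import Path


def run_if_missing(files, callback):
    """Hypothetical, simplified stand-in for the caching pattern (not SeisBench code)."""
    files = [Path(f) for f in files]
    if all(f.is_file() for f in files):
        return False  # everything is cached, so the callback is skipped

    # The callback receives the target names with ".partial" appended.
    partial = [f.with_name(f.name + ".partial") for f in files]
    callback(partial)

    # Move finished files into place, ignoring empty ones.
    for src, dst in zip(partial, files):
        if src.is_file() and src.stat().st_size > 0:
            src.replace(dst)
    return True


def write_dummy(targets):
    # Example callback: writes a small payload to every partial file.
    for target in targets:
        Path(target).write_text("payload")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.txt"
    first_call = run_if_missing([target], write_dummy)   # file missing, callback runs
    second_call = run_if_missing([target], write_dummy)  # file cached, callback skipped
    cached_content = target.read_text()
```

With the real function, the callback would typically wrap a download or conversion step that writes to the names it is handed.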
- download_ftp(host, file, target, user='anonymous', passwd='', blocksize=8192, progress_bar=True, desc='Downloading')[source]
Downloads a file from an FTP source.
- Parameters:
host (str) – Host URL
file (str) – File path on the FTP server
target (Path or str) – Path to save to
user (str) – Username for login
passwd (str) – Password for login
blocksize (int) – Size of download blocks in bytes
progress_bar (bool) – If true, shows a progress bar for the download
desc (str) – Description for the progress bar
- download_http(url, target, progress_bar=True, desc='Downloading', precheck_timeout=3)[source]
Downloads a file from an HTTP/HTTPS source. Raises a ValueError for non-200 status codes.
- Parameters:
url (str) – Target url
target (Path or str) – Path to save to
progress_bar (bool) – If true, shows a progress bar for the download
desc (str) – Description for the progress bar
precheck_timeout (int) – Timeout passed to precheck_url()
- ls_webdav(url, precheck_timeout=3)[source]
Lists the files in a WebDAV directory
- Parameters:
url (str) – URL of the directory to list
precheck_timeout (int) – Timeout passed to precheck_url()
- Returns:
List of files
- precheck_url(url, timeout)[source]
Checks whether the URL is reachable and gives a 200 or 300 HTTP response code. If a timeout occurs or a response code >=400 is returned, the precheck issues a warning.
- Parameters:
url – URL to check
timeout – Timeout in seconds
- safe_extract_tar(tar, path='.', members=None, *, numeric_owner=False)[source]
A safe extract function for tar archives that avoids CVE-2007-4559 (extraction of files with absolute path). See https://github.com/seisbench/seisbench/pull/134
Parameters as for tar.extractall
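The kind of check that guards against this class of problem can be sketched as follows. This is an illustration of a commonly used path check, not necessarily the check SeisBench performs; is_within_directory and unsafe_members are hypothetical helper names.

```python
import io
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a location inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def unsafe_members(tar, path="."):
    """List member names that would be written outside of path."""
    return [
        member.name
        for member in tar.getmembers()
        if not is_within_directory(path, os.path.join(path, member.name))
    ]


# Build a small in-memory archive with one harmless and one malicious member.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for name in ["ok.txt", "../escape.txt"]:
        payload = b"data"
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

# Inspect the archive without extracting anything.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    flagged = unsafe_members(archive, path="extract_here")
```

A safe extraction routine would refuse to extract, or raise an error, whenever such a list is non-empty.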
Annotation classes
- class ClassifyOutput(creator, **kwargs)[source]
Bases:
SimpleNamespace
A general container to hold the outputs of the classify function of SeisBench models. This allows each model to provide a different set of outputs while keeping a consistent output type. For example, EQTransformer can output picks and detections, while PhaseNet only provides picks.
- Parameters:
creator (str) – The model creating the output.
kwargs – All outputs of the model
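How a SimpleNamespace-based container lets different models expose different outputs can be sketched with a small stand-in class. Output, ModelA and ModelB are hypothetical names used for illustration only; the class merely mirrors the documented signature (creator, **kwargs).

```python
from types import SimpleNamespace


class Output(SimpleNamespace):
    """Hypothetical stand-in mirroring the documented signature (creator, **kwargs)."""

    def __init__(self, creator, **kwargs):
        super().__init__(creator=creator, **kwargs)


# Two models can expose different sets of outputs through the same container type.
picker_output = Output("ModelA", picks=["pick-1", "pick-2"])
detector_output = Output("ModelB", picks=["pick-3"], detections=["detection-1"])

# Attributes exist only if the creating model provided them.
has_detections = hasattr(picker_output, "detections")
```

Code consuming such a container may therefore want to check for an attribute before using it.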
- class Detection(trace_id, start_time, end_time, peak_value=None)[source]
Bases:
object
This class serves as container for storing detection information. Defines an ordering based on start time, end time and trace id.
- Parameters:
trace_id (str) – Id of the trace the detection was generated from
start_time (UTCDateTime) – Onset time of the detection
end_time (UTCDateTime) – End time of the detection
peak_value (float) – Peak value of the characteristic function for the detection
- class DetectionList(iterable=(), /)[source]
Bases:
PickList
A list of Detection objects with convenience functions for selecting and printing
- select(trace_id=None, min_confidence=None)[source]
Select specific detections. Only arguments provided will be used to filter.
- Parameters:
trace_id (Optional[str]) – A regular expression to match against the trace id. The string is passed directly to the re module in Python, i.e., characters like dots need to be escaped and wildcards are represented using .*.
min_confidence (Optional[float]) – The minimum confidence value. Detections without a confidence value are discarded.
- class Pick(trace_id, start_time, end_time=None, peak_time=None, peak_value=None, phase=None)[source]
Bases:
object
This class serves as container for storing pick information. Defines an ordering based on start time, end time and trace id.
- Parameters:
trace_id (str) – Id of the trace the pick was generated from
start_time (UTCDateTime) – Onset time of the pick
end_time (UTCDateTime) – End time of the pick
peak_time (UTCDateTime) – Peak time of the characteristic function for the pick
peak_value (float) – Peak value of the characteristic function for the pick
phase (str) – Phase hint
- class PickList(iterable=(), /)[source]
Bases:
list
A list of Pick objects with convenience functions for selecting and printing
- select(trace_id=None, min_confidence=None, phase=None)[source]
Select specific picks. Only arguments provided will be used to filter.
- Parameters:
trace_id (Optional[str]) – A regular expression to match against the trace id. The string is passed directly to the re module in Python, i.e., characters like dots need to be escaped and wildcards are represented using .*.
min_confidence (Optional[float]) – The minimum confidence value. Picks without a confidence value are discarded.
phase (Optional[str]) – The phase of the pick. Only exact matches will be returned. Picks without phase information are discarded.
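The effect of escaping dots in the trace_id pattern can be shown with the re module alone. The sketch assumes that the pattern has to match the whole trace id (re.fullmatch); which matching function select uses is not stated here, and the trace ids are made up.

```python
import re

trace_ids = ["CX.PB01.", "CXAPB01B", "CX.PB02."]

# Unescaped dots match any character, so "CXAPB01B" slips through.
loose = [t for t in trace_ids if re.fullmatch("CX.PB01.", t)]

# Escaped dots match literal dots only; ".*" acts as the wildcard.
strict = [t for t in trace_ids if re.fullmatch(r"CX\.PB01\..*", t)]
all_cx = [t for t in trace_ids if re.fullmatch(r"CX\..*", t)]
```

Using raw strings (the r prefix) avoids having to double the backslashes in the pattern.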
Region definitions
- class CircleDomain(latitude, longitude, minradius, maxradius)[source]
Bases:
CircularDomain
Circular domain for selecting coordinates within the given radii of a source point. The edges are not included in the domain.
- Parameters:
latitude (float) – Latitude of the circle center
longitude (float) – Longitude of the circle center
minradius (float) – Minimum radius in degrees
maxradius (float) – Maximum radius in degrees
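The exclusion of the edges can be made concrete with a small sketch. It assumes a spherical Earth and uses the haversine formula for the angular distance; angular_distance and in_circle are hypothetical helpers, and the distance computation used by the actual class may differ.

```python
import math


def angular_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding errors for near-antipodal points.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_circle(lat, lon, center_lat, center_lon, minradius, maxradius):
    """Strict inequalities: points exactly on either edge are excluded."""
    distance = angular_distance(lat, lon, center_lat, center_lon)
    return minradius < distance < maxradius


distance = angular_distance(0.0, 0.0, 0.0, 10.0)
inside = in_circle(0.0, 10.0, 0.0, 0.0, minradius=5.0, maxradius=15.0)
too_close = in_circle(0.0, 10.0, 0.0, 0.0, minradius=11.0, maxradius=15.0)
```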
- class Germany[source]
Bases:
Domain
Example usage of how to create more complex region geometries. https://docs.obspy.org/_modules/obspy/clients/fdsn/mass_downloader/domain.html
- class RectangleDomain(minlatitude, maxlatitude, minlongitude, maxlongitude)[source]
Bases:
RectangularDomain
A rectangular domain defined by latitude and longitude bounds. Edges are included in the domain.
- Parameters:
minlatitude (float) – Minimum latitude
maxlatitude (float) – Maximum latitude
minlongitude (float) – Minimum longitude
maxlongitude (float) – Maximum longitude
Helper functions for pytorch
- worker_seeding(wid)[source]
When using numpy random inside multiple workers in the data loader, they all produce the same random numbers, as the seed is shared. As a solution, the worker init function can be overwritten. This solution uses the torch initial_seed, which is set separately for each worker. This should be taken into account, as SeisBench uses numpy random for augmentation.
To set the seed in each worker, use worker_init_fn=worker_seeding when creating the pytorch DataLoader.
Code from https://github.com/pytorch/pytorch/issues/5059
- Parameters:
wid (int) – Worker id
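The underlying problem can be reproduced without PyTorch or numpy. The sketch below uses Python's random module as a stand-in for numpy random and a made-up base_seed: generators seeded identically yield identical numbers, while a seed that differs per worker yields distinct streams.

```python
import random

base_seed = 1234  # stands in for the seed shared by all workers
n_workers = 3

# Shared seed: every worker draws the same "augmentation" number.
shared = [random.Random(base_seed).random() for wid in range(n_workers)]

# Per-worker seed: offsetting the seed by the worker id gives each worker
# its own stream, which is what a worker init function is meant to achieve.
per_worker = [random.Random(base_seed + wid).random() for wid in range(n_workers)]
```

In an actual training setup, the remedy described above is to pass worker_init_fn=worker_seeding to the DataLoader rather than seeding by hand.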
Common functions for converting datasets to SeisBench format
- fdsn_get_bulk_safe(client, bulk)[source]
A wrapper around obspy’s get_waveforms_bulk that does error handling and tries to download as much data as possible.
- Parameters:
client (Client) – An obspy FDSN client
bulk (list[tuple]) – A bulk request as for get_waveforms_bulk
- Return type:
Stream
- rotate_stream_to_zne(stream, inventory)[source]
Tries to rotate the stream to ZNE in place. There are several possible failures, which are silently ignored.
- Parameters:
stream (obspy.Stream) – Stream to rotate
inventory (obspy.Inventory) – Inventory object
- stream_to_array(stream, component_order)[source]
Converts a stream of single-station waveforms into a numpy array according to a given component order. If trace start and end times disagree between component traces, the remaining parts are filled with zeros. Also returns the completeness, i.e., the fraction of samples in the output that actually contain data. Assumes that all traces have the same sampling rate.
- Parameters:
stream (obspy.Stream) – Stream to convert
component_order (str) – Component order
- Returns:
starttime, data, completeness
- Return type:
UTCDateTime, np.ndarray, float
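The zero filling and the completeness value can be illustrated with plain Python lists. In this sketch, traces_to_rows is a hypothetical helper that works on sample indices instead of UTCDateTime values and on lists instead of numpy arrays; it mirrors the description above rather than the SeisBench implementation.

```python
def traces_to_rows(traces, component_order):
    """traces maps a component letter to (start_sample, samples)."""
    start = min(s for s, _ in traces.values())
    end = max(s + len(x) for s, x in traces.values())
    length = end - start

    rows, filled = [], 0
    for component in component_order:
        row = [0.0] * length  # samples without data stay zero
        if component in traces:
            offset, samples = traces[component]
            row[offset - start : offset - start + len(samples)] = samples
            filled += len(samples)
        rows.append(row)

    # Fraction of output samples that actually contain data.
    completeness = filled / (length * len(component_order))
    return start, rows, completeness


traces = {
    "Z": (0, [1.0, 2.0, 3.0, 4.0]),
    "N": (1, [5.0, 6.0, 7.0]),  # starts one sample late
    "E": (0, [8.0, 9.0]),       # ends two samples early
}
start, rows, completeness = traces_to_rows(traces, "ZNE")
```

Here 9 of the 12 output samples carry data, so the completeness is 0.75.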
- trace_has_spikes(data, factor=25, quantile=0.975)[source]
Checks for bit flip errors in the data using a simple quantile rule
- Parameters:
data (np.ndarray) – Data array
factor (float) – Maximum allowed factor between peak and quantile
quantile (float) – Quantile to check. Must be between 0 and 1.
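One way such a quantile rule could work is sketched below. The exact rule used by trace_has_spikes is an assumption here: the hypothetical has_spikes flags the data if the peak absolute amplitude exceeds factor times the given quantile of the absolute amplitudes.

```python
def has_spikes(data, factor=25, quantile=0.975):
    """Hypothetical quantile rule: flag if the peak exceeds factor times the quantile."""
    magnitudes = sorted(abs(x) for x in data)
    index = min(int(quantile * len(magnitudes)), len(magnitudes) - 1)
    reference = magnitudes[index]
    return magnitudes[-1] > factor * reference


clean = [1.0, -2.0, 1.5, -1.0] * 25  # 100 samples of ordinary amplitude
spiky = clean[:-1] + [1.0e6]         # one sample corrupted, e.g. by a bit flip

clean_flagged = has_spikes(clean)
spiky_flagged = has_spikes(spiky)
```

Because a single corrupted sample barely moves a high quantile, the ratio between peak and quantile becomes very large for spiky data.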
- waveform_id_to_network_station_location(waveform_id)[source]
Takes a waveform_id as string in the format Network.Station.Location.Channel and returns a string with channel dropped. If the waveform_id does not conform to the format, the input string is returned.
- Parameters:
waveform_id (str) – Waveform ID in format Network.Station.Location.Channel
- Returns:
Waveform ID in format Network.Station.Location
- Return type:
str
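The documented behaviour, including the fallback for malformed input, can be sketched in a few lines. Note that drop_channel is a hypothetical re-implementation based only on the description above, and the waveform ids are made up.

```python
def drop_channel(waveform_id):
    """Hypothetical re-implementation of the documented behaviour."""
    parts = waveform_id.split(".")
    if len(parts) != 4:
        return waveform_id  # not in Network.Station.Location.Channel format
    return ".".join(parts[:3])


with_location = drop_channel("CX.PB01.00.HHZ")
empty_location = drop_channel("CX.PB01..HHZ")
malformed = drop_channel("not-a-waveform-id")
```

An empty location code is preserved, so the result for the second call still ends with a dot.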