Spectral
The spectral module contains classes that aid in dealing with frequency-domain representations of sound.
Representations
-
class zounds.spectral.FrequencyDimension(scale)
When applied to an axis of ArrayWithUnits, that axis can be viewed as representing the energy present in a series of frequency bands.
Parameters: scale (FrequencyScale) – A scale whose frequency bands correspond to the items along the frequency axis
Examples
>>> from zounds import LinearScale, FrequencyBand, ArrayWithUnits
>>> from zounds import FrequencyDimension
>>> import numpy as np
>>> band = FrequencyBand(20, 20000)
>>> scale = LinearScale(frequency_band=band, n_bands=100)
>>> raw = np.hanning(100)
>>> arr = ArrayWithUnits(raw, [FrequencyDimension(scale)])
>>> sliced = arr[FrequencyBand(100, 1000)]
>>> sliced.shape
(5,)
>>> sliced.dimensions
(FrequencyDimension(scale=LinearScale(band=FrequencyBand(
start_hz=20.0, stop_hz=1019.0, center=519.5, bandwidth=999.0), n_bands=5)),)
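The slicing in the example above can be approximated without zounds. The following is a minimal sketch under stated assumptions, not the library's implementation: it builds evenly spaced bands and returns the integer slice covering every band that overlaps the requested range. The names `linear_bands` and `band_slice` are hypothetical.

```python
def linear_bands(start_hz, stop_hz, n_bands):
    """Split [start_hz, stop_hz] into n_bands contiguous (start, stop) pairs."""
    width = (stop_hz - start_hz) / float(n_bands)
    return [(start_hz + i * width, start_hz + (i + 1) * width)
            for i in range(n_bands)]


def band_slice(bands, lo_hz, hi_hz):
    """Return the integer slice of the bands that overlap [lo_hz, hi_hz)."""
    hits = [i for i, (b_lo, b_hi) in enumerate(bands)
            if b_lo < hi_hz and b_hi > lo_hz]
    if not hits:
        return slice(0, 0)
    return slice(hits[0], hits[-1] + 1)


# Same scale and query as the documented example (assumed band layout)
bands = linear_bands(20, 20000, 100)
sl = band_slice(bands, 100, 1000)
selected = bands[sl]
```

Under these assumptions the query selects five bands spanning roughly 20 Hz to 1019 Hz, which agrees with the shape and band limits shown in the example.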
-
integer_based_slice(index)
Subclasses define behavior that transforms a custom, user-defined slice into integer indices that numpy can understand.
Parameters: index (custom slice) – A user-defined slice instance
-
validate(size)
Ensure that the size of the dimension matches the number of bands in the scale.
Raises: ValueError – when the dimension size and number of bands don’t match
-
-
class zounds.spectral.ExplicitFrequencyDimension(scale, slices)
A frequency dimension where the mapping from frequency bands to integer indices is provided explicitly, rather than computed.
Parameters:
- scale (ExplicitScale) – the explicit frequency scale that defines how slices are extracted from this dimension
- slices (iterable of slices) – An iterable of python.slice instances which correspond to each frequency band from scale
Raises: ValueError – when the number of slices and number of bands in scale don’t match
-
class zounds.spectral.FrequencyAdaptive
TODO: This needs some love. Mutually exclusive constructor arguments are no bueno
Parameters:
- arrs – TODO
- time_dimension (TimeDimension) – the time dimension of the first axis of this array
- scale (FrequencyScale) – The frequency scale corresponding to the first axis of this array, mutually exclusive with the explicit_freq_dimension argument
- explicit_freq_dimension (ExplicitFrequencyDimension) – TODO
-
square(n_coeffs, do_overlap_add=False)
Compute a “square” view of the frequency adaptive transform, by resampling each frequency band such that they all contain the same number of samples, and performing an overlap-add procedure in the case where the sample frequency and duration differ.
Parameters: n_coeffs – The common size to which each frequency band should be resampled
Functions
-
zounds.spectral.fft(x, axis=-1, padding_samples=0)
Apply an FFT along the given dimension, and with the specified amount of zero-padding.
Parameters:
- x (ArrayWithUnits) – an ArrayWithUnits instance which has one or more TimeDimension axes
- axis (int) – The axis along which the fft should be applied
- padding_samples (int) – The number of padding zeros to apply along axis before performing the FFT
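To illustrate what zero-padding before a transform does, here is a minimal sketch using a naive discrete Fourier transform from the standard library. It is an illustration only and makes two assumptions: that padding is appended at the end of the signal, and that only the non-negative frequency bins of a real-valued input are kept. The name `padded_dft` is hypothetical and is not part of zounds.

```python
import cmath


def padded_dft(x, padding_samples=0):
    """Naive DFT of a real signal after appending padding_samples zeros."""
    padded = list(x) + [0.0] * padding_samples
    n = len(padded)
    # Keep only the non-negative frequency bins, as is usual for real input
    return [
        sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            for t in range(n))
        for k in range(n // 2 + 1)
    ]


# A 4-sample impulse padded to 8 samples gives 5 frequency bins
coeffs = padded_dft([1.0, 0.0, 0.0, 0.0], padding_samples=4)
```

Padding does not add information to the signal; it only samples the same spectrum on a denser frequency grid.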
-
zounds.spectral.morlet_filter_bank(samplerate, kernel_size, scale, scaling_factor, normalize=True)
Create an ArrayWithUnits instance with a TimeDimension and a FrequencyDimension representing a bank of Morlet wavelets centered on the sub-bands of the scale.
Parameters:
- samplerate (SampleRate) – the samplerate of the input signal
- kernel_size (int) – the length in samples of each filter
- scale (FrequencyScale) – a scale whose center frequencies determine the fundamental frequency of each filter
- scaling_factor (int or list of int) – Scaling factors for each band, which determine the time-frequency resolution tradeoff. The number(s) should fall between 0 and 1, with smaller numbers achieving better frequency resolution, and larger numbers better time resolution
- normalize (bool) – When true, ensure that each filter in the bank has unit norm
Processing Nodes
-
class zounds.spectral.SlidingWindow(wscheme, wfunc=None, padwith=0, needs=None)
SlidingWindow is a processing node that provides a very common precursor to many frequency domain transforms: a lapped and windowed view of the time-domain signal.
Parameters:
- wscheme (SampleRate) – a sample rate that describes the frequency and duration of the sliding window
- wfunc (WindowingFunc) – a windowing function to apply to each frame
- needs (Node) – A processing node on which this node relies for its data. This will generally be a time-domain signal
Here’s how you’d typically see SlidingWindow used in a processing graph:
import zounds

Resampled = zounds.resampled(resample_to=zounds.SR11025())

@zounds.simple_in_memory_settings
class Sound(Resampled):
    windowed = zounds.ArrayWithUnitsFeature(
        zounds.SlidingWindow,
        needs=Resampled.resampled,
        wscheme=zounds.SampleRate(
            frequency=zounds.Milliseconds(250),
            duration=zounds.Milliseconds(500)),
        wfunc=zounds.OggVorbisWindowingFunc(),
        store=True)

synth = zounds.SineSynthesizer(zounds.SR44100())
samples = synth.synthesize(zounds.Seconds(5), [220., 440., 880.])

# process the audio, and fetch features from our in-memory store
_id = Sound.process(meta=samples.encode())
sound = Sound(_id)

# the first axis steps through the half-lapped windows
print(sound.windowed.dimensions[0])
# TimeDimension(f=0.250068024879, d=0.500045346811)

# the second axis steps through the samples inside each window
print(sound.windowed.dimensions[1])
# TimeDimension(f=9.0702947e-05, d=9.0702947e-05)
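The idea of a lapped and windowed view can be sketched on a plain list of samples. This is a simplified illustration, not the zounds node: it works in samples rather than durations, and the names `hann`, `sliding_window`, `hop`, and `pad_value` are hypothetical.

```python
import math


def hann(size):
    """Symmetric Hann window of the given size."""
    if size == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def sliding_window(signal, window_size, hop, window=None, pad_value=0.0):
    """Cut signal into overlapping frames, padding the last frame if short."""
    frames = []
    for start in range(0, len(signal), hop):
        frame = list(signal[start:start + window_size])
        frame += [pad_value] * (window_size - len(frame))
        if window is not None:
            # Taper each frame so that frame edges fade towards zero
            frame = [s * w for s, w in zip(frame, window)]
        frames.append(frame)
    return frames


# Half-lapped frames: the hop is half of the window size
signal = [1.0] * 10
frames = sliding_window(signal, window_size=4, hop=2, window=hann(4))
```

A hop of half the window size mirrors the 250 ms step and 500 ms duration used in the example above.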
-
class zounds.spectral.FrequencyWeighting(weighting=None, needs=None)
FrequencyWeighting is a processing node that expects to be passed an ArrayWithUnits instance whose last dimension is a FrequencyDimension.
Parameters:
- weighting (FrequencyWeighting) – the frequency weighting to apply
- needs (Node) – a processing node on which this node depends whose last dimension is a FrequencyDimension
-
class zounds.spectral.FFT(needs=None, axis=-1, padding_samples=0)
A processing node that performs an FFT of a real-valued signal.
-
class zounds.spectral.DCT(axis=-1, scale_always_even=False, needs=None)
A processing node that performs a Type II Discrete Cosine Transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-II) of the input.
Parameters:
- axis (int) – The axis over which to perform the DCT
- needs (Node) – a processing node on which this one depends
See also
DctSynthesizer
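For reference, the unnormalized DCT-II can be written directly from its textbook definition. This is a minimal, slow sketch for illustration only. It makes no claim about how the node computes the transform or which normalization it applies, and the name `dct_ii` is hypothetical.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


# A constant signal has all of its energy in the first coefficient
coeffs = dct_ii([1.0, 1.0, 1.0, 1.0])
```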
-
class zounds.spectral.DCTIV(scale_always_even=False, needs=None)
A processing node that performs a Type IV Discrete Cosine Transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-IV) of the input.
Parameters: needs (Node) – a processing node on which this one depends
-
class zounds.spectral.MDCT(needs=None)
A processing node that performs a modified discrete cosine transform (https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) of the input.
This is really just a lapped version of the DCT-IV transform.
Parameters: needs (Node) – a processing node on which this one depends
-
class zounds.spectral.FrequencyAdaptiveTransform(transform=None, scale=None, window_func=None, check_scale_overlap_ratio=False, needs=None)
A processing node that expects to receive the input from a frequency domain transformation (e.g. FFT), and produces a FrequencyAdaptive instance where time resolution can vary by frequency. This is similar to, but not precisely the same as, ideas introduced in:
- A quasi-orthogonal, invertible, and perceptually relevant time-frequency transform for audio coding
- A FRAMEWORK FOR INVERTIBLE, REAL-TIME CONSTANT-Q TRANSFORMS
Parameters:
- transform (function) – the transform to be applied to each frequency band
- scale (FrequencyScale) – the scale used to take frequency band slices
- window_func (numpy.ndarray) – the windowing function to apply to each band before the transform is applied
- check_scale_overlap_ratio (bool) – If this feature is to be used for resynthesis later, ensure that each frequency band overlaps with the previous one by at least half, to ensure artifact-free synthesis
-
class zounds.spectral.Chroma(frequency_band, window=<zounds.spectral.sliding_window.HanningWindowingFunc object>, needs=None)
-
class zounds.spectral.BarkBands(frequency_band, n_bands=100, window=<zounds.spectral.sliding_window.HanningWindowingFunc object>, needs=None)
-
class zounds.spectral.SpectralCentroid(needs=None)
Indicates where the “center of mass” of the spectrum is. Perceptually, it has a robust connection with the impression of “brightness” of a sound. It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights…
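The weighted mean described above can be sketched in a few lines. This illustrates the formula only, not the node's implementation. The function name and the choice to return 0.0 for a silent frame are assumptions.

```python
def spectral_centroid(frequencies_hz, magnitudes):
    """Magnitude-weighted mean of the frequencies in a single frame."""
    total = sum(magnitudes)
    if total == 0:
        # Assumption: report 0.0 for a silent frame instead of dividing by zero
        return 0.0
    weighted = sum(f * m for f, m in zip(frequencies_hz, magnitudes))
    return weighted / total


# More energy in the highest bin pulls the centroid upwards
centroid = spectral_centroid([100.0, 200.0, 300.0], [1.0, 1.0, 2.0])
```

Here the centroid is (100 + 200 + 600) / 4 = 225 Hz, above the unweighted mean of 200 Hz.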
-
class zounds.spectral.SpectralFlatness(needs=None)
Spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is typically measured in decibels, and provides a way to quantify how tone-like a sound is, as opposed to being noise-like. The meaning of tonal in this context is in the sense of the amount of peaks or resonant structure in a power spectrum, as opposed to the flat spectrum of white noise. A high spectral flatness indicates that the spectrum has a similar amount of power in all spectral bands: this would sound similar to white noise, and the graph of the spectrum would appear relatively flat and smooth. A low spectral flatness indicates that the spectral power is concentrated in a relatively small number of bands: this would typically sound like a mixture of sine waves, and the spectrum would appear “spiky”…
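Spectral flatness is commonly computed as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below follows that common definition and returns a linear ratio between 0 and 1 rather than decibels. It is not taken from the node's source, and the handling of zero-power bins is an assumption.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    n = len(power_spectrum)
    arithmetic = sum(power_spectrum) / n
    if arithmetic == 0 or min(power_spectrum) <= 0:
        # Assumption: a spectrum with any empty bin is treated as fully tonal
        return 0.0
    geometric = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    return geometric / arithmetic


flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])      # noise-like spectrum
spiky = spectral_flatness([1e-6, 1e-6, 1.0, 1e-6])  # one dominant band
```

A perfectly flat spectrum gives a ratio of 1, while concentrating the power in one band drives the ratio towards 0.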
Windowing Functions
-
class zounds.spectral.WindowingFunc(windowing_func=None)
WindowingFunc is mostly a convenient wrapper around numpy’s handy windowing functions, or any function that takes a size parameter and returns a numpy array-like object.
A WindowingFunc instance can be multiplied with another array of any size.
Parameters: windowing_func (function) – A function that takes a size parameter, and returns a numpy array-like object
Examples
>>> from zounds import WindowingFunc
>>> import numpy as np
>>> wf = WindowingFunc(lambda size: np.hanning(size))
>>> np.ones(5) * wf
array([ 0. , 0.5, 1. , 0.5, 0. ])
>>> np.ones(10) * wf
array([ 0. , 0.11697778, 0.41317591, 0.75 , 0.96984631,
0.96984631, 0.75 , 0.41317591, 0.11697778, 0. ])
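One way to get a window that adapts to the size of whatever it is multiplied with is to generate the window lazily inside the multiplication operator. The sketch below shows that idea on plain lists. It is an assumption about the general approach, not the zounds implementation, and `SketchWindowingFunc` and `hann` are hypothetical names.

```python
import math


def hann(size):
    """Symmetric Hann window of the given size."""
    if size == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


class SketchWindowingFunc(object):
    """Builds a window sized to match the other operand at multiply time."""

    def __init__(self, windowing_func):
        self.windowing_func = windowing_func

    def __rmul__(self, other):
        # The window length is only decided once the other operand is known
        window = self.windowing_func(len(other))
        return [o * w for o, w in zip(other, window)]

    __mul__ = __rmul__


wf = SketchWindowingFunc(hann)
ones = [1.0] * 5
windowed = ones * wf
```

With five samples this yields approximately [0, 0.5, 1, 0.5, 0], the same values as the first output in the example above.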
-
class zounds.spectral.OggVorbisWindowingFunc
The windowing function described in the Ogg Vorbis specification.
Scales
-
class zounds.spectral.LinearScale(frequency_band, n_bands, always_even=False)
A linear frequency scale with constant bandwidth. Appropriate for use with transforms whose coefficients also lie on a linear frequency scale, e.g. the FFT or DCT transforms.
Parameters:
- frequency_band (FrequencyBand) – A band representing the entire span of this scale. E.g., one might want to generate a scale spanning the entire range of human hearing by starting with FrequencyBand(20, 20000)
- n_bands (int) – The number of bands in this scale
- always_even (bool) – when converting frequency slices to integer indices that numpy can understand, should the slice size always be even?
Examples
>>> from zounds import FrequencyBand, LinearScale
>>> scale = LinearScale(FrequencyBand(20, 20000), 10)
>>> scale
LinearScale(band=FrequencyBand(
start_hz=20, stop_hz=20000, center=10010.0, bandwidth=19980), n_bands=10)
>>> scale.Q
array([ 0.51001001, 1.51001001, 2.51001001, 3.51001001, 4.51001001,
5.51001001, 6.51001001, 7.51001001, 8.51001001, 9.51001001])
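The Q values in the example follow from dividing each band's center frequency by the constant bandwidth. The sketch below reproduces that arithmetic with plain Python. The function name is hypothetical, and it assumes that the bands tile the span without overlap.

```python
def linear_scale_q(start_hz, stop_hz, n_bands):
    """Q (center frequency / bandwidth) for evenly spaced, contiguous bands."""
    bandwidth = (stop_hz - start_hz) / float(n_bands)
    centers = [start_hz + bandwidth * (i + 0.5) for i in range(n_bands)]
    return [center / bandwidth for center in centers]


# Same span and band count as the documented example
q = linear_scale_q(20, 20000, 10)
```

Because the bandwidth is constant at 1998 Hz while the centers rise, Q grows by exactly 1 from one band to the next.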
-
static from_sample_rate(sample_rate, n_bands, always_even=False)
Return a LinearScale instance whose upper frequency bound is informed by the Nyquist frequency of the sample rate.
Parameters:
- sample_rate (SamplingRate) – the sample rate whose Nyquist frequency will serve as the upper frequency bound of this scale
- n_bands (int) – the number of evenly-spaced frequency bands
-
class zounds.spectral.GeometricScale(start_center_hz, stop_center_hz, bandwidth_ratio, n_bands, always_even=False)
A constant-Q scale whose center frequencies progress geometrically rather than linearly.
Examples
>>> from zounds import GeometricScale
>>> scale = GeometricScale(20, 20000, 0.05, 10)
>>> scale
GeometricScale(band=FrequencyBand(
start_hz=19.5, stop_hz=20500.0, center=10259.75, bandwidth=20480.5), n_bands=10)
>>> scale.Q
array([ 20., 20., 20., 20., 20., 20., 20., 20., 20., 20.])
>>> list(scale.center_frequencies)
[20.000000000000004, 43.088693800637671, 92.831776672255558,
200.00000000000003, 430.88693800637651, 928.31776672255558,
2000.0000000000005, 4308.8693800637648, 9283.1776672255564,
20000.000000000004]
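The numbers in the example are consistent with center frequencies spaced by a constant ratio and bandwidths proportional to each center. The sketch below rests on that assumption and is not the library's code. The function name is hypothetical, and it requires at least two bands.

```python
def geometric_scale(start_center_hz, stop_center_hz, bandwidth_ratio, n_bands):
    """Geometrically spaced centers with bandwidth = center * bandwidth_ratio."""
    step = (stop_center_hz / float(start_center_hz)) ** (1.0 / (n_bands - 1))
    centers = [start_center_hz * step ** i for i in range(n_bands)]
    bandwidths = [center * bandwidth_ratio for center in centers]
    return centers, bandwidths


centers, bandwidths = geometric_scale(20, 20000, 0.05, 10)

# Q is the same for every band because bandwidth scales with the center
q = [c / b for c, b in zip(centers, bandwidths)]

# Outer edges of the first and last bands
start_hz = centers[0] - bandwidths[0] / 2.0
stop_hz = centers[-1] + bandwidths[-1] / 2.0
```

Under these assumptions Q is 1 / 0.05 = 20 for every band, and the outer edges come out near 19.5 Hz and 20500 Hz, as in the example.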
-
class zounds.spectral.ExplicitScale(bands)
A scale where the frequency bands are provided explicitly, rather than computed.
Parameters: bands (list of FrequencyBand) – The explicit bands used by this scale
-
class zounds.spectral.FrequencyScale(frequency_band, n_bands, always_even=False)
Represents a set of frequency bands with monotonically increasing start frequencies.
Parameters:
- frequency_band (FrequencyBand) – A band representing the entire span of this scale. E.g., one might want to generate a scale spanning the entire range of human hearing by starting with FrequencyBand(20, 20000)
- n_bands (int) – The number of bands in this scale
- always_even (bool) – when converting frequency slices to integer indices that numpy can understand, should the slice size always be even?
-
bands
An iterable of all bands in this scale
-
center_frequencies
An iterable of the center frequencies of each band in this scale
-
bandwidths
An iterable of the bandwidths of each band in this scale
-
ensure_overlap_ratio(required_ratio=0.5)
Ensure that every adjacent pair of frequency bands meets the overlap ratio criteria. This can be helpful in scenarios where a scale is being used in an invertible transform, and something like the constant overlap add constraint must be met in order to not introduce artifacts in the reconstruction.
Parameters: required_ratio (float) – The required overlap ratio between all adjacent frequency band pairs
Raises: AssertionError – when the overlap ratio for one or more adjacent frequency band pairs is not met
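The check described above can be sketched on plain (start_hz, stop_hz) tuples. How zounds defines the overlap ratio is not stated here, so the definition below (shared width divided by the width of the later band) is an assumption, and both function names are illustrative stand-ins.

```python
def overlap_ratio(prev, curr):
    """Fraction of curr's width shared with prev (an assumed definition)."""
    shared = min(prev[1], curr[1]) - max(prev[0], curr[0])
    return max(0.0, shared) / float(curr[1] - curr[0])


def ensure_overlap_ratio(bands, required_ratio=0.5):
    """Raise AssertionError if an adjacent pair overlaps by too little."""
    for prev, curr in zip(bands, bands[1:]):
        ratio = overlap_ratio(prev, curr)
        if ratio < required_ratio:
            raise AssertionError(
                "bands %r and %r overlap by %.2f, need %.2f"
                % (prev, curr, ratio, required_ratio))


# Half-lapped bands satisfy the default ratio and pass silently
half_lapped = [(0, 100), (50, 150), (100, 200)]
ensure_overlap_ratio(half_lapped)

# Bands that barely touch are rejected
try:
    ensure_overlap_ratio([(0, 100), (90, 190)])
    raised = False
except AssertionError:
    raised = True
```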
-
Q
The quality factor of the scale, or, the ratio of center frequencies to bandwidths
-
start_hz
The lower bound of this frequency scale
-
stop_hz
The upper bound of this frequency scale
-
class zounds.spectral.FrequencyBand(start_hz, stop_hz)
Represents an interval, or band of frequencies in hertz (cycles per second).
Examples
>>> import zounds
>>> band = zounds.FrequencyBand(500, 1000)
>>> band.center_frequency
750.0
>>> band.bandwidth
500
-
intersect(other)
Return the intersection between this frequency band and another.
Parameters: other (FrequencyBand) – the instance to intersect with
Examples
>>> import zounds
>>> b1 = zounds.FrequencyBand(500, 1000)
>>> b2 = zounds.FrequencyBand(900, 2000)
>>> intersection = b1.intersect(b2)
>>> intersection.start_hz, intersection.stop_hz
(900, 1000)
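Intersecting two bands amounts to taking the larger of the two lower bounds and the smaller of the two upper bounds. The sketch below uses a hypothetical `Band` tuple rather than the zounds class. Returning None for bands that do not overlap is an assumption of this sketch, since the library's behaviour in that case is not documented here.

```python
from collections import namedtuple

# Hypothetical stand-in for a frequency band; not the zounds class
Band = namedtuple("Band", ["start_hz", "stop_hz"])


def intersect(a, b):
    """Return the part shared by both bands, or None if they are disjoint."""
    start = max(a.start_hz, b.start_hz)
    stop = min(a.stop_hz, b.stop_hz)
    if start >= stop:
        return None
    return Band(start, stop)


overlap = intersect(Band(500, 1000), Band(900, 2000))
disjoint = intersect(Band(20, 100), Band(200, 400))
```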
-
static from_start(start_hz, bandwidth_hz)
Produce a FrequencyBand instance from a lower bound and bandwidth.
-
bandwidth
The span of this frequency band, in hertz
Frequency Weightings
-
class zounds.spectral.AWeighting
An A-weighting (https://en.wikipedia.org/wiki/A-weighting) that can be applied to a frequency axis via multiplication.
Examples
>>> from zounds import ArrayWithUnits, GeometricScale
>>> from zounds import FrequencyDimension, AWeighting
>>> import numpy as np
>>> scale = GeometricScale(20, 20000, 0.05, 10)
>>> raw = np.ones(len(scale))
>>> arr = ArrayWithUnits(raw, [FrequencyDimension(scale)])
>>> arr * AWeighting()
ArrayWithUnits([ 1. , 18.3172567 , 31.19918106, 40.54760374,
47.15389876, 51.1554151 , 52.59655479, 52.24516649,
49.39906912, 42.05409205])
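For orientation, the conventional A-weighting curve can be evaluated in decibels from its standard closed-form expression. The sketch below returns values referenced to 0 dB at 1 kHz. That is not the scaling of the numbers shown in the example above, and the sketch does not describe how zounds derives its weights. The function name is hypothetical.

```python
import math


def a_weighting_db(frequency_hz):
    """Standard A-weighting gain in dB for a frequency above 0 Hz."""
    f2 = frequency_hz * frequency_hz
    numerator = (12194.0 ** 2) * f2 * f2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    # The +2.0 dB offset normalizes the curve to roughly 0 dB at 1 kHz
    return 20.0 * math.log10(numerator / denominator) + 2.0


gain_1k = a_weighting_db(1000.0)
gain_20 = a_weighting_db(20.0)
```

The curve attenuates low frequencies strongly: the gain at 20 Hz is roughly 50 dB below the gain at 1 kHz.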