Spectral

The spectral module contains classes that aid in dealing with frequency-domain representations of sound

Representations

class zounds.spectral.FrequencyDimension(scale)[source]

When applied to an axis of ArrayWithUnits, that axis can be viewed as representing the energy present in a series of frequency bands

Parameters:scale (FrequencyScale) – A scale whose frequency bands correspond to the items along the frequency axis

Examples

>>> from zounds import LinearScale, FrequencyBand, ArrayWithUnits
>>> from zounds import FrequencyDimension
>>> import numpy as np
>>> band = FrequencyBand(20, 20000)
>>> scale = LinearScale(frequency_band=band, n_bands=100)
>>> raw = np.hanning(100)
>>> arr = ArrayWithUnits(raw, [FrequencyDimension(scale)])
>>> sliced = arr[FrequencyBand(100, 1000)]
>>> sliced.shape
(5,)
>>> sliced.dimensions
(FrequencyDimension(scale=LinearScale(band=FrequencyBand(
start_hz=20.0,
stop_hz=1019.0,
center=519.5,
bandwidth=999.0), n_bands=5)),)
metaslice(index, size)[source]

Produce a new instance of this dimension, given a custom slice

integer_based_slice(index)[source]

Subclasses define behavior that transforms a custom, user-defined slice into integer indices that numpy can understand

Parameters:index (custom slice) – A user-defined slice instance
validate(size)[source]

Ensure that the size of the dimension matches the number of bands in the scale

Raises:ValueError – when the dimension size and number of bands don’t match
class zounds.spectral.ExplicitFrequencyDimension(scale, slices)[source]

A frequency dimension where the mapping from frequency bands to integer indices is provided explicitly, rather than computed

Parameters:
  • scale (ExplicitScale) – the explicit frequency scale that defines how slices are extracted from this dimension
  • slices (iterable of slices) – An iterable of python.slice instances which correspond to each frequency band from scale
Raises:

ValueError – when the number of slices and number of bands in scale don’t match

metaslice(index, size)[source]

Produce a new instance of this dimension, given a custom slice

integer_based_slice(index)[source]

Subclasses define behavior that transforms a custom, user-defined slice into integer indices that numpy can understand

Parameters:index (custom slice) – A user-defined slice instance
validate(size)[source]

Subclasses check to ensure that the dimension’s size does not violate any assumptions made by this instance

class zounds.spectral.FrequencyAdaptive[source]

TODO: This needs some love. Mutually exclusive constructor arguments are no bueno

Parameters:
  • arrs – TODO
  • time_dimension (TimeDimension) – the time dimension of the first axis of this array
  • scale (FrequencyScale) – The frequency scale corresponding to the first axis of this array, mutually exclusive with the explicit_freq_dimension argument
  • explicit_freq_dimension (ExplicitFrequencyDimension) – TODO
square(n_coeffs, do_overlap_add=False)[source]

Compute a “square” view of the frequency adaptive transform by resampling each frequency band so that they all contain the same number of samples, and performing an overlap-add procedure in the case where the sample frequency and duration differ

Parameters:n_coeffs – The common size to which each frequency band should be resampled

Functions

zounds.spectral.fft(x, axis=-1, padding_samples=0)[source]

Apply an FFT along the given dimension, and with the specified amount of zero-padding

Parameters:
  • x (ArrayWithUnits) – an ArrayWithUnits instance which has one or more TimeDimension axes
  • axis (int) – The axis along which the fft should be applied
  • padding_samples (int) – The number of padding zeros to apply along axis before performing the FFT
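
To make the effect of padding_samples concrete, here is a minimal, dependency-free sketch of a zero-padded transform of one real-valued frame. It is not how zounds.spectral.fft is implemented: the helper name padded_rdft is hypothetical, and a naive O(n²) DFT stands in for a real FFT so that the example stays self-contained.

```python
import cmath
import math


def padded_rdft(frame, padding_samples=0):
    """Naive DFT of a real frame after appending zeros (hypothetical helper)."""
    padded = list(frame) + [0.0] * padding_samples
    n = len(padded)
    # A real signal has a conjugate-symmetric spectrum, so only the first
    # n // 2 + 1 coefficients carry information.
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(padded))
        for k in range(n // 2 + 1)
    ]


# One cycle of a cosine over 8 samples
frame = [math.cos(2 * math.pi * i / 8) for i in range(8)]
plain = padded_rdft(frame)
padded = padded_rdft(frame, padding_samples=8)

# Padding adds no new information, it only samples the spectrum more densely
print(len(plain), len(padded))  # 5 9
```

The cosine that lands in bin 1 of the unpadded transform lands in bin 2 of the padded one, with the same magnitude.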
zounds.spectral.morlet_filter_bank(samplerate, kernel_size, scale, scaling_factor, normalize=True)[source]

Create an ArrayWithUnits instance with a TimeDimension and a FrequencyDimension representing a bank of morlet wavelets centered on the sub-bands of the scale.

Parameters:
  • samplerate (SampleRate) – the samplerate of the input signal
  • kernel_size (int) – the length in samples of each filter
  • scale (FrequencyScale) – a scale whose center frequencies determine the fundamental frequency of each filter
  • scaling_factor (int or list of int) – Scaling factors for each band, which determine the time-frequency resolution tradeoff. The number(s) should fall between 0 and 1, with smaller numbers achieving better frequency resolution, and larger numbers better time resolution
  • normalize (bool) – When true, ensure that each filter in the bank has unit norm
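
The following sketch shows one way a single filter in such a bank could be built: a complex sinusoid at the band's center frequency, shaped by a Gaussian envelope and optionally scaled to unit norm. The helper morlet_kernel is hypothetical, and the way scaling_factor controls the envelope here is an assumption chosen to match the description above (smaller values leave a wider envelope, hence better frequency resolution). It is not the zounds implementation.

```python
import cmath
import math


def morlet_kernel(samplerate_hz, kernel_size, center_hz, scaling_factor,
                  normalize=True):
    """One complex Morlet-style filter (hypothetical helper, not the zounds API)."""
    kernel = []
    for n in range(kernel_size):
        # Position of this sample in [-1, 1] across the kernel
        pos = 2.0 * n / (kernel_size - 1) - 1.0
        # Assumption: scaling_factor sets how much of the Gaussian fits
        # inside the kernel; larger values give a narrower envelope.
        t = pos * scaling_factor * 2 * math.pi
        envelope = math.exp(-0.5 * t * t)
        # Complex sinusoid at the center frequency, centered on the kernel
        seconds = (n - (kernel_size - 1) / 2.0) / samplerate_hz
        carrier = cmath.exp(2j * math.pi * center_hz * seconds)
        kernel.append(envelope * carrier)
    if normalize:
        # Scale the filter so that its Euclidean norm is 1
        norm = math.sqrt(sum(abs(c) ** 2 for c in kernel))
        kernel = [c / norm for c in kernel]
    return kernel


# Stand-ins for the center frequencies of a scale's bands
centers = [220.0, 440.0, 880.0]
bank = [morlet_kernel(11025, 512, hz, scaling_factor=0.5) for hz in centers]
```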

Processing Nodes

class zounds.spectral.SlidingWindow(wscheme, wfunc=None, padwith=0, needs=None)[source]

SlidingWindow is a processing node that provides a very common precursor to many frequency domain transforms: a lapped and windowed view of the time-domain signal.

Parameters:
  • wscheme (SampleRate) – a sample rate that describes the frequency and duration of the sliding window
  • wfunc (WindowingFunc) – a windowing function to apply to each frame
  • needs (Node) – A processing node on which this node relies for its data. This will generally be a time-domain signal

Here’s how you’d typically see SlidingWindow used in a processing graph

import zounds

Resampled = zounds.resampled(resample_to=zounds.SR11025())

@zounds.simple_in_memory_settings
class Sound(Resampled):
    windowed = zounds.ArrayWithUnitsFeature(
        zounds.SlidingWindow,
        needs=Resampled.resampled,
        wscheme=zounds.SampleRate(
            frequency=zounds.Milliseconds(250),
            duration=zounds.Milliseconds(500)),
        wfunc=zounds.OggVorbisWindowingFunc(),
        store=True)


synth = zounds.SineSynthesizer(zounds.SR44100())
samples = synth.synthesize(zounds.Seconds(5), [220., 440., 880.])

# process the audio, and fetch features from our in-memory store
_id = Sound.process(meta=samples.encode())
sound = Sound(_id)

print(sound.windowed.dimensions[0])
# TimeDimension(f=0.250068024879, d=0.500045346811)
print(sound.windowed.dimensions[1])
# TimeDimension(f=9.0702947e-05, d=9.0702947e-05)
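
Independently of zounds, the lapped view itself can be sketched in a few lines. The helper names below are hypothetical, the window and hop sizes are given in samples rather than as a SampleRate, and padding the final frame with a constant value is an assumption made for this illustration only.

```python
def sliding_frames(samples, window_size, hop_size, pad_value=0.0):
    """Split samples into overlapping frames (hypothetical helper)."""
    frames = []
    for start in range(0, len(samples), hop_size):
        frame = list(samples[start:start + window_size])
        # Pad the trailing frame(s) so that every frame has the same length
        frame += [pad_value] * (window_size - len(frame))
        frames.append(frame)
    return frames


def apply_window(frames, window):
    """Multiply each frame element-wise by the window's values."""
    return [[x * w for x, w in zip(frame, window)] for frame in frames]


signal = [float(i) for i in range(10)]
# A hop of half the window size gives 50% overlap, as in the example above
frames = sliding_frames(signal, window_size=4, hop_size=2)
tapered = apply_window(frames, [0.0, 1.0, 1.0, 0.0])
```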
class zounds.spectral.FrequencyWeighting(weighting=None, needs=None)[source]

FrequencyWeighting is a processing node that expects to be passed an ArrayWithUnits instance whose last dimension is a FrequencyDimension

Parameters:
class zounds.spectral.FFT(needs=None, axis=-1, padding_samples=0)[source]

A processing node that performs an FFT of a real-valued signal

Parameters:
  • axis (int) – The axis over which the FFT should be computed
  • padding_samples (int) – number of zero samples to pad each window with before applying the FFT
  • needs (Node) – a processing node on which this one depends

See also

FFTSynthesizer

class zounds.spectral.DCT(axis=-1, scale_always_even=False, needs=None)[source]

A processing node that performs a Type II Discrete Cosine Transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-II) of the input

Parameters:
  • axis (int) – The axis over which to perform the DCT transform
  • needs (Node) – a processing node on which this one depends

See also

DctSynthesizer
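
For reference, the textbook DCT-II can be written directly from its definition. This is a minimal sketch only: the function name dct_ii is hypothetical, and the unnormalized scaling used here is an assumption that may differ from the convention the DCT node uses.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(x)
    return [
        sum(v * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, v in enumerate(x))
        for k in range(n_samples)
    ]


# A constant signal has all of its energy in the first (DC) coefficient
coeffs = dct_ii([1.0, 1.0, 1.0, 1.0])
```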

class zounds.spectral.DCTIV(scale_always_even=False, needs=None)[source]

A processing node that performs a Type IV Discrete Cosine Transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-IV) of the input

Parameters:needs (Node) – a processing node on which this one depends

See also

DCTIVSynthesizer

class zounds.spectral.MDCT(needs=None)[source]

A processing node that performs a modified discrete cosine transform (https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) of the input.

This is really just a lapped version of the DCT-IV transform

Parameters:needs (Node) – a processing node on which this one depends

See also

MDCTSynthesizer

class zounds.spectral.FrequencyAdaptiveTransform(transform=None, scale=None, window_func=None, check_scale_overlap_ratio=False, needs=None)[source]

A processing node that expects to receive the input from a frequency domain transformation (e.g. FFT), and produces a FrequencyAdaptive instance where time resolution can vary by frequency. This is similar to, but not precisely the same as ideas introduced in:

Parameters:
  • transform (function) – the transform to be applied to each frequency band
  • scale (FrequencyScale) – the scale used to take frequency band slices
  • window_func (numpy.ndarray) – the windowing function to apply to each band before the transform is applied
  • check_scale_overlap_ratio (bool) – If this feature is to be used for resynthesis later, ensure that each frequency band overlaps with the previous one by at least half, to ensure artifact-free synthesis
class zounds.spectral.Chroma(frequency_band, window=<zounds.spectral.sliding_window.HanningWindowingFunc object>, needs=None)[source]
class zounds.spectral.BarkBands(frequency_band, n_bands=100, window=<zounds.spectral.sliding_window.HanningWindowingFunc object>, needs=None)[source]
class zounds.spectral.SpectralCentroid(needs=None)[source]

Indicates where the “center of mass” of the spectrum is. Perceptually, it has a robust connection with the impression of “brightness” of a sound. It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights…

http://en.wikipedia.org/wiki/Spectral_centroid
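
The weighted mean described above can be sketched as follows. The function name and the handling of an all-zero spectrum are assumptions for this illustration, not a description of the node's internals.

```python
def spectral_centroid(frequencies_hz, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    total = sum(magnitudes)
    if total == 0:
        # Assumption: define the centroid of a silent frame as 0 Hz
        return 0.0
    return sum(f * m for f, m in zip(frequencies_hz, magnitudes)) / total


freqs = [100.0, 200.0, 300.0, 400.0]
# Energy concentrated in the low bins gives a low ("dark") centroid
dark = spectral_centroid(freqs, [1.0, 0.5, 0.1, 0.0])
# Energy concentrated in the high bins gives a high ("bright") centroid
bright = spectral_centroid(freqs, [0.0, 0.1, 0.5, 1.0])
```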

class zounds.spectral.SpectralFlatness(needs=None)[source]

Spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is typically measured in decibels, and provides a way to quantify how tone-like a sound is, as opposed to being noise-like. The meaning of tonal in this context is in the sense of the amount of peaks or resonant structure in a power spectrum, as opposed to flat spectrum of a white noise. A high spectral flatness indicates that the spectrum has a similar amount of power in all spectral bands - this would sound similar to white noise, and the graph of the spectrum would appear relatively flat and smooth. A low spectral flatness indicates that the spectral power is concentrated in a relatively small number of bands - this would typically sound like a mixture of sine waves, and the spectrum would appear “spiky”…

http://en.wikipedia.org/wiki/Spectral_flatness
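
Spectral flatness is commonly computed as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below follows that definition; the function name and the small eps guard against log(0) are assumptions of this example.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    n = len(power_spectrum)
    # Compute the geometric mean in the log domain for numerical stability
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(power_spectrum) / n
    return geometric_mean / (arithmetic_mean + eps)


# Equal power in every band: flatness close to 1 (noise-like)
noise_like = spectral_flatness([1.0] * 8)
# Power concentrated in a single band: flatness close to 0 (tone-like)
tone_like = spectral_flatness([1e-6] * 7 + [1.0])
```

To express the result in decibels, take 10 * math.log10(flatness).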

class zounds.spectral.BFCC(needs=None, n_coeffs=13, exclude=1)[source]

Bark frequency cepstral coefficients

Windowing Functions

class zounds.spectral.WindowingFunc(windowing_func=None)[source]

WindowingFunc is mostly a convenient wrapper around numpy’s handy windowing functions, or any function that takes a size parameter and returns a numpy array-like object.

A WindowingFunc instance can be multiplied with another array of any size.

Parameters:windowing_func (function) – A function that takes a size parameter, and returns a numpy array-like object

Examples

>>> from zounds import WindowingFunc
>>> import numpy as np
>>> wf = WindowingFunc(lambda size: np.hanning(size))
>>> np.ones(5) * wf
array([ 0. ,  0.5,  1. ,  0.5,  0. ])
>>> np.ones(10) * wf
array([ 0.        ,  0.11697778,  0.41317591,  0.75      ,  0.96984631,
        0.96984631,  0.75      ,  0.41317591,  0.11697778,  0.        ])
class zounds.spectral.IdentityWindowingFunc[source]

An identity windowing function

class zounds.spectral.OggVorbisWindowingFunc[source]

The windowing function described in the ogg vorbis specification

class zounds.spectral.HanningWindowingFunc[source]

A hanning window function
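
As a rough guide to the window shapes named in this section, the sketch below writes each one as a plain function of size, which is the kind of callable WindowingFunc is described as wrapping. The function names are hypothetical, and the Vorbis formula is the power-complementary window as it is usually stated; neither is taken from the zounds source.

```python
import math


def identity(size):
    """A window of ones leaves the signal unchanged."""
    return [1.0] * size


def hann(size):
    """Symmetric Hann window, same convention as np.hanning."""
    if size == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def vorbis(size):
    """Vorbis-style window: sin(pi / 2 * sin^2(pi * (n + 0.5) / size))."""
    return [
        math.sin(math.pi / 2 * math.sin(math.pi * (n + 0.5) / size) ** 2)
        for n in range(size)
    ]


print([round(v, 3) for v in hann(5)])  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

The Vorbis-style window is power-complementary: the squares of samples half a window apart sum to one, which is what makes it suitable for 50% overlapped transforms.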

Scales

class zounds.spectral.LinearScale(frequency_band, n_bands, always_even=False)[source]

A linear frequency scale with constant bandwidth. Appropriate for use with transforms whose coefficients also lie on a linear frequency scale, e.g. the FFT or DCT transforms.

Parameters:
  • frequency_band (FrequencyBand) – A band representing the entire span of this scale. E.g., one might want to generate a scale spanning the entire range of human hearing by starting with FrequencyBand(20, 20000)
  • n_bands (int) – The number of bands in this scale
  • always_even (bool) – when converting frequency slices to integer indices that numpy can understand, should the slice size always be even?

Examples

>>> from zounds import FrequencyBand, LinearScale
>>> scale = LinearScale(FrequencyBand(20, 20000), 10)
>>> scale
LinearScale(band=FrequencyBand(
start_hz=20,
stop_hz=20000,
center=10010.0,
bandwidth=19980), n_bands=10)
>>> scale.Q
array([ 0.51001001,  1.51001001,  2.51001001,  3.51001001,  4.51001001,
        5.51001001,  6.51001001,  7.51001001,  8.51001001,  9.51001001])
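
The Q values above follow directly from the band layout: every band has the same bandwidth, so Q grows with the center frequency. A minimal sketch that reproduces them, using a hypothetical helper and plain (start, stop) tuples rather than FrequencyBand instances:

```python
def linear_bands(start_hz, stop_hz, n_bands):
    """Return (start, stop) pairs for equal-width bands (hypothetical helper)."""
    span = stop_hz - start_hz
    return [
        (start_hz + span * i / n_bands, start_hz + span * (i + 1) / n_bands)
        for i in range(n_bands)
    ]


bands = linear_bands(20, 20000, 10)
# Q is the ratio of each band's center frequency to its bandwidth
q = [((lo + hi) / 2) / (hi - lo) for lo, hi in bands]
```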
static from_sample_rate(sample_rate, n_bands, always_even=False)[source]

Return a LinearScale instance whose upper frequency bound is informed by the nyquist frequency of the sample rate.

Parameters:
  • sample_rate (SampleRate) – the sample rate whose nyquist frequency will serve as the upper frequency bound of this scale
  • n_bands (int) – the number of evenly-spaced frequency bands
class zounds.spectral.GeometricScale(start_center_hz, stop_center_hz, bandwidth_ratio, n_bands, always_even=False)[source]

A constant-Q scale whose center frequencies progress geometrically rather than linearly

Parameters:
  • start_center_hz (int) – the center frequency of the first band in the scale
  • stop_center_hz (int) – the center frequency of the last band in the scale
  • bandwidth_ratio (float) – the ratio of each band’s bandwidth to its center frequency, i.e. the reciprocal of Q
  • n_bands (int) – the total number of bands

Examples

>>> from zounds import GeometricScale
>>> scale = GeometricScale(20, 20000, 0.05, 10)
>>> scale
GeometricScale(band=FrequencyBand(
start_hz=19.5,
stop_hz=20500.0,
center=10259.75,
bandwidth=20480.5), n_bands=10)
>>> scale.Q
array([ 20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.])
>>> list(scale.center_frequencies)
[20.000000000000004, 43.088693800637671, 92.831776672255558,
    200.00000000000003, 430.88693800637651, 928.31776672255558,
    2000.0000000000005, 4308.8693800637648, 9283.1776672255564,
    20000.000000000004]
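
The numbers in this example can be reproduced with a short sketch, assuming that center frequencies are spaced by a constant ratio and that each bandwidth is bandwidth_ratio times its center frequency. The helper name is hypothetical; this is an illustration, not the zounds implementation.

```python
def geometric_centers(start_center_hz, stop_center_hz, n_bands):
    """Center frequencies spaced by a constant ratio (hypothetical helper)."""
    ratio = (stop_center_hz / start_center_hz) ** (1.0 / (n_bands - 1))
    return [start_center_hz * ratio ** i for i in range(n_bands)]


bandwidth_ratio = 0.05
centers = geometric_centers(20, 20000, 10)
bandwidths = [c * bandwidth_ratio for c in centers]

# Constant Q: every band has the same center-to-bandwidth ratio
q = [c / bw for c, bw in zip(centers, bandwidths)]

# The overall span runs from the lower edge of the first band
# to the upper edge of the last band
span = (centers[0] - bandwidths[0] / 2, centers[-1] + bandwidths[-1] / 2)
```

With these inputs, every entry of q is approximately 20 and span is approximately (19.5, 20500.0), in line with the output shown above.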
class zounds.spectral.ExplicitScale(bands)[source]

A scale where the frequency bands are provided explicitly, rather than computed

Parameters:bands (list of FrequencyBand) – The explicit bands used by this scale
class zounds.spectral.FrequencyScale(frequency_band, n_bands, always_even=False)[source]

Represents a set of frequency bands with monotonically increasing start frequencies

Parameters:
  • frequency_band (FrequencyBand) – A band representing the entire span of this scale. E.g., one might want to generate a scale spanning the entire range of human hearing by starting with FrequencyBand(20, 20000)
  • n_bands (int) – The number of bands in this scale
  • always_even (bool) – when converting frequency slices to integer indices that numpy can understand, should the slice size always be even?
bands

An iterable of all bands in this scale

center_frequencies

An iterable of the center frequencies of each band in this scale

bandwidths

An iterable of the bandwidths of each band in this scale

ensure_overlap_ratio(required_ratio=0.5)[source]

Ensure that every adjacent pair of frequency bands meets the overlap ratio criteria. This can be helpful in scenarios where a scale is being used in an invertible transform, and something like the constant overlap add constraint must be met in order to not introduce artifacts in the reconstruction.

Parameters:required_ratio (float) – The required overlap ratio between all adjacent frequency band pairs
Raises:AssertionError – when the overlap ratio for one or more adjacent frequency band pairs is not met
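
A check of this kind could be sketched as below. The definition of the overlap ratio used here (the fraction of a band that is also covered by the next band) is an assumption for illustration; zounds may measure overlap differently. Bands are plain (start_hz, stop_hz) tuples and both function names are hypothetical.

```python
def overlap_ratio(band, next_band):
    """Fraction of `band` also covered by `next_band` (assumed definition)."""
    lo = max(band[0], next_band[0])
    hi = min(band[1], next_band[1])
    return max(0.0, hi - lo) / (band[1] - band[0])


def ensure_overlap_ratio(bands, required_ratio=0.5):
    """Raise AssertionError if any adjacent pair overlaps too little."""
    for band, next_band in zip(bands, bands[1:]):
        ratio = overlap_ratio(band, next_band)
        if ratio < required_ratio:
            raise AssertionError(
                "bands %r and %r overlap by %.2f, less than %.2f"
                % (band, next_band, ratio, required_ratio))


lapped = [(0, 100), (50, 150), (100, 200)]   # each pair overlaps by half
disjoint = [(0, 100), (100, 200)]            # bands only touch

ensure_overlap_ratio(lapped)  # passes silently
```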
Q

The quality factor of the scale, or, the ratio of center frequencies to bandwidths

start_hz

The lower bound of this frequency scale

stop_hz

The upper bound of this frequency scale

get_slice(frequency_band)[source]

Given a frequency band, and a frequency dimension comprised of n_samples, return a slice using integer indices that may be used to extract only the frequency samples that intersect with the frequency band
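
For a linear scale, the idea can be sketched as selecting every band that intersects the requested range. The helper below is hypothetical and works on plain numbers rather than FrequencyBand instances; it is not the zounds implementation, although with the inputs from the FrequencyDimension example it selects the same five bands spanning 20 Hz to 1019 Hz.

```python
def band_slice(scale_start_hz, scale_stop_hz, n_bands,
               query_start_hz, query_stop_hz):
    """Integer slice covering the bands that intersect the query range."""
    span = scale_stop_hz - scale_start_hz
    hits = []
    for i in range(n_bands):
        lo = scale_start_hz + span * i / n_bands
        hi = scale_start_hz + span * (i + 1) / n_bands
        # Two intervals intersect when each starts before the other ends
        if lo < query_stop_hz and hi > query_start_hz:
            hits.append(i)
    if not hits:
        return slice(0, 0)
    return slice(hits[0], hits[-1] + 1)


sl = band_slice(20, 20000, 100, 100, 1000)
print(sl)  # slice(0, 5, None)
```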

class zounds.spectral.FrequencyBand(start_hz, stop_hz)[source]

Represents an interval, or band of frequencies in hertz (cycles per second)

Parameters:
  • start_hz (float) – The lower bound of the frequency band in hertz
  • stop_hz (float) – The upper bound of the frequency band in hertz
Examples
>>> import zounds
>>> band = zounds.FrequencyBand(500, 1000)
>>> band.center_frequency
750.0
>>> band.bandwidth
500
intersect(other)[source]

Return the intersection between this frequency band and another.

Parameters:other (FrequencyBand) – the instance to intersect with
Examples
>>> import zounds
>>> b1 = zounds.FrequencyBand(500, 1000)
>>> b2 = zounds.FrequencyBand(900, 2000)
>>> intersection = b1.intersect(b2)
>>> intersection.start_hz, intersection.stop_hz
(900, 1000)
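
The intersection of two bands is the interval between the larger of the two lower bounds and the smaller of the two upper bounds. A minimal sketch on (start_hz, stop_hz) tuples follows; returning None for bands that do not overlap is an assumption of this example, since the behaviour of intersect in that case is not documented here.

```python
def intersect(band, other):
    """Intersection of two (start_hz, stop_hz) bands (hypothetical helper)."""
    lo = max(band[0], other[0])
    hi = min(band[1], other[1])
    if lo >= hi:
        # Assumption: signal "no overlap" with None
        return None
    return (lo, hi)


print(intersect((500, 1000), (900, 2000)))  # (900, 1000)
```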
static from_start(start_hz, bandwidth_hz)[source]

Produce a FrequencyBand instance from a lower bound and bandwidth

Parameters:
  • start_hz (float) – the lower bound of the desired FrequencyBand
  • bandwidth_hz (float) – the bandwidth of the desired FrequencyBand
bandwidth

The span of this frequency band, in hertz

Frequency Weightings

class zounds.spectral.AWeighting[source]

An A-weighting (https://en.wikipedia.org/wiki/A-weighting) that can be applied to a frequency axis via multiplication.

Examples

>>> from zounds import ArrayWithUnits, GeometricScale
>>> from zounds import FrequencyDimension, AWeighting
>>> import numpy as np
>>> scale = GeometricScale(20, 20000, 0.05, 10)
>>> raw = np.ones(len(scale))
>>> arr = ArrayWithUnits(raw, [FrequencyDimension(scale)])
>>> arr * AWeighting()
ArrayWithUnits([  1.        ,  18.3172567 ,  31.19918106,  40.54760374,
        47.15389876,  51.1554151 ,  52.59655479,  52.24516649,
        49.39906912,  42.05409205])
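
For orientation, the standard A-weighting curve can be evaluated directly from its analog definition, as in the sketch below. The function name is hypothetical, the result is a gain in decibels relative to 1 kHz, and the sketch makes no attempt to reproduce the scaling of the values printed in the example above.

```python
import math


def a_weighting_db(frequency_hz):
    """A-weighting gain in dB for one frequency (standard analog formula)."""
    f2 = frequency_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.0 dB offset normalizes the curve to roughly 0 dB at 1 kHz
    return 20 * math.log10(numerator / denominator) + 2.0


for hz in (100.0, 1000.0, 10000.0):
    print(hz, round(a_weighting_db(hz), 1))
```

Low frequencies are strongly attenuated (about 19 dB at 100 Hz), while the region around 1 kHz is left almost untouched.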