
The spectral module contains classes that aid in dealing with frequency-domain representations of sound.


class zounds.spectral.FrequencyDimension(scale)[source]

When applied to an axis of ArrayWithUnits, that axis can be viewed as representing the energy present in a series of frequency bands

Parameters:scale (FrequencyScale) – A scale whose frequency bands correspond to the items along the frequency axis


>>> from zounds import LinearScale, FrequencyBand, ArrayWithUnits
>>> from zounds import FrequencyDimension
>>> import numpy as np
>>> band = FrequencyBand(20, 20000)
>>> scale = LinearScale(frequency_band=band, n_bands=100)
>>> raw = np.hanning(100)
>>> arr = ArrayWithUnits(raw, [FrequencyDimension(scale)])
>>> sliced = arr[FrequencyBand(100, 1000)]
>>> sliced.shape
>>> sliced.dimensions
bandwidth=999.0), n_bands=5)),)
metaslice(index, size)[source]

Produce a new instance of this dimension, given a custom slice


Subclasses define behavior that transforms a custom, user-defined slice into integer indices that numpy can understand

Parameters:index (custom slice) – A user-defined slice instance

Ensure that the size of the dimension matches the number of bands in the scale

Raises:ValueError – when the dimension size and number of bands don’t match
class zounds.spectral.ExplicitFrequencyDimension(scale, slices)[source]

A frequency dimension where the mapping from frequency bands to integer indices is provided explicitly, rather than computed

  • scale (ExplicitScale) – the explicit frequency scale that defines how slices are extracted from this dimension
  • slices (iterable of slices) – An iterable of Python slice instances, one for each frequency band in scale

Raises:ValueError – when the number of slices and number of bands in scale don’t match

metaslice(index, size)[source]

Produce a new instance of this dimension, given a custom slice


Subclasses define behavior that transforms a custom, user-defined slice into integer indices that numpy can understand

Parameters:index (custom slice) – A user-defined slice instance

Subclasses check to ensure that the dimension’s size does not violate any assumptions made by this instance

class zounds.spectral.FrequencyAdaptive[source]

TODO: This needs some love. Mutually exclusive constructor arguments are no bueno

  • arrs – TODO
  • time_dimension (TimeDimension) – the time dimension of the first axis of this array
  • scale (FrequencyScale) – The frequency scale corresponding to the first axis of this array, mutually exclusive with the explicit_freq_dimension argument
  • explicit_freq_dimension (ExplicitFrequencyDimension) – TODO
square(n_coeffs, do_overlap_add=False)[source]

Compute a “square” view of the frequency adaptive transform, by resampling each frequency band such that they all contain the same number of samples, and performing an overlap-add procedure in the case where the sample frequency and duration differ

Parameters:n_coeffs – The common size to which each frequency band should be resampled


zounds.spectral.fft(x, axis=-1, padding_samples=0)[source]

Apply an FFT along the given dimension, and with the specified amount of zero-padding

  • x (ArrayWithUnits) – an ArrayWithUnits instance which has one or more TimeDimension axes
  • axis (int) – The axis along which the fft should be applied
  • padding_samples (int) – The number of padding zeros to apply along axis before performing the FFT
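
To illustrate what padding_samples does, here is a minimal pure-Python sketch. It is not zounds code, and the helper name padded_rfft_magnitudes is hypothetical. It appends zeros to a real-valued signal and then takes a naive DFT, so the padded version has more frequency bins describing the same underlying spectrum.

```python
import cmath
import math


def padded_rfft_magnitudes(samples, padding_samples=0):
    """Naive O(n^2) DFT magnitudes of a real signal after zero-padding.

    Hypothetical helper for illustration only; not part of zounds.
    """
    padded = list(samples) + [0.0] * padding_samples
    n = len(padded)
    magnitudes = []
    # A real-valued signal only needs the first n // 2 + 1 bins
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(padded))
        magnitudes.append(abs(coeff))
    return magnitudes


# A sinusoid completing exactly 4 cycles over 32 samples
signal = [math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
plain = padded_rfft_magnitudes(signal)
padded = padded_rfft_magnitudes(signal, padding_samples=32)
```

Doubling the length with zeros doubles the number of bins, so the same sinusoid peaks at bin 8 instead of bin 4. The frequency grid gets finer, but no new information is added.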
zounds.spectral.morlet_filter_bank(samplerate, kernel_size, scale, scaling_factor, normalize=True)[source]

Create an ArrayWithUnits instance with a TimeDimension and a FrequencyDimension representing a bank of morlet wavelets centered on the sub-bands of the scale.

  • samplerate (SampleRate) – the samplerate of the input signal
  • kernel_size (int) – the length in samples of each filter
  • scale (FrequencyScale) – a scale whose center frequencies determine the fundamental frequency of each filter
  • scaling_factor (int or list of int) – Scaling factors for each band, which determine the time-frequency resolution tradeoff. The number(s) should fall between 0 and 1, with smaller numbers achieving better frequency resolution, and larger numbers better time resolution
  • normalize (bool) – When true, ensure that each filter in the bank has unit norm
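
As a rough illustration of a single filter in such a bank, the sketch below builds one Gaussian-windowed complex sinusoid and scales it to unit norm. It is not the zounds implementation. The function names are hypothetical, and the mapping from scaling_factor to the width of the Gaussian envelope is an assumption chosen only so that smaller factors give a longer kernel (finer frequency resolution).

```python
import cmath
import math


def morlet_kernel(center_hz, samplerate_hz, kernel_size, scaling_factor):
    """A unit-norm, Gaussian-windowed complex sinusoid.

    Assumption for this sketch: the Gaussian envelope's standard
    deviation, in samples, is kernel_size / (4 * pi * scaling_factor).
    """
    sigma = kernel_size / (4 * math.pi * scaling_factor)
    middle = (kernel_size - 1) / 2.0
    kernel = []
    for n in range(kernel_size):
        t = n - middle
        envelope = math.exp(-0.5 * (t / sigma) ** 2)
        carrier = cmath.exp(2j * math.pi * center_hz * t / samplerate_hz)
        kernel.append(envelope * carrier)
    # Normalize so that the filter has unit (L2) norm
    norm = math.sqrt(sum(abs(c) ** 2 for c in kernel))
    return [c / norm for c in kernel]


def effective_width(kernel):
    """Number of taps whose magnitude exceeds half of the peak."""
    peak = max(abs(c) for c in kernel)
    return sum(1 for c in kernel if abs(c) > 0.5 * peak)


narrow_in_time = morlet_kernel(440.0, 11025.0, 512, scaling_factor=1.0)
wide_in_time = morlet_kernel(440.0, 11025.0, 512, scaling_factor=0.1)
```

Under these assumptions the kernel built with the smaller factor occupies more of the 512 taps, trading time resolution for frequency resolution.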

Processing Nodes

class zounds.spectral.SlidingWindow(wscheme, wfunc=None, padwith=0, needs=None)[source]

SlidingWindow is a processing node that provides a very common precursor to many frequency domain transforms: a lapped and windowed view of the time-domain signal.

  • wscheme (SampleRate) – a sample rate that describes the frequency and duration of the sliding window
  • wfunc (WindowingFunc) – a windowing function to apply to each frame
  • needs (Node) – A processing node on which this node relies for its data. This will generally be a time-domain signal

Here’s how you’d typically see SlidingWindow used in a processing graph

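import zounds

Resampled = zounds.resampled(resample_to=zounds.SR11025())

# NOTE: the decorator and the feature arguments below were missing from this
# listing. They are reconstructed assumptions, chosen to match the "in-memory
# store" comment and the printed dimensions (250 ms hop, 500 ms window).
@zounds.simple_in_memory_settings
class Sound(Resampled):
    windowed = zounds.ArrayWithUnitsFeature(
        zounds.SlidingWindow,
        needs=Resampled.resampled,
        wscheme=zounds.SampleRate(
            frequency=zounds.Milliseconds(250),
            duration=zounds.Milliseconds(500)),
        store=True)

# synthesize five seconds of audio containing three sine tones
synth = zounds.SineSynthesizer(zounds.SR44100())
samples = synth.synthesize(zounds.Seconds(5), [220., 440., 880.])

# process the audio, and fetch features from our in-memory store
_id = Sound.process(meta=samples.encode())
sound = Sound(_id)

# the first axis steps through the windows
print(sound.windowed.dimensions[0])
# TimeDimension(f=0.250068024879, d=0.500045346811)
# the second axis steps through the samples inside each window
print(sound.windowed.dimensions[1])
# TimeDimension(f=9.0702947e-05, d=9.0702947e-05)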
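
Independent of zounds, the lapped and windowed view itself can be sketched in a few lines of plain Python. The names sliding_window and hann are hypothetical, and the handling of the final partial frame (dropped here) is an assumption; the padwith argument suggests the library pads instead.

```python
import math


def hann(size):
    """Symmetric Hann window, the same form as numpy.hanning."""
    if size == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def sliding_window(samples, window_size, hop_size, window_func=None):
    """Split samples into overlapping frames, optionally windowing each.

    Illustrative only: trailing samples that do not fill a whole
    frame are dropped rather than padded.
    """
    if window_func is None:
        weights = [1.0] * window_size
    else:
        weights = window_func(window_size)
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop_size):
        frame = samples[start:start + window_size]
        frames.append([s * w for s, w in zip(frame, weights)])
    return frames


samples = [1.0] * 16
# 8-sample frames that advance by 4 samples, so neighbours overlap by half
frames = sliding_window(samples, window_size=8, hop_size=4, window_func=hann)
```

The result is a two-dimensional structure (frames by samples-per-frame), which mirrors the two TimeDimension axes printed in the example above.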
class zounds.spectral.FrequencyWeighting(weighting=None, needs=None)[source]

FrequencyWeighting is a processing node that expects to be passed an ArrayWithUnits instance whose last dimension is a FrequencyDimension

class zounds.spectral.FFT(needs=None, axis=-1, padding_samples=0)[source]

A processing node that performs an FFT of a real-valued signal

  • axis (int) – The axis over which the FFT should be computed
  • padding_samples (int) – number of zero samples to pad each window with before applying the FFT
  • needs (Node) – a processing node on which this one depends

class zounds.spectral.DCT(axis=-1, scale_always_even=False, needs=None)[source]

A processing node that performs a Type II Discrete Cosine Transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-II) of the input

  • axis (int) – The axis over which to perform the DCT transform
  • needs (Node) – a processing node on which this one depends
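
For reference, the textbook DCT-II can be written directly from its definition. The sketch below is a naive, unnormalized version in plain Python. The scaling zounds applies to its output, and the effect of scale_always_even, are not modelled here.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 1/2) * k)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n in range(n_samples))
        for k in range(n_samples)]


# A constant signal has all of its energy in the k = 0 coefficient
constant = dct_ii([1.0, 1.0, 1.0, 1.0])
```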

class zounds.spectral.DCTIV(scale_always_even=False, needs=None)[source]

A processing node that performs a Type IV Discrete Cosine Transform (https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-IV) of the input

Parameters:needs (Node) – a processing node on which this one depends

class zounds.spectral.MDCT(needs=None)[source]

A processing node that performs a modified discrete cosine transform (https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) of the input.

This is really just a lapped version of the DCT-IV transform

Parameters:needs (Node) – a processing node on which this one depends
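
The "lapped" structure can be seen in a direct implementation of the standard MDCT definition, where each frame of 2N samples yields N coefficients and consecutive frames advance by N samples. This is a naive sketch with hypothetical function names, without the windowing a practical codec would apply, and it is not the zounds implementation.

```python
import math


def mdct_frame(frame):
    """MDCT of one frame of 2N samples, yielding N coefficients.

    X[k] = sum_n x[n] * cos(pi / N * (n + 1/2 + N/2) * (k + 1/2))
    """
    if len(frame) % 2:
        raise ValueError('frame length must be even')
    n_coeffs = len(frame) // 2
    return [
        sum(x * math.cos(
                math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2.0) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_coeffs)]


def mdct(samples, n_coeffs):
    """Transform frames of 2 * n_coeffs samples, hopping by n_coeffs."""
    frame_size = 2 * n_coeffs
    return [
        mdct_frame(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, n_coeffs)]


coeffs = mdct([float(i % 5) for i in range(32)], n_coeffs=8)
```

Because every frame overlaps its neighbour by half, 32 input samples produce three frames of eight coefficients each.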

class zounds.spectral.FrequencyAdaptiveTransform(transform=None, scale=None, window_func=None, check_scale_overlap_ratio=False, needs=None)[source]

A processing node that expects to receive the input from a frequency domain transformation (e.g. FFT), and produces a FrequencyAdaptive instance where time resolution can vary by frequency. This is similar to, but not precisely the same as ideas introduced in:

  • transform (function) – the transform to be applied to each frequency band
  • scale (FrequencyScale) – the scale used to take frequency band slices
  • window_func (numpy.ndarray) – the windowing function to apply to each band before the transform is applied
  • check_scale_overlap_ratio (bool) – If this feature is to be used for resynthesis later, ensure that each frequency band overlaps with the previous one by at least half, to ensure artifact-free synthesis
class zounds.spectral.Chroma(frequency_band, window=<zounds.spectral.sliding_window.HanningWindowingFunc object>, needs=None)[source]
class zounds.spectral.BarkBands(frequency_band, n_bands=100, window=<zounds.spectral.sliding_window.HanningWindowingFunc object>, needs=None)[source]
class zounds.spectral.SpectralCentroid(needs=None)[source]

Indicates where the “center of mass” of the spectrum is. Perceptually, it has a robust connection with the impression of “brightness” of a sound. It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights…
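
The weighted mean described above is short enough to write out. This is a minimal sketch with a hypothetical function name; returning 0.0 for an all-zero spectrum is a choice made for the example, not documented library behaviour.

```python
def spectral_centroid(frequencies_hz, magnitudes):
    """Mean of the frequencies, weighted by their magnitudes."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    weighted = sum(f * m for f, m in zip(frequencies_hz, magnitudes))
    return weighted / float(total)


frequencies = [100.0, 200.0, 300.0]
dark = spectral_centroid(frequencies, [1.0, 0.5, 0.1])
bright = spectral_centroid(frequencies, [0.1, 0.5, 1.0])
```

Shifting energy toward the higher bins raises the centroid, which corresponds to the impression of a "brighter" sound.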


class zounds.spectral.SpectralFlatness(needs=None)[source]

Spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is typically measured in decibels, and provides a way to quantify how tone-like a sound is, as opposed to being noise-like. The meaning of tonal in this context is in the sense of the amount of peaks or resonant structure in a power spectrum, as opposed to flat spectrum of a white noise. A high spectral flatness indicates that the spectrum has a similar amount of power in all spectral bands - this would sound similar to white noise, and the graph of the spectrum would appear relatively flat and smooth. A low spectral flatness indicates that the spectral power is concentrated in a relatively small number of bands - this would typically sound like a mixture of sine waves, and the spectrum would appear “spiky”…
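
Spectral flatness is commonly computed as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below follows that definition and returns a linear ratio between 0 and 1 rather than decibels. The function name is hypothetical and nothing here is taken from the zounds source.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    n = len(power_spectrum)
    arithmetic_mean = sum(power_spectrum) / float(n)
    # Any empty bin drives the geometric mean (and the ratio) to zero
    if arithmetic_mean == 0 or min(power_spectrum) <= 0:
        return 0.0
    log_mean = sum(math.log(p) for p in power_spectrum) / float(n)
    return math.exp(log_mean) / arithmetic_mean


noise_like = spectral_flatness([1.0] * 8)
tone_like = spectral_flatness([1e-6] * 7 + [1.0])
```

A perfectly flat spectrum scores 1.0, while a spectrum dominated by a single band scores close to 0.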


class zounds.spectral.BFCC(needs=None, n_coeffs=13, exclude=1)[source]

Bark frequency cepstral coefficients

Windowing Functions

class zounds.spectral.WindowingFunc(windowing_func=None)[source]

WindowingFunc is mostly a convenient wrapper around numpy’s handy windowing functions, or any function that takes a size parameter and returns a numpy array-like object.

A WindowingFunc instance can be multiplied with another array of any size.

Parameters:windowing_func (function) – A function that takes a size parameter, and returns a numpy array-like object


>>> from zounds import WindowingFunc
>>> import numpy as np
>>> wf = WindowingFunc(lambda size: np.hanning(size))
>>> np.ones(5) *  wf
array([ 0. ,  0.5,  1. ,  0.5,  0. ])
>>> np.ones(10) * wf
array([ 0.        ,  0.11697778,  0.41317591,  0.75      ,  0.96984631,
        0.96984631,  0.75      ,  0.41317591,  0.11697778,  0.        ])
class zounds.spectral.IdentityWindowingFunc[source]

An identity windowing function

class zounds.spectral.OggVorbisWindowingFunc[source]

The windowing function described in the ogg vorbis specification
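
The window given in the Vorbis I specification is w[n] = sin(π/2 · sin²(π(n + ½)/N)). A plain-Python sketch of that formula follows. Whether zounds evaluates it in exactly this form is an assumption, and the name vorbis_window is hypothetical.

```python
import math


def vorbis_window(size):
    """w[n] = sin(pi / 2 * sin^2(pi * (n + 1/2) / size))"""
    return [
        math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / size) ** 2)
        for n in range(size)]


window = vorbis_window(16)
half = len(window) // 2
# Squared values of taps half a window apart sum to one
power_sums = [window[n] ** 2 + window[n + half] ** 2 for n in range(half)]
```

This power-complementary property is what makes the window suitable for half-lapped transforms such as the MDCT: applying the window at analysis and again at synthesis lets overlapping frames add back up to the original signal.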

class zounds.spectral.HanningWindowingFunc[source]

A hanning window function


class zounds.spectral.LinearScale(frequency_band, n_bands, always_even=False)[source]

A linear frequency scale with constant bandwidth. Appropriate for use with transforms whose coefficients also lie on a linear frequency scale, e.g. the FFT or DCT transforms.

  • frequency_band (FrequencyBand) – A band representing the entire span of this scale. E.g., one might want to generate a scale spanning the entire range of human hearing by starting with FrequencyBand(20, 20000)
  • n_bands (int) – The number of bands in this scale
  • always_even (bool) – when converting frequency slices to integer indices that numpy can understand, should the slice size always be even?


>>> from zounds import FrequencyBand, LinearScale
>>> scale = LinearScale(FrequencyBand(20, 20000), 10)
>>> scale
bandwidth=19980), n_bands=10)
>>> scale.Q
array([ 0.51001001,  1.51001001,  2.51001001,  3.51001001,  4.51001001,
        5.51001001,  6.51001001,  7.51001001,  8.51001001,  9.51001001])
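
The Q values printed above can be reproduced by hand if one assumes that the scale is divided into equal-width, non-overlapping bands. That layout is an assumption made for this sketch, not something read from the zounds source, and the function names are hypothetical.

```python
def linear_bands(start_hz, stop_hz, n_bands):
    """Return (start, stop) pairs for n_bands equal-width bands."""
    bandwidth = (stop_hz - start_hz) / float(n_bands)
    return [
        (start_hz + i * bandwidth, start_hz + (i + 1) * bandwidth)
        for i in range(n_bands)]


def quality_factors(bands):
    """Q is each band's center frequency divided by its bandwidth."""
    return [((lo + hi) / 2.0) / (hi - lo) for lo, hi in bands]


bands = linear_bands(20, 20000, 10)
q = quality_factors(bands)
```

With a constant bandwidth of 1998 Hz, Q grows linearly with the center frequency, from about 0.51 for the first band to about 9.51 for the last.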
static from_sample_rate(sample_rate, n_bands, always_even=False)[source]

Return a LinearScale instance whose upper frequency bound is informed by the nyquist frequency of the sample rate.

  • sample_rate (SampleRate) – the sample rate whose nyquist frequency will serve as the upper frequency bound of this scale
  • n_bands (int) – the number of evenly-spaced frequency bands
class zounds.spectral.GeometricScale(start_center_hz, stop_center_hz, bandwidth_ratio, n_bands, always_even=False)[source]

A constant-Q scale whose center frequencies progress geometrically rather than linearly

  • start_center_hz (int) – the center frequency of the first band in the scale
  • stop_center_hz (int) – the center frequency of the last band in the scale
  • bandwidth_ratio (float) – the ratio of each band’s bandwidth to its center frequency, i.e. the reciprocal of Q
  • n_bands (int) – the total number of bands


>>> from zounds import GeometricScale
>>> scale = GeometricScale(20, 20000, 0.05, 10)
>>> scale
bandwidth=20480.5), n_bands=10)
>>> scale.Q
array([ 20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.])
>>> list(scale.center_frequencies)
[20.000000000000004, 43.088693800637671, 92.831776672255558,
    200.00000000000003, 430.88693800637651, 928.31776672255558,
    2000.0000000000005, 4308.8693800637648, 9283.1776672255564,
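
The printed center frequencies are consistent with geometric spacing between the two end points, and the constant Q of 20 is consistent with each bandwidth being bandwidth_ratio times the center frequency. The sketch below makes both of those assumptions explicit. The function names are hypothetical and this is not the zounds implementation.

```python
def geometric_centers(start_center_hz, stop_center_hz, n_bands):
    """Center frequencies with a constant ratio between neighbours."""
    ratio = (stop_center_hz / float(start_center_hz)) ** (1.0 / (n_bands - 1))
    return [start_center_hz * ratio ** i for i in range(n_bands)]


def geometric_bands(start_center_hz, stop_center_hz, bandwidth_ratio, n_bands):
    """(start, stop) pairs whose bandwidth is proportional to the center."""
    bands = []
    for center in geometric_centers(start_center_hz, stop_center_hz, n_bands):
        bandwidth = center * bandwidth_ratio
        bands.append((center - bandwidth / 2.0, center + bandwidth / 2.0))
    return bands


centers = geometric_centers(20, 20000, 10)
bands = geometric_bands(20, 20000, 0.05, 10)
q = [((lo + hi) / 2.0) / (hi - lo) for lo, hi in bands]
```

Under these assumptions every band has Q = 1 / 0.05 = 20, and the full span from the bottom of the first band to the top of the last is 20480.5 Hz.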
class zounds.spectral.ExplicitScale(bands)[source]

A scale where the frequency bands are provided explicitly, rather than computed

Parameters:bands (list of FrequencyBand) – The explicit bands used by this scale
class zounds.spectral.FrequencyScale(frequency_band, n_bands, always_even=False)[source]

Represents a set of frequency bands with monotonically increasing start frequencies

  • frequency_band (FrequencyBand) – A band representing the entire span of this scale. E.g., one might want to generate a scale spanning the entire range of human hearing by starting with FrequencyBand(20, 20000)
  • n_bands (int) – The number of bands in this scale
  • always_even (bool) – when converting frequency slices to integer indices that numpy can understand, should the slice size always be even?

An iterable of all bands in this scale


An iterable of the center frequencies of each band in this scale


An iterable of the bandwidths of each band in this scale


Ensure that every adjacent pair of frequency bands meets the overlap ratio criteria. This can be helpful in scenarios where a scale is being used in an invertible transform, and something like the constant overlap add constraint must be met in order to not introduce artifacts in the reconstruction.

Parameters:required_ratio (float) – The required overlap ratio between all adjacent frequency band pairs
Raises:AssertionError – when the overlap ratio for one or more adjacent frequency band pairs is not met

The quality factor of the scale, or, the ratio of center frequencies to bandwidths


The lower bound of this frequency scale


The upper bound of this frequency scale


Given a frequency band, and a frequency dimension comprised of n_samples, return a slice using integer indices that may be used to extract only the frequency samples that intersect with the frequency band
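
For a linearly spaced axis, this kind of mapping can be sketched as below. The function name and the rounding policy (floor for the start, ceil for the stop, so that every intersecting bin is kept) are assumptions made for illustration; the library may round differently, particularly when always_even is set.

```python
import math


def band_to_slice(band_start_hz, band_stop_hz,
                  scale_start_hz, scale_stop_hz, n_samples):
    """Map a frequency band onto integer indices of a linearly spaced axis.

    Assumes the axis holds n_samples equal-width bins covering
    the range from scale_start_hz to scale_stop_hz.
    """
    bin_width = (scale_stop_hz - scale_start_hz) / float(n_samples)
    first = int(math.floor((band_start_hz - scale_start_hz) / bin_width))
    last = int(math.ceil((band_stop_hz - scale_start_hz) / bin_width))
    # Clamp to the valid index range of the axis
    return slice(max(first, 0), min(last, n_samples))


coeffs = list(range(100))
sl = band_to_slice(100, 1000, 20, 20000, 100)
selected = coeffs[sl]
```

With 100 bins spanning 20 Hz to 20000 Hz, the band from 100 Hz to 1000 Hz touches the first five bins, which agrees with the n_bands=5 shown in the FrequencyDimension example at the top of this page.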

class zounds.spectral.FrequencyBand(start_hz, stop_hz)[source]

Represents an interval, or band of frequencies in hertz (cycles per second)

  • start_hz (float) – The lower bound of the frequency band in hertz
  • stop_hz (float) – The upper bound of the frequency band in hertz
>>> import zounds
>>> band = zounds.FrequencyBand(500, 1000)
>>> band.center_frequency
>>> band.bandwidth

Return the intersection between this frequency band and another.

Parameters:other (FrequencyBand) – the instance to intersect with
>>> import zounds
>>> b1 = zounds.FrequencyBand(500, 1000)
>>> b2 = zounds.FrequencyBand(900, 2000)
>>> intersection = b1.intersect(b2)
>>> intersection.start_hz, intersection.stop_hz
(900, 1000)
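
The interval arithmetic behind an intersection like this is small enough to show with plain tuples. The function below is a hypothetical stand-in, not the FrequencyBand method. In particular, returning None for bands that do not overlap is a choice made for the sketch, since the library's behaviour in that case is not documented here.

```python
def intersect(band_a, band_b):
    """Intersect two (start_hz, stop_hz) tuples; None if they do not overlap."""
    start = max(band_a[0], band_b[0])
    stop = min(band_a[1], band_b[1])
    if start >= stop:
        return None
    return (start, stop)


overlap = intersect((500, 1000), (900, 2000))
disjoint = intersect((0, 10), (20, 30))
```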
static from_start(start_hz, bandwidth_hz)[source]

Produce a FrequencyBand instance from a lower bound and bandwidth

  • start_hz (float) – the lower bound of the desired FrequencyBand
  • bandwidth_hz (float) – the bandwidth of the desired FrequencyBand

The span of this frequency band, in hertz

Frequency Weightings

class zounds.spectral.AWeighting[source]

An A-weighting (https://en.wikipedia.org/wiki/A-weighting) that can be applied to a frequency axis via multiplication.


>>> from zounds import ArrayWithUnits, GeometricScale
>>> from zounds import FrequencyDimension, AWeighting
>>> import numpy as np
>>> scale = GeometricScale(20, 20000, 0.05, 10)
>>> raw = np.ones(len(scale))
>>> arr = ArrayWithUnits(raw, [FrequencyDimension(scale)])
>>> arr * AWeighting()
ArrayWithUnits([  1.        ,  18.3172567 ,  31.19918106,  40.54760374,
        47.15389876,  51.1554151 ,  52.59655479,  52.24516649,
        49.39906912,  42.05409205])
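
For background, the standard analog A-weighting curve can be evaluated directly. The sketch below returns gains in decibels, normalized to roughly 0 dB at 1 kHz. It is a standalone illustration with a hypothetical function name, and it makes no attempt to reproduce the scaling of the values in the doctest above.

```python
import math


def a_weighting_db(frequency_hz):
    """A-weighting gain in decibels for a frequency in hertz."""
    f2 = float(frequency_hz) ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2))
    # The +2.0 dB offset normalizes the curve to about 0 dB at 1 kHz
    return 20 * math.log10(numerator / denominator) + 2.0


gain_100 = a_weighting_db(100.0)
gain_1k = a_weighting_db(1000.0)
```

Low frequencies are attenuated strongly (about -19 dB at 100 Hz), reflecting the ear's reduced sensitivity there.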