Soundfile

The soundfile module introduces featureflow.Node subclasses that know how to process low-level audio samples and common audio encodings.

Input

class zounds.soundfile.AudioMetaData(uri=None, samplerate=None, channels=None, licensing=None, description=None, tags=None, **kwargs)[source]

Encapsulates metadata about a source audio file, including things like text descriptions and licensing information.

Parameters:
  • uri (requests.Request or str) – either a string representing a network resource or a local file path, or a requests.Request instance
  • samplerate (int) – the samplerate of the source audio
  • channels (int) – the number of channels of the source audio
  • licensing (str) – The licensing agreement (if any) that applies to the source audio
  • description (str) – a text description of the source audio
  • tags (str) – text tags that apply to the source audio
  • kwargs (dict) – other arbitrary properties about the source audio
Raises:

ValueError – when uri is not provided
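
Here’s how you might construct an AudioMetaData instance for a local file. This is a minimal sketch; the file path and metadata values are hypothetical:

import zounds

meta = zounds.AudioMetaData(
    uri='/path/to/audio.wav',
    samplerate=44100,
    channels=2,
    licensing='CC-BY 4.0',
    description='a field recording of rainfall',
    tags='rain ambient')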

Chunksize

class zounds.soundfile.ChunkSizeBytes(samplerate, duration, channels=2, bit_depth=16)[source]

A convenience class for describing a chunksize in bytes for featureflow.ByteStream in terms of audio sample batch sizes.

Parameters:
  • samplerate (SampleRate) – The samples-per-second factor
  • duration (numpy.timedelta64) – The desired duration of each chunk
  • channels (int) – The number of audio channels
  • bit_depth (int) – The number of bits per sample

Examples

>>> from zounds import ChunkSizeBytes, Seconds, SR44100
>>> chunksize = ChunkSizeBytes(SR44100(), Seconds(30))
>>> chunksize
ChunkSizeBytes(samplerate=SR44100(f=2.2675736e-05, d=2.2675736e-05)...
>>> int(chunksize)
5292000
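
The integer conversion follows from multiplying the factors above: 44100 samples per second × 30 seconds × 2 channels × 2 bytes per sample (16 bits) = 5,292,000 bytes.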

Processing Nodes

class zounds.soundfile.AudioStream(sum_to_mono=True, needs=None)[source]

AudioStream expects to process a raw stream of bytes (e.g. one produced by featureflow.ByteStream) and produces chunks of AudioSamples.

Parameters:
  • sum_to_mono (bool) – True if this node should return an AudioSamples instance with a single channel
  • needs (Feature) – a processing node that produces a byte stream (e.g. featureflow.ByteStream)

Here’s how you’d typically see AudioStream used in a processing graph.

import featureflow as ff
import zounds

chunksize = zounds.ChunkSizeBytes(
    samplerate=zounds.SR44100(),
    duration=zounds.Seconds(30),
    bit_depth=16,
    channels=2)

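# configure the graph to store computed features in memory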
@zounds.simple_in_memory_settings
class Document(ff.BaseModel):
    meta = ff.JSONFeature(
        zounds.MetaData,
        store=True,
        encoder=zounds.AudioMetaDataEncoder)

    raw = ff.ByteStreamFeature(
        ff.ByteStream,
        chunksize=chunksize,
        needs=meta,
        store=False)

    pcm = zounds.AudioSamplesFeature(
        zounds.AudioStream,
        needs=raw,
        store=True)


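# synthesize ten seconds of noise, encode it, and run the bytes through the graph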
synth = zounds.NoiseSynthesizer(zounds.SR11025())
samples = synth.synthesize(zounds.Seconds(10))
raw_bytes = samples.encode()
_id = Document.process(meta=raw_bytes)
doc = Document(_id)
print(doc.pcm.__class__)  # doc.pcm is an AudioSamples instance
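
Because sum_to_mono defaults to True, the stored pcm feature is a single-channel AudioSamples instance.
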
class zounds.soundfile.Resampler(samplerate=None, needs=None)[source]

Resampler expects to process AudioSamples instances (e.g., those produced by an AudioStream node), and will produce a new stream of AudioSamples at a new sampling rate.

Parameters:
  • samplerate (AudioSampleRate) – the desired sampling rate. If None is provided, the default is SR44100
  • needs (Feature) – a processing node that produces AudioSamples

Here’s how you’d typically see Resampler used in a processing graph.

import featureflow as ff
import zounds

chunksize = zounds.ChunkSizeBytes(
    samplerate=zounds.SR44100(),
    duration=zounds.Seconds(30),
    bit_depth=16,
    channels=2)

@zounds.simple_in_memory_settings
class Document(ff.BaseModel):
    meta = ff.JSONFeature(
        zounds.MetaData,
        store=True,
        encoder=zounds.AudioMetaDataEncoder)

    raw = ff.ByteStreamFeature(
        ff.ByteStream,
        chunksize=chunksize,
        needs=meta,
        store=False)

    pcm = zounds.AudioSamplesFeature(
        zounds.AudioStream,
        needs=raw,
        store=True)

    resampled = zounds.AudioSamplesFeature(
        zounds.Resampler,
        samplerate=zounds.SR22050(),
        needs=pcm,
        store=True)


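# synthesize noise at 11.025kHz and process it through the graph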
synth = zounds.NoiseSynthesizer(zounds.SR11025())
samples = synth.synthesize(zounds.Seconds(10))
raw_bytes = samples.encode()
_id = Document.process(meta=raw_bytes)
doc = Document(_id)
print(doc.pcm.samplerate.__class__.__name__)  # SR11025
print(doc.resampled.samplerate.__class__.__name__)  # SR22050
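
Note that the pcm feature preserves the original SR11025 rate of the synthesized audio, while the resampled feature is converted to the SR22050 rate requested in the graph definition.
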
class zounds.soundfile.OggVorbis(needs=None)[source]

OggVorbis expects to process a stream of raw bytes (e.g. one produced by featureflow.ByteStream) and produces a new byte stream where the original audio samples are Ogg Vorbis-encoded.

Parameters:
  • needs (Feature) – a feature that produces a byte stream (e.g. featureflow.ByteStream)

Here’s how you’d typically see OggVorbis used in a processing graph.

import featureflow as ff
import zounds


chunksize = zounds.ChunkSizeBytes(
    samplerate=zounds.SR44100(),
    duration=zounds.Seconds(30),
    bit_depth=16,
    channels=2)

@zounds.simple_in_memory_settings
class Document(ff.BaseModel):
    meta = ff.JSONFeature(
        zounds.MetaData,
        store=True,
        encoder=zounds.AudioMetaDataEncoder)

    raw = ff.ByteStreamFeature(
        ff.ByteStream,
        chunksize=chunksize,
        needs=meta,
        store=False)

    ogg = zounds.OggVorbisFeature(
        zounds.OggVorbis,
        needs=raw,
        store=True)


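# synthesize noise, encode it, and process the resulting bytes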
synth = zounds.NoiseSynthesizer(zounds.SR11025())
samples = synth.synthesize(zounds.Seconds(10))
raw_bytes = samples.encode()
_id = Document.process(meta=raw_bytes)
doc = Document(_id)
# fetch and decode a section of audio
ts = zounds.TimeSlice(zounds.Seconds(2))
print(doc.ogg[ts].shape)  # (22050,)
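
A two-second TimeSlice of audio synthesized at 11025 samples per second yields 2 × 11025 = 22050 samples, so only that section of the encoded audio needs to be fetched and decoded.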