jeipoppu

Server-side audio processing

Python

public

J-Pop - annoying anime music, or in our case, server-side audio features processing.

Uses: Amazon Kinesis, DynamoDB

Client (whether it be Morpheus, a laptop, an app, etc.) pushes audio feature data to our server, and our server puts it the Kinesis “audio_features” stream.
AudioFeaturesConsumer.py polls the stream, processes the audio features (i.e. runs a classifier on it), and puts it into the “audio_products” stream
Someone (unknown who or what right now) will aggregate the audio_products stream and then ship it off to S3, or wherever
Config file determines everything… which S3 bucket/key to pull the classifier from, which classifier to use, which streams, etc.
Execution: ./AudioFeaturesConsumer.py ./config.txt

Example of The Data Model

Kinesis stream is divided into shards, each shard should have one processor of a type.

Let’s say I have a classifier for snoring, and a classifier for dog barking, and for whatever reason they’re not in the same neural net/SVM/whatever so I am running two separate processors.
Each Kinesis “shard” gets one processor of each type

STREAM: audio_features

   —> shard1

   —> shard2

shard1

   —> bark processor, node1

    —> snore processor, node2

shard2

—> bark processor, node3

—> snore processor, node4

node1,node2,node3,node4 ----> STREAM: audio_products

audio_features stream uses the “matrix.proto” message definitions, which is define in the kitsune repo (kitsune/kitsune/protobuf)
audio_products stream uses the “classifiers.proto” message definitions, which is defined in the ./proto directory
ALL CLASSIFIER CLASS LABELS ARE DEFINED IN THE classifiers.proto DEFINITION. I.e. class label SNORING is audio_class_data.audio_class.SNORING
Everyone has access to the protobuf messages, it’s backwards compatible… lots of good reasons to use it.

Find me