SYS/02 · NEURAL AUDIO

KrisCodec

Music-specialized neural audio codec, built from first principles, producing a custom .kris format.

PyTorch · RVQ · Audio DSP

TESTS: 55 passing
DECODE: ~7.4 ms
QUANT: RVQ
ACTIVATION: Snake

KrisCodec is a neural audio codec specialized for music, written from first principles to understand codec design end to end, from the DSP front end through vector quantization to the decoder.

KEY FACTS

Snake activations and a Residual Vector Quantizer (RVQ) at the codec core.
55-test suite covering the model and training path.
End-to-end training pipeline from raw audio to encoded format.
Benchmarked at ~7.4 ms decode latency.

Codec architecture

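The codec pairs two components. Snake activations are periodic nonlinearities whose sinusoidal term suits the oscillatory structure of audio. A Residual Vector Quantizer (RVQ) discretizes the latent into a compact, layered code, with each stage quantizing the residual left by the stage before it. The output is written to a custom .kris format.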
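To make the two building blocks concrete, here is a minimal sketch in plain Python rather than PyTorch, so it runs without dependencies. It is not KrisCodec's actual implementation: the function names, the fixed alpha, the toy codebook sizes, and the random codebooks are all assumptions for illustration. A trained codec would learn the codebooks (and typically alpha) and would operate on batched tensors.

```python
import math
import random

def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha (alpha > 0 sets the frequency)."""
    return x + math.sin(alpha * x) ** 2 / alpha

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the residual of the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct the latent by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Toy setup (assumed sizes): 3 stages, 8 entries per stage, 4-dimensional latents.
# Later stages use smaller-scale entries, mimicking progressively finer residuals.
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-1, 1) * 0.5 ** stage for _ in range(4)] for _ in range(8)]
    for stage in range(3)
]
latent = [snake(v) for v in (0.3, -0.7, 1.2, 0.05)]
codes = rvq_encode(latent, codebooks)   # one integer index per stage
recon = rvq_decode(codes, codebooks)    # approximate latent
```

The layered code is simply the list of per-stage indices. Dropping trailing stages at decode time yields a coarser reconstruction, which is what makes the representation layered.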

Training and validation

An end-to-end training pipeline takes raw audio through the encoder, quantizer, and decoder. A 55-test suite covers the model components and the training path. Decode latency is benchmarked at roughly 7.4 ms.
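The page does not say how the decode figure was measured (hardware, clip length, or which statistic). As one plausible approach, a latency number like this can be taken with a small timing harness: warm up first, then report the median over repeated runs. The sketch below assumes this approach; `benchmark_ms` and `fake_decode` are hypothetical names, and `fake_decode` only stands in for a real decoder call.

```python
import statistics
import time

def benchmark_ms(fn, *args, warmup=5, runs=50):
    """Return the median wall-clock time of fn(*args) in milliseconds."""
    # Warm-up calls are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def fake_decode(codes):
    """Hypothetical stand-in for a decoder: maps integer codes to float samples."""
    return [c * 0.5 for c in codes]

latency = benchmark_ms(fake_decode, list(range(1024)))
print(f"median decode latency: {latency:.3f} ms")
```

When timing a PyTorch model on a GPU, call torch.cuda.synchronize() before reading the clock on each side of the call. CUDA kernels launch asynchronously, so without it the timer can stop before the work has finished.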
