DATA ENGINEERING — PARIS, FR

Kushath Nanduri

I build data infrastructure and ML systems — from neural audio codecs to LLM evaluation tooling.

STATUS: MSc Data Science, EDC Paris — graduating 08.2026 · Open to data engineering roles from 09.2026

01 / SYSTEMS

SYS/01 · DATA TOOLING

forge-prep

Data-readiness toolkit that audits and cleans enterprise corpora before they reach Mistral's Forge fine-tuning pipeline.

  • Published Python package (v0.1.0) — zero external dependencies, Python 3.10+ stdlib only.
  • 38-test suite with GitHub Actions CI; full packaging via pyproject.toml.
  • audit command emits a 0–100 Forge Readiness Score across six dimensions: volume, quality, dedup, PII, language, format.
  • clean command deduplicates, scrubs typed PII, and filters low-quality files into a Forge-ready corpus.
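The audit and clean steps above could be sketched roughly as follows, using only the standard library. This is an illustration, not forge-prep's actual code: the function names, the two PII patterns, and the equal weighting of the six dimensions are all assumptions.

```python
import hashlib
import re

# Hypothetical patterns; the tool's real PII types and regexes are not shown here.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# The six dimensions named in the audit bullet.
DIMENSIONS = ("volume", "quality", "dedup", "pii", "language", "format")


def scrub_pii(text: str) -> str:
    """Replace each PII match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first copy of each document, comparing case- and whitespace-normalized text."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def readiness_score(sub_scores: dict[str, float]) -> int:
    """Equal-weight mean of six 0-1 sub-scores, scaled to 0-100 (weighting is assumed)."""
    total = sum(sub_scores[name] for name in DIMENSIONS)
    return round(100 * total / len(DIMENSIONS))
```

Hashing normalized text only catches exact duplicates; near-duplicate detection would need shingling or a similar technique.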
Python · CLI · PyPI · GitHub Actions

SYS/02 · NEURAL AUDIO

KrisCodec

Music-specialized neural audio codec, built from first principles, producing a custom .kris format.

  • Snake activations and a Residual Vector Quantizer (RVQ) at the codec core.
  • 55-test suite covering the model and training path.
  • End-to-end training pipeline from raw audio to encoded format.
  • Benchmarked at ~7.4 ms decode latency.
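The two building blocks named above can be shown in a few lines of plain Python. This is a toy sketch of the general techniques, not KrisCodec's PyTorch implementation: the codebooks are passed in by hand here, whereas a real codec learns them during training, and all function names are illustrative.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / alpha


def nearest(codebook: list[list[float]], vec: list[float]) -> int:
    """Index of the codebook entry closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec: list[float], codebooks: list[list[list[float]]]) -> list[int]:
    """Quantize stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: list[int], codebooks: list[list[list[float]]]) -> list[float]:
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out
```

Each extra stage refines the approximation, so the number of stages trades bitrate against reconstruction error.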
PyTorch · RVQ · Audio DSP

SYS/03 · ARCHITECTURE RESEARCH

ARIA

Recurrent reasoning architecture with adaptive pondering, grafted onto a frozen transformer backbone.

  • Frozen GPT-2 Small (124M) + recurrent reasoning core (~20M) + halting controller (~2M) — ~146M total, ~22M trainable.
  • Adaptive halting: the model decides when it has reasoned enough before answering.
  • Trained across multiple Colab runs on ARC, GSM8K, and PIQA.
  • Produced concrete lessons on ponder-cost tuning and output-head design.
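One common way to implement adaptive halting is the Adaptive Computation Time scheme, sketched below in plain Python. Whether ARIA uses exactly this rule is an assumption; the function names, the step cap, and the epsilon value are illustrative only.

```python
def ponder(halt_probs, max_steps=8, eps=0.01):
    """ACT-style halting: stop once cumulative halting probability reaches 1 - eps.

    halt_probs: per-step halting probabilities from a (hypothetical) halting controller.
    Returns (steps_used, weights); the weights sum to 1 and would be used to
    mix the per-step hidden states into a single output state.
    """
    probs = list(halt_probs)[:max_steps]
    weights = []
    cumulative = 0.0
    for step, p in enumerate(probs, start=1):
        # Halt when the threshold is crossed, or when the step budget runs out.
        if cumulative + p >= 1.0 - eps or step == len(probs):
            weights.append(1.0 - cumulative)  # the remainder goes to the final step
            return step, weights
        weights.append(p)
        cumulative += p
    return 0, weights  # only reached for empty input


def ponder_cost(steps, weights):
    """ACT ponder cost: steps taken plus the remainder assigned to the final step."""
    return steps + weights[-1]
```

In training, this cost is typically added to the task loss with a small coefficient, which is the knob that ponder-cost tuning adjusts: too high and the model halts after one step, too low and it always uses the full budget.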
PyTorch · bfloat16 · Colab

SYS/04 · APPLIED ML PRODUCT

VOXMAX

Shipped consumer voice-scoring app running on a custom DSP/ML backend.

  • React Native / Expo frontend (SDK 54) across 11 screens.
  • FastAPI backend: deterministic acoustic feature scoring plus LLM coaching.
  • ~12 s of speech → a VoxScore (0–100) with four sub-dials and one improvement drill.
  • Privacy by design: the backend deletes uploaded audio immediately after analysis.
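The delete-after-analysis guarantee can be enforced structurally with a try/finally block, so the upload is removed even when scoring fails. This is a minimal sketch under assumptions: `analyze_then_delete` and the `analyze` callback are hypothetical names, and the real backend's storage path and file format are not shown here.

```python
import os
import tempfile


def analyze_then_delete(audio_bytes: bytes, analyze):
    """Write the upload to a temp file, run analysis, and always delete the file.

    analyze: hypothetical callable that takes a file path and returns a result.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return analyze(path)
    finally:
        # Runs on success and on error, so no audio outlives the request.
        if os.path.exists(path):
            os.remove(path)
```

Putting the cleanup in `finally` means the privacy property does not depend on every error path remembering to delete the file.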
React Native · Expo · FastAPI · Python