SYS/01 — DATA TOOLING
forge-prep
Data-readiness toolkit that audits and cleans enterprise corpora before they reach Mistral's Forge fine-tuning pipeline.
- Published Python package (v0.1.0) with zero external dependencies: Python 3.10+ standard library only.
- 38-test suite with GitHub Actions CI; packaged via pyproject.toml.
- The audit command emits a 0–100 Forge Readiness Score across six dimensions: volume, quality, dedup, PII, language, and format.
- The clean command deduplicates, scrubs typed PII, and filters out low-quality files, producing a Forge-ready corpus.
Python · CLI · PyPI · GitHub Actions
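The audit bullet does not say how the six dimensions combine into one 0–100 number. Below is a minimal sketch of one plausible scheme. It assumes each dimension is first scored on a 0.0–1.0 scale and then combined as a weighted sum. The WEIGHTS values and the readiness_score name are hypothetical, not forge-prep's actual API.

```python
# Hypothetical per-dimension weights (they sum to 1.0); forge-prep's real
# weighting is not documented here.
WEIGHTS = {
    "volume": 0.20,
    "quality": 0.20,
    "dedup": 0.15,
    "pii": 0.20,
    "language": 0.10,
    "format": 0.15,
}


def readiness_score(sub_scores: dict[str, float]) -> int:
    """Combine per-dimension scores (each 0.0-1.0) into a 0-100 integer."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(sub_scores[name], 0.0), 1.0)  # clamp to [0, 1]
        total += weight * value
    return round(100 * total)
```

A weighted sum keeps the score explainable. Each dimension's shortfall maps directly to the points it costs. For example, a corpus that is perfect everywhere except for half the target volume scores 90 under these weights.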
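The clean step can be pictured as three filters applied in order. This stdlib-only sketch follows forge-prep's zero-dependency design and makes three assumptions:

- Exact duplicates are detected by hashing whitespace-normalized, lowercased text.
- PII is replaced with typed placeholders such as [EMAIL].
- A simple word-count floor stands in for the quality filter.

PII_PATTERNS, MIN_WORDS, scrub_pii, and clean_corpus are hypothetical names. The two regexes are illustrative only and are not production-grade PII detection.

```python
import hashlib
import re

# Hypothetical typed PII patterns; real detection needs far more than this.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

MIN_WORDS = 5  # hypothetical quality floor


def scrub_pii(text: str) -> str:
    """Replace each PII match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def clean_corpus(docs: list[str]) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        # Exact-duplicate check on whitespace-normalized, lowercased text.
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        if len(doc.split()) < MIN_WORDS:
            continue  # drop very short, likely low-quality documents
        kept.append(scrub_pii(doc))
    return kept
```

Deduplicating on a hash of normalized text catches copies that differ only in case or spacing. Near-duplicate detection, such as MinHash, would need more machinery than this sketch shows.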
SYS/02 — NEURAL AUDIO
KrisCodec
Music-specialized neural audio codec built from first principles, with its own custom .kris format.
- Snake activations and a Residual Vector Quantizer (RVQ) at the codec core.
- 55-test suite covering the model and training path.
- End-to-end training pipeline from raw audio to encoded format.
- Benchmarked at ~7.4 ms decode latency.
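Two techniques sit at the core of this codec:

- The Snake activation is commonly defined as snake(x) = x + sin²(αx)/α.
- A residual vector quantizer encodes a vector in stages. Each stage quantizes the residual left over by the previous stages.

The plain-Python sketch below illustrates both ideas on scalars and short lists. It is not KrisCodec's code, and the toy codebooks are made up. A real codec would apply Snake per channel with a learned α and use learned codebooks on tensors.

```python
import math

Vector = list[float]


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha (periodic inductive bias)."""
    return x + math.sin(alpha * x) ** 2 / alpha


def _nearest(codebook: list[Vector], v: Vector) -> int:
    # Index of the codeword with the smallest squared Euclidean distance to v.
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def rvq_encode(codebooks: list[list[Vector]], v: Vector) -> list[int]:
    """Quantize v stage by stage; each stage encodes the previous residual."""
    residual = list(v)
    codes = []
    for book in codebooks:
        idx = _nearest(book, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def rvq_decode(codebooks: list[list[Vector]], codes: list[int]) -> Vector:
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for book, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


# Toy two-stage example (hypothetical codebooks).
books = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],  # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],  # fine stage
]
codes = rvq_encode(books, [1.2, 1.0])  # [1, 1]
print(rvq_decode(books, codes))        # [1.25, 1.0]
```

In the example, the first stage alone would reconstruct [1.2, 1.0] as [1.0, 1.0]. The second stage cuts the error on the first coordinate from 0.2 to 0.05. That refinement is why stacking quantizer stages trades bitrate for fidelity.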
SYS/03 — ARCHITECTURE RESEARCH
ARIA
Recurrent reasoning architecture with adaptive pondering, grafted onto a frozen transformer backbone.
- Frozen GPT-2 Small (124M) + recurrent reasoning core (~20M) + halting controller (~2M): ~146M parameters total, ~22M trainable.
- Adaptive halting: the model decides when it has reasoned enough to answer.
- Trained across multiple Colab runs on ARC, GSM8K, and PIQA.
- Produced concrete lessons on ponder-cost tuning and output-head design.
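The source does not specify ARIA's halting rule. One well-known way to implement adaptive pondering is Graves-style Adaptive Computation Time (ACT). The sketch below applies it to a scalar state:

- Each recurrent step emits a halting probability.
- Pondering stops once the running total reaches 1 − ε.
- The final step is weighted by the leftover probability mass.
- The ponder cost N + R is what a training loss would penalize.

The names act_ponder and step, and the defaults max_steps=8 and eps=0.01, are assumptions for illustration.

```python
from typing import Callable


def act_ponder(
    state: float,
    step: Callable[[float], tuple[float, float]],
    max_steps: int = 8,
    eps: float = 0.01,
) -> tuple[float, float, int]:
    """ACT-style pondering over a scalar state (illustrative, not ARIA's code).

    `step` maps a state to (new_state, halting_probability in [0, 1]).
    Returns (weighted_output, ponder_cost, steps_taken).
    """
    cumulative = 0.0  # sum of halting probabilities before the current step
    output = 0.0
    for n in range(1, max_steps + 1):
        state, h = step(state)
        if cumulative + h >= 1.0 - eps or n == max_steps:
            # Halt: the last step gets the remainder R = 1 - cumulative.
            remainder = 1.0 - cumulative
            output += remainder * state
            return output, n + remainder, n  # ponder cost = N + R
        cumulative += h
        output += h * state
    raise ValueError("max_steps must be at least 1")
```

In ACT, the total loss is the task loss plus τ times this ponder cost. The coefficient τ therefore directly trades accuracy against the number of reasoning steps.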
SYS/04 — APPLIED ML PRODUCT
VOXMAX
Shipped consumer voice-scoring app running on a custom DSP/ML backend.
- React Native / Expo (SDK 54) frontend with 11 screens.
- FastAPI backend: deterministic acoustic-feature scoring plus LLM coaching.
- About 12 s of speech produces a VoxScore (0–100) with four sub-dials and one improvement drill.
- Privacy by design: the backend deletes uploaded audio immediately after analysis.
React Native · Expo · FastAPI · Python
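The privacy bullet describes deleting uploaded audio right after analysis. Below is a minimal, framework-agnostic sketch of that pattern, assuming the upload arrives as raw bytes. The audio is written to a temporary file and analyzed, then removed in a finally block so it is deleted even when analysis fails. The names analyze_and_discard and duration_seconds are hypothetical. duration_seconds stands in for a real deterministic acoustic feature and assumes a WAV upload.

```python
import os
import tempfile
import wave
from typing import Callable, TypeVar

T = TypeVar("T")


def duration_seconds(path: str) -> float:
    """Example deterministic feature: clip length of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def analyze_and_discard(
    audio: bytes, analyze: Callable[[str], T], suffix: str = ".wav"
) -> T:
    """Persist the upload only for as long as analysis needs it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(audio)
        return analyze(path)
    finally:
        # Runs on success and on failure, so raw audio never outlives the call.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # the analyzer already removed it
```

Putting deletion in finally makes deletion a property of the helper itself, not a step each endpoint has to remember. That helps keep a privacy guarantee enforceable.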