End-to-end custom model training, delivered as a Docker Compose stack
Built by Nicholas Falshaw · Engineered end-to-end · Production since 2025
Off-the-shelf LLMs don't know your domain. Cloud fine-tuning APIs are expensive, slow, and leak proprietary training data. Most open-source fine-tuning recipes are notebook demos that fall over in production.
A containerized fine-tuning pipeline that runs on a single commodity GPU. It ingests JSONL training data, runs QLoRA training against a configurable base model, merges the adapter weights into the base, exports to GGUF for Ollama, and scores the result with a benchmark harness, all from a single docker compose up.
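The single-command claim rests on Compose's GPU device reservation. A minimal sketch of what such a compose file can look like; the service name, command, and mount paths here are illustrative, not the project's actual configuration:

```yaml
services:
  trainer:
    build: .
    command: python -m pipeline.run --config config.yaml   # illustrative entrypoint
    volumes:
      - ./data:/data       # JSONL training data in, GGUF artifacts out
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

The deploy.resources.reservations.devices block is Compose's standard way to hand a single NVIDIA GPU to a service.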
Dataset loader
Validates JSONL schema, deduplicates, splits train/eval
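The loader stage can be sketched in a few dozen lines of standard-library Python. Field names ("prompt"/"response"), the split ratio, and the hashing scheme below are illustrative assumptions, not the pipeline's actual schema:

```python
import hashlib
import json
import random


def load_jsonl(path, eval_frac=0.1, seed=0):
    """Validate, deduplicate, and split a JSONL instruction dataset.

    Returns (train_rows, eval_rows). Schema and ratio are illustrative.
    """
    seen, rows = set(), []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"line {lineno}: invalid JSON ({e})")
            if not isinstance(row, dict) or not {"prompt", "response"} <= row.keys():
                raise ValueError(f"line {lineno}: missing prompt/response fields")
            # Hash the pair to drop exact duplicates.
            digest = hashlib.sha256(
                (row["prompt"] + "\x00" + row["response"]).encode()
            ).hexdigest()
            if digest in seen:
                continue
            seen.add(digest)
            rows.append(row)
    random.Random(seed).shuffle(rows)  # seeded shuffle: deterministic split
    cut = max(1, int(len(rows) * eval_frac))
    return rows[cut:], rows[:cut]
```

Seeding the shuffle makes the train/eval split reproducible across reruns, which matters when comparing benchmark numbers between training runs.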
QLoRA trainer
PEFT + bitsandbytes 4-bit quantization, configurable rank/alpha/target-modules
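The trainer's knobs correspond to PEFT's LoraConfig and the 4-bit settings in transformers' BitsAndBytesConfig. A minimal sketch of that config surface as a plain dataclass; the field names mirror the real PEFT/bitsandbytes kwargs, but the default values here are illustrative, not the pipeline's defaults:

```python
from dataclasses import dataclass


@dataclass
class QLoraConfig:
    # Fields mirroring peft.LoraConfig kwargs (values illustrative):
    r: int = 16                        # adapter rank
    lora_alpha: int = 32               # scaling numerator
    lora_dropout: float = 0.05
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    # Fields mirroring transformers.BitsAndBytesConfig for 4-bit QLoRA:
    load_in_4bit: bool = True
    bnb_4bit_quant_type: str = "nf4"   # NormalFloat4, per the QLoRA paper
    bnb_4bit_compute_dtype: str = "bfloat16"

    @property
    def scaling(self) -> float:
        # LoRA applies the update scaled by alpha / r.
        return self.lora_alpha / self.r
```

Keeping rank, alpha, and target modules in one config object is what makes the trainer "configurable": one file changes, the training code does not.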
Checkpoint merger
Merges adapter into base weights, saves HF-format model
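What "merging" means numerically: the low-rank update B·A, scaled by alpha/r, is added into the frozen base weight, after which the adapter is no longer needed at inference time. A toy pure-Python sketch of that rule (real weights are thousands of dimensions per target module; in PEFT this is what merge_and_unload performs):

```python
def matmul(X, Y):
    # Naive dense product, just for the toy example.
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, r, alpha):
    """W' = W + (alpha / r) * B @ A, the LoRA merge rule.

    W: (d_out, d_in) base weight; B: (d_out, r); A: (r, d_in).
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Because the merged matrix has the base model's original shape, the result saves as an ordinary HF-format checkpoint with no adapter dependency.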
GGUF exporter
llama.cpp conversion with configurable quantization (Q4_K_M / Q5_K_M / Q8_0)
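For context, the llama.cpp conversion behind this stage is typically two steps: convert the merged HF checkpoint to an f16 GGUF, then quantize it down. Paths and filenames below are illustrative; both tools ship with llama.cpp:

```shell
# Illustrative commands, not the pipeline's exact invocation.
python llama.cpp/convert_hf_to_gguf.py ./merged-model \
    --outfile model-f16.gguf --outtype f16
llama.cpp/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

Swapping the final argument (Q4_K_M / Q5_K_M / Q8_0) trades file size and speed against quantization loss, which is the knob the exporter exposes.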
Ollama registrar
Generates Modelfile, pushes to local Ollama instance
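An Ollama Modelfile generated at this stage can be as small as a FROM line plus a few parameters. The values below are illustrative:

```
# Modelfile (illustrative values; the pipeline generates this file)
FROM ./model-Q4_K_M.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
```

Registration against a local Ollama instance is then a single ollama create my-model -f Modelfile, where my-model is whatever name the pipeline assigns.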
Benchmark harness
Perplexity + task-specific evals against held-out test set
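Perplexity, the harness's core metric, is the exponential of the mean per-token negative log-likelihood over the held-out set. A minimal sketch, assuming the eval pass has already produced each token's model-assigned probability:

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean of -log p(token)) over held-out tokens.

    token_probs: the probability the model assigned to each gold token,
    e.g. collected from a cross-entropy eval pass. Lower is better;
    a perplexity of 4 means the model is, on average, as uncertain as
    a uniform choice among 4 tokens.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Task-specific evals sit alongside this: exact-match or scored comparisons of model outputs against the held-out references, so a training run is judged on both fluency and the actual domain task.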
Trained custom models on domain-specific corpora without sending data to third-party APIs. Inference is served locally via Ollama on the same VPS, replacing recurring fine-tuning spend with a one-time training run.