Model Training

LLM Fine-Tuning Pipeline

End-to-end custom model training, delivered as a Docker container

Built by Nicholas Falshaw · Engineered end-to-end · Production since 2025

The problem

Off-the-shelf LLMs don't know your domain. Cloud fine-tuning APIs are expensive, slow, and leak proprietary training data. Most open-source fine-tuning recipes are notebook demos that fall over in production.

What I built

A containerized fine-tuning pipeline that runs on a single commodity GPU. It ingests JSONL training data, runs QLoRA training against a configurable base model, merges the adapter weights, exports to GGUF for Ollama, and runs a benchmark harness — all from a single docker compose up.
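A minimal compose sketch of how such a single-GPU training service might be wired up (service name, volume paths, and GPU reservation are illustrative assumptions, not the project's actual file):

```yaml
services:
  trainer:
    build: .                 # Dockerfile with CUDA + training deps
    volumes:
      - ./data:/data         # JSONL corpus in
      - ./output:/output     # merged weights + GGUF out
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1       # single commodity GPU
              capabilities: [gpu]
```

Running `docker compose up` then drives the whole pipeline inside one container, so no training data ever leaves the host.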

Architecture

  • Dataset loader

    Validates JSONL schema, deduplicates, splits train/eval

  • QLoRA trainer

    PEFT + bitsandbytes 4-bit quantization, configurable rank/alpha/target-modules

  • Checkpoint merger

    Merges adapter into base weights, saves HF-format model

  • GGUF exporter

    llama.cpp conversion with configurable quantization (Q4_K_M / Q5_K_M / Q8_0)

  • Ollama registrar

    Generates Modelfile, pushes to local Ollama instance

  • Benchmark harness

    Perplexity + task-specific evals against held-out test set
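The Ollama-registrar stage generates a Modelfile pointing at the exported GGUF and registers it with `ollama create`. A sketch of what that generated file might look like (model name, file path, and parameter values are illustrative assumptions):

```
FROM ./merged-model.Q4_K_M.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
```

Registering it locally is then one command, e.g. `ollama create my-domain-model -f Modelfile`, after which the fine-tuned model is served alongside any other local Ollama models.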
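The dataset-loader stage above (validate, deduplicate, split) can be sketched in plain Python. The "prompt"/"response" field names and the split ratio are assumptions for illustration; the real schema may differ:

```python
import hashlib
import json
import random


def load_jsonl(path, eval_fraction=0.1, seed=42):
    """Validate, deduplicate, and split a JSONL training file.

    Assumes each line carries "prompt" and "response" string fields
    (hypothetical schema -- adapt to the actual dataset format).
    Returns (train_rows, eval_rows).
    """
    seen, rows = set(), []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {lineno}: invalid JSON ({exc})")
            if not isinstance(row.get("prompt"), str) or not isinstance(
                row.get("response"), str
            ):
                raise ValueError(f"line {lineno}: missing prompt/response strings")
            # Hash the pair so exact duplicates are dropped.
            digest = hashlib.sha256(
                (row["prompt"] + "\x00" + row["response"]).encode()
            ).hexdigest()
            if digest in seen:
                continue
            seen.add(digest)
            rows.append(row)
    random.Random(seed).shuffle(rows)  # seeded => deterministic split
    cut = max(1, int(len(rows) * eval_fraction))
    return rows[cut:], rows[:cut]
```

Hashing the prompt/response pair keeps deduplication O(1) per row, and seeding the shuffle makes the train/eval split reproducible across runs.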
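The benchmark harness's perplexity metric reduces to a small formula: the exponential of the negative mean per-token log-probability over the held-out set. A minimal sketch (the function name and input shape are assumptions; the harness would obtain the log-probabilities from the model's forward pass):

```python
import math


def perplexity(token_logprobs):
    """Corpus perplexity from per-token natural-log probabilities.

    perplexity = exp(-mean(log p(token))); lower is better, and a
    value of k means the model is on average as uncertain as a
    uniform choice over k tokens.
    """
    if not token_logprobs:
        raise ValueError("no tokens to score")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

Tracking this number before and after fine-tuning (alongside the task-specific evals) gives a quick regression check that a training run actually improved fit on the held-out domain data.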

Tech stack

Python · LlamaFactory · llama.cpp · Ollama · MLflow · RunPod · Docker

Outcome

Trained custom models on domain-specific corpora without sending data to third-party APIs. Inference served locally via Ollama on the same VPS. Replaces recurring fine-tuning spend with a one-time training run.

Rogue AI · Production Systems