AI & Machine Learning

Andrej Karpathy Releases ‘nanochat’: A Minimal, End-to-End ChatGPT-Style Pipeline You Can Train in ~4 Hours for ~$100

By NextTech · October 14, 2025


Andrej Karpathy has open-sourced nanochat, a compact, dependency-light codebase that implements a full ChatGPT-style stack, from tokenizer training to web-UI inference, aimed at reproducible, hackable LLM training on a single multi-GPU node.

The repo provides a single-script “speedrun” that executes the full loop: tokenization, base pretraining, mid-training on chat/multiple-choice/tool-use data, supervised fine-tuning (SFT), optional RL on GSM8K, evaluation, and serving (CLI plus a ChatGPT-like web UI). The recommended setup is an 8×H100 node; at ~$24/hour, the 4-hour speedrun lands near $100. A post-run report.md summarizes metrics (CORE, ARC-E/C, MMLU, GSM8K, HumanEval, ChatCORE).
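For orientation, here is a rough sketch of how the speedrun's stages could be chained. Only scripts.chat_rl and scripts.chat_eval are module names this article confirms; the other stage names are placeholders, not the repo's actual speedrun.sh:

```python
# Rough illustration of the speedrun's stage ordering. Only scripts.chat_rl
# and scripts.chat_eval are module names confirmed by this article; the
# remaining stage names are placeholders for the corresponding steps.
import subprocess

STAGES = [
    "scripts.tok_train",   # placeholder: train the Rust BPE tokenizer
    "scripts.base_train",  # placeholder: base pretraining on FineWeb-EDU
    "scripts.mid_train",   # placeholder: mid-training on chat/MC/tool-use data
    "scripts.chat_sft",    # placeholder: supervised fine-tuning
    "scripts.chat_rl",     # confirmed: optional RL on GSM8K
    "scripts.chat_eval",   # confirmed: evaluation into report.md
]

for stage in STAGES:
    # Run each stage as its own process so a failure halts the pipeline.
    subprocess.run(["python", "-m", stage], check=True)
```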

Tokenizer and data path

  • Tokenizer: a custom Rust BPE (built via Maturin) with a 65,536-token vocab; training uses FineWeb-EDU shards (re-packaged and shuffled for simple access). The walkthrough reports ~4.8 characters/token compression and compares against the GPT-2 and GPT-4 tokenizers (a quick baseline check is sketched after this list).
  • Eval bundle: a curated set for CORE (22 autocompletion datasets such as HellaSwag, ARC, BoolQ, and so on), downloaded into ~/.cache/nanochat/eval_bundle.
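As a minimal sketch of that characters-per-token measurement, the snippet below computes the same ratio with tiktoken's GPT-2 encoding, the baseline the walkthrough compares against; nanochat's own tokenizer is not used here:

```python
# Minimal sketch: measure characters per token with tiktoken's GPT-2
# encoding, the baseline the walkthrough compares against. nanochat's own
# 65,536-token Rust BPE is not used here.
import tiktoken

def chars_per_token(text: str, encoding_name: str = "gpt2") -> float:
    enc = tiktoken.get_encoding(encoding_name)
    return len(text) / len(enc.encode(text))

sample = "The quick brown fox jumps over the lazy dog. " * 100
print(f"GPT-2 baseline: {chars_per_token(sample):.2f} chars/token")
# The walkthrough reports ~4.8 chars/token for nanochat's tokenizer on
# FineWeb-EDU text; this gives a point of comparison.
```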

Model, scaling, and the “speedrun” target

The speedrun config trains a depth-20 Transformer (≈560M params, with 1,280 hidden channels and 10 attention heads of dim 128) for ~11.2B tokens, following Chinchilla-style scaling (params × ~20 tokens). The author estimates this run as a ~4e19 FLOPs capability model. Training uses Muon for the matmul parameters and AdamW for embeddings/unembeddings; loss is reported in bits per byte (bpb) to be tokenizer-invariant.
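The quoted sizing is easy to check. Assuming the standard ~6 FLOPs per parameter per token estimate for dense Transformers (a common rule of thumb, not something the article states), the numbers line up:

```python
# Worked arithmetic behind the speedrun sizing quoted above.
params = 560e6               # depth-20 Transformer, ≈560M parameters
tokens = 20 * params         # Chinchilla-style rule: ~20 tokens per parameter
print(f"training tokens: {tokens:.3g}")        # 1.12e+10, i.e. ~11.2B

# Assumed: the standard ~6 FLOPs/param/token estimate for dense Transformers.
flops = 6 * params * tokens
print(f"training compute: {flops:.2g} FLOPs")  # 3.8e+19, i.e. ~4e19
```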

Mid-training, SFT, and tool use

After pretraining, mid-training adapts the base model to conversations (SmolTalk), explicitly teaches multiple-choice behavior (100K MMLU auxiliary-train questions), and introduces tool use by inserting <|python_start|>…<|python_end|> blocks; a small GSM8K slice is included to seed calculator-style usage. The default mixture: SmolTalk (460K), MMLU aux-train (100K), GSM8K main (8K), totaling 568K rows.
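A sketch of that mixture, plus a hypothetical tool-use row; the special tokens are the ones the article quotes, while the dataset keys and field names are illustrative assumptions:

```python
# The default mid-training mixture, restated from the article; dataset keys
# are shorthand, and the row format below is an illustrative assumption.
mixture = {
    "smoltalk": 460_000,        # conversations
    "mmlu_aux_train": 100_000,  # multiple-choice behavior
    "gsm8k_main": 8_000,        # seeds calculator-style tool use
}
assert sum(mixture.values()) == 568_000  # matches the article's total

# Hypothetical tool-use row using the special tokens the article quotes:
tool_use_row = {
    "user": "What is 23 * 47?",
    "assistant": "<|python_start|>print(23 * 47)<|python_end|>1081",
}
```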

SFT then fine-tunes on higher-quality conversations while matching test-time formatting (padded, non-concatenated rows) to reduce the train/inference mismatch. The repo's example post-SFT metrics (speedrun tier): ARC-Easy 0.3876, ARC-Challenge 0.2807, MMLU 0.3151, GSM8K 0.0455, HumanEval 0.0854, ChatCORE 0.0884.
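The “padded, non-concatenated” detail is simple but easy to miss. A minimal sketch of what it means per batch, assuming a pad id of 0:

```python
# Minimal sketch of padded, non-concatenated SFT rows: each conversation is
# its own sequence, padded to the batch maximum rather than packed into one
# long stream. PAD_ID = 0 is an assumption.
PAD_ID = 0

def pad_batch(rows: list[list[int]]) -> list[list[int]]:
    max_len = max(len(r) for r in rows)
    return [r + [PAD_ID] * (max_len - len(r)) for r in rows]

print(pad_batch([[5, 9, 2], [7, 1], [3, 8, 4, 6]]))
# [[5, 9, 2, 0], [7, 1, 0, 0], [3, 8, 4, 6]]
# This mirrors inference, where each request is also a standalone sequence.
```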

Tool use is wired end to end: the custom Engine implements a KV cache, prefill/decode inference, and a simple Python-interpreter sandbox for tool-augmented runs, used in both training and evaluation flows.
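As a toy illustration of the prefill/decode split behind a KV cache (single attention head, random projections; nanochat's actual Engine is more involved):

```python
# Toy illustration of the prefill/decode split behind a KV cache (single
# attention head, random projections); nanochat's Engine is more involved.
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for one query vector over the cache.
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 8
rng = np.random.default_rng(0)
K_cache = rng.normal(size=(5, d))  # prefill: keys for 5 prompt tokens
V_cache = rng.normal(size=(5, d))  # prefill: values for 5 prompt tokens

for _ in range(3):  # decode: one token at a time
    # Append the new token's key/value, then attend with its query; earlier
    # tokens' keys/values are reused from the cache, never recomputed.
    K_cache = np.vstack([K_cache, rng.normal(size=(1, d))])
    V_cache = np.vstack([V_cache, rng.normal(size=(1, d))])
    q = rng.normal(size=d)
    out = attend(q, K_cache, V_cache)
```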

Optional RL on GSM8K via a simplified GRPO loop

The final (optional) stage applies reinforcement learning on GSM8K with a simplified GRPO routine. The walkthrough spells out what is omitted relative to canonical PPO-style RLHF: no trust region via a reference model, no KL penalties, on-policy updates (discarding PPO ratios/clipping), token-level GAPO-style normalization, and a mean-shift advantage. In practice it behaves close to REINFORCE while keeping the group-relative advantage calculation. The scripts scripts.chat_rl and scripts.chat_eval -i rl -a GSM8K demonstrate the loop.
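The piece the simplified loop keeps, the group-relative advantage, fits in a few lines. A sketch under the article's description (mean-shift only, no reference model or clipping), with rewards and per-token log-probs left as placeholders:

```python
# Sketch of the group-relative advantage the simplified GRPO loop keeps:
# sample G completions per prompt, score them, and mean-shift within the
# group. No reference model, KL penalty, or PPO ratio/clipping, per the
# article; rewards and per-token log-probs are placeholders here.
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    return rewards - rewards.mean()  # mean-shift only

rewards = np.array([1.0, 0.0, 0.0, 1.0])  # e.g. GSM8K answer correct or not
print(grpo_advantages(rewards))           # [ 0.5 -0.5 -0.5  0.5]
# REINFORCE-style loss per completion i:
#   loss_i = -advantage_i * sum_t log p(token_t)
```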

Cost/quality scaling and bigger models

The README sketches two larger targets beyond the ~$100 speedrun (a quick cost check follows the list):

  • ~$300 tier: d=26 (~12 hours), slightly surpasses GPT-2 on CORE; requires extra pretraining shards and batch-size adjustments.
  • ~$1,000 tier: ~41.6 hours, with materially improved coherence and basic reasoning/coding ability.
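A back-of-envelope check at the ~$24/hour 8×H100 rate quoted earlier reproduces the tier price tags:

```python
# Back-of-envelope cost check at the ~$24/hour 8xH100 rate quoted earlier.
RATE = 24.0  # $/hour
for tier, hours in [("speedrun", 4.0), ("d=26 tier", 12.0), ("top tier", 41.6)]:
    print(f"{tier}: ~${RATE * hours:,.0f}")
# speedrun: ~$96 (~$100); d=26: ~$288 (~$300); 41.6h: ~$998 (~$1,000)
```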

The repo also notes prior experimental runs in which a d=30 model trained for ~24 hours reached the 40s on MMLU, the 70s on ARC-Easy, and the 20s on GSM8K.

Evaluation snapshot (speedrun tier)

An example report.md table for the ~$100/≈4-hour run shows: CORE 0.2219 (base); after mid-training/SFT, ARC-E 0.3561→0.3876, ARC-C ~0.2875→0.2807, MMLU 0.3111→0.3151, GSM8K 0.0250→0.0455, HumanEval 0.0671→0.0854, ChatCORE 0.0730→0.0884; wall-clock 3h51m.

Example report and discussion: https://github.com/karpathy/nanochat/discussions/1

Key Takeaways

  • nanochat is a minimal, end-to-end ChatGPT-style stack (~8K LOC) that runs via a single speedrun.sh on one 8×H100 node (~4h ≈ $100).
  • The pipeline covers the tokenizer (Rust BPE), base pretraining, mid-training, SFT, optional RL on GSM8K (simplified GRPO), evaluation, and serving (CLI + web UI).
  • Speedrun metrics (example report.md): CORE 0.2219 base; after SFT, ARC-Easy 0.3876, ARC-Challenge 0.2807, MMLU 0.3151, GSM8K 0.0455, HumanEval 0.0854.
  • Scaling tiers are outlined: ~$300 (d=26, ~12h) “slightly outperforms GPT-2 CORE”; ~$1,000 (~41.6h) for materially better coherence/reasoning.

Karpathy's nanochat lands in a useful middle ground: a single, clear, dependency-light repository that stitches tokenizer training (Rust BPE), pretraining on FineWeb-EDU, mid-training (SmolTalk/MMLU aux/GSM8K with tool-use tags), SFT, optional simplified GRPO on GSM8K, and a thin Engine (KV cache, prefill/decode, Python interpreter) into a reproducible speedrun on an 8×H100 node, producing a traceable report.md with CORE/ARC/MMLU/GSM8K/HumanEval and a minimal web UI.



Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,… pic.twitter.com/LLhbLCoZFt

— Andrej Karpathy (@karpathy) October 13, 2025



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
