From-First-Principles Deep Learning

Micrograd From Scratch

A teaching-first deep learning build that reconstructs reverse-mode autograd, MLPs, optimizers, schedulers, and a tiny training stack in pure Python.

View GitHub Repo Open Benchmark JSON

Core Scope Autograd + MLP + Training

Scalar graph engine, model stack, optimizer step logic, and reproducible artifacts.

Best Benchmark Result Momentum

Best validation loss of 0.000002 on the shared tiny benchmark setup.

Engineering Intent Clarity Over Throughput

Readable internals first. This is meant to explain training mechanics, not compete with PyTorch.

What This Project Rebuilds

Scalar Value graph objects with reverse-mode backpropagation
Neuron, Layer, and MLP abstractions
SGD, Momentum, and Adam optimizers
Learning-rate schedulers and gradient clipping support
Mini-batch training, train/validation tracking, and benchmark artifact generation

Why It Matters

The point of the project is not novelty. The point is implementation-level understanding. Rebuilding the stack exposes how gradients flow, how optimizers actually update parameters, and what tradeoffs appear once you stop relying on framework internals.

Training Flow

Forward Pass
Build scalar graph

→

Backward Pass
Reverse-topological grad accumulation

→

Optimizer Step
Update parameter data

→

Trainer
Batching, metrics, scheduler

Optimizer Benchmark

Same dataset split, seed, architecture, and learning rate. Only the optimizer changes.

Read benchmark notes

Optimizer	Final Train Loss	Final Val Loss	Best Val Loss	Best Epoch
SGD	0.010457	3.068415	0.773429	1
Momentum	0.000223	0.000002	0.000002	41
Adam	0.000408	0.002925	0.002925	60

Design Tradeoffs

Scalar-level autograd keeps internals visible but makes training slow.
Small abstractions keep the code readable for learning and interview discussion.
The benchmark is intentionally tiny, so its results should be interpreted as educational, not general.

Limitations

No tensor backend, GPU support, mixed precision, or serious throughput story.
Numerical stability is limited compared with mature frameworks.
This is a proof-of-understanding artifact, not a production training library.