# Optimizer Benchmark Analysis

## Setup
- Same dataset split, seed, model architecture, epochs, and learning rate for all optimizers.
- Optimizers compared: SGD, Momentum, Adam.
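
One way to enforce the controlled setup described above is to build every run from a single shared configuration and vary only the optimizer name. The sketch below is illustrative: `RunConfig`, `make_configs`, and all default values are hypothetical placeholders, not the settings used for this benchmark.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    """Settings held constant across runs (all values are placeholders)."""
    optimizer: str
    seed: int = 0                # placeholder, not the benchmark's seed
    epochs: int = 100            # placeholder, not the benchmark's epoch count
    learning_rate: float = 0.01  # placeholder, not the benchmark's LR
    val_fraction: float = 0.2    # placeholder split ratio


def make_configs(base, optimizers):
    """Return one config per optimizer, identical except for the optimizer field."""
    return [replace(base, optimizer=name) for name in optimizers]


base = RunConfig(optimizer="sgd")
configs = make_configs(base, ["sgd", "momentum", "adam"])

for cfg in configs:
    print(cfg)
```

Because the dataclass is frozen, a run cannot silently change the seed or learning rate after the configs are created.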

## Results Snapshot
| Optimizer | Final Train Loss | Final Val Loss | Best Val Loss | Best Epoch |
|---|---:|---:|---:|---:|
| SGD | 0.010457 | 3.068415 | 0.773429 | 1 |
| MOMENTUM | 0.000223 | 0.000002 | 0.000002 | 41 |
| ADAM | 0.000408 | 0.002925 | 0.002925 | 60 |
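
The four columns above can be derived from per-epoch loss histories. The following is a minimal sketch, assuming epochs are numbered from 1 and that each run records one train and one validation loss per epoch; `summarize` and the sample history are hypothetical and do not reproduce the benchmark's numbers.

```python
def summarize(train_losses, val_losses):
    """Reduce per-epoch loss histories to the four table columns."""
    # Index of the lowest validation loss (first occurrence on ties).
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return {
        "final_train_loss": train_losses[-1],
        "final_val_loss": val_losses[-1],
        "best_val_loss": val_losses[best_idx],
        "best_epoch": best_idx + 1,  # assumes 1-indexed epochs
    }


# Made-up history for illustration only.
example = summarize(
    train_losses=[0.9, 0.5, 0.2],
    val_losses=[0.8, 0.6, 0.7],
)
print(example)
```

A large gap between `final_val_loss` and `best_val_loss`, as in the SGD row, indicates that validation loss degraded after its best epoch.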

## Short Analysis
- Winner on this setup: **MOMENTUM**, with the lowest best validation loss (0.000002 at epoch 41), roughly three orders of magnitude below Adam's 0.002925.
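- Momentum typically converges faster than plain SGD on a small MLP like this one; here it reached its best validation loss at epoch 41, and its final validation loss equals that best value.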
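- Adam is often the most stable with minimal tuning, but it can overfit quickly on tiny datasets. In this run its final validation loss equals its best (0.002925, best epoch 60), so no overfitting is visible here.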
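- SGD is a clean baseline, but here its validation loss was lowest at epoch 1 (0.773429) and ended at 3.068415 while train loss fell to 0.010457. That pattern suggests overfitting or instability at the shared learning rate, so SGD likely needs LR tuning or early stopping, not just more epochs, to match the others.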
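
For reference, the three update rules being compared can be sketched on a single scalar parameter. This is a minimal illustration using the textbook forms of each rule; the function names, the `beta` defaults, and the toy objective `f(w) = w**2` are assumptions, and the benchmark's actual implementation and hyperparameters may differ.

```python
import math


def sgd_step(w, grad, lr):
    """Plain SGD: move against the gradient."""
    return w - lr * grad


def momentum_step(w, v, grad, lr, beta=0.9):
    """SGD with (heavy-ball) momentum: accumulate a velocity, then step."""
    v = beta * v + grad
    return w - lr * v, v


def adam_step(w, m, s, t, grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(s_hat) + eps), m, s


# Toy objective f(w) = w**2 with gradient 2*w, same start and LR for all three.
lr, steps = 0.1, 20
w_sgd = w_mom = w_adam = 1.0
v = m = s = 0.0
for t in range(1, steps + 1):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd, lr)
    w_mom, v = momentum_step(w_mom, v, 2 * w_mom, lr)
    w_adam, m, s = adam_step(w_adam, m, s, t, 2 * w_adam, lr)

print(f"sgd={w_sgd:.6f} momentum={w_mom:.6f} adam={w_adam:.6f}")
```

Sharing one learning rate across these rules, as the setup does, keeps the comparison controlled but does not guarantee each optimizer runs at its own best setting, since the effective step size differs between them.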