Rebuilding Micrograd

Overview

Micrograd is a minimalistic automatic differentiation (autograd) engine implemented from scratch in Python. It demonstrates the core principles of backpropagation and neural network training using only scalar values and basic mathematical operations.
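
A minimal usage sketch (the variable names are illustrative) shows the idea end to end: scalar Values are combined with ordinary arithmetic, and calling backward() on the result fills in the gradient of every node that contributed to it:

a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)
d = a * b + c      # builds a small computation graph

d.backward()       # reverse-mode autodiff from d back to the leaves
print(d.data)      # 4.0
print(a.grad)      # dd/da = b.data = -3.0
print(b.grad)      # dd/db = a.data = 2.0
print(c.grad)      # dd/dc = 1.0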

Technical Architecture

Automatic Differentiation Engine (engine.py)

The heart of the system is the Value class, which wraps scalar numbers and tracks their computational history:

Core Data Structure


class Value:
    """Wraps a single scalar and records how it was produced."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data                   # the actual numerical value
        self.grad = 0.0                    # gradient (partial derivative of the output w.r.t. this value)
        self._prev = set(_children)        # the operand Values this node was computed from
        self._op = _op                     # operation that created this value (e.g. '+', '*', 'tanh')
        self._backward = lambda: None      # local gradient computation, set by each operation

Supported Operations

Arithmetic Operations: addition (+), negation, subtraction (-), multiplication (*), division (/), and raising to a constant power (**), each implemented as a dunder method on Value so expressions read like ordinary Python arithmetic

Activation Functions: tanh, ReLU, and exp, each returning a new Value that records its own local derivative (a sketch of the pattern follows below)
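
Each operation follows the same pattern: build the output Value, record the operands, and attach a _backward closure that adds the local derivative times the output gradient onto each operand. The sketch below, written in the spirit of micrograd, shows multiplication and tanh; the exact code in engine.py may differ in detail:

import math

class Value:
    # ... __init__ as shown above ...

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)  # allow Value * plain number
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            # d(out)/d(self) = other.data, d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            self.grad += (1 - t ** 2) * out.grad  # d(tanh(x))/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out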

Backpropagation Algorithm

The backward() method implements reverse-mode automatic differentiation:

  1. Topological Ordering: The computation graph is sorted so that every node is visited only after all nodes that depend on it
  2. Local Gradient Computation: Each operation defines its local gradient via a _backward closure
  3. Chain Rule Application: Starting from the output node, whose gradient is seeded to 1, gradients propagate backward through the graph in reverse topological order (see the sketch below)
  4. Accumulation: Gradients are accumulated with += on nodes that feed into more than one downstream operation
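
A minimal sketch of such a backward() method, assuming the Value class sketched above:

# a method of the Value class, continuing the sketch above
def backward(self):
    # topologically order the graph rooted at this node
    topo, visited = [], set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    # seed the output gradient, then apply the chain rule in reverse order
    self.grad = 1.0
    for v in reversed(topo):
        v._backward()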

Neural Network Library (nn.py)

Built on top of the autograd engine, providing modular neural network components:

Module Base Class
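
Every component (Neuron, Layer, MLP) derives from a small Module base class whose job is to expose trainable parameters and reset their gradients between training steps. A minimal sketch, following micrograd's nn module:

class Module:
    def zero_grad(self):
        # clear accumulated gradients before the next backward pass
        for p in self.parameters():
            p.grad = 0.0

    def parameters(self):
        # subclasses return their trainable Value objects
        return []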

Neuron Implementation


import random

class Neuron(Module):
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]  # weights, randomly initialized in [-1, 1]
        self.b = Value(random.uniform(-1, 1))                        # bias, randomly initialized in [-1, 1]

Forward Pass: Computes weighted sum + bias, applies tanh activation
Parameters: Returns all weights and bias as trainable parameters
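
A sketch of that forward pass and parameter collection, assuming the fields shown above:

# methods of the Neuron class, continuing the sketch above
def __call__(self, x):
    # weighted sum of the inputs plus the bias, followed by the tanh nonlinearity
    act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)
    return act.tanh()

def parameters(self):
    return self.w + [self.b]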

Layer Architecture
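
A layer is a list of neurons that all receive the same input and are evaluated independently; its parameters are the union of its neurons' parameters. A minimal sketch in the same style:

class Layer(Module):
    def __init__(self, nin, nout):
        # nout independent neurons, each seeing the same nin inputs
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs  # unwrap single-output layers

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]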

Multi-Layer Perceptron (MLP)

class MLP(Module):
    def __init__(self, nin, nouts):
        sizes = [nin] + nouts  # input size followed by the output size of each layer
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]  # sequential layer composition

Architecture: Takes input size and list of output sizes per layer
Forward Pass: Sequential application of layers
Flexibility: Supports arbitrary depth and width configurations
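
A sketch of the sequential forward pass, together with an illustrative configuration (the sizes below are examples, not taken from the original):

# methods of the MLP class, continuing the sketch above
def __call__(self, x):
    for layer in self.layers:
        x = layer(x)  # the output of each layer feeds the next
    return x

def parameters(self):
    return [p for layer in self.layers for p in layer.parameters()]

# example: 3 inputs, two hidden layers of 4 neurons, a single output
model = MLP(3, [4, 4, 1])
y = model([1.0, -2.0, 0.5])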

Computational Complexity

Memory: O(n), where n is the number of operations (nodes) in the computation graph; every intermediate Value is kept alive for the backward pass
Time: O(n) for the forward evaluation and O(n) for the backward pass; the topological sort visits each node once and each node's _backward closure runs exactly once

Current Limitations

  1. Scalar Only: No vector/matrix operations
  2. No Built-in Optimizers: Parameter updates must be written by hand as an explicit loop over parameters()
  3. Limited Activations: Only tanh, ReLU, and exponential
  4. No Regularization: No dropout, batch norm, or weight decay

Potential Extensions

  1. Tensor Support: Extend to multi-dimensional arrays
  2. More Activations: Sigmoid, softmax, GELU, etc.
  3. Optimizers: SGD, Adam, RMSprop implementations
  4. Loss Functions: Cross-entropy, MSE, etc.
  5. Regularization: L1/L2 penalties, dropout layers
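
As a baseline for the optimizer and loss-function extensions, here is a sketch of how training looks today with a hand-written squared-error loss and a manual gradient-descent update; the data, learning rate, and step count are illustrative only:

# toy dataset: three 3-dimensional inputs and their scalar targets
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0]]
ys = [1.0, -1.0, -1.0]
model = MLP(3, [4, 4, 1])

for _ in range(100):
    # forward pass and squared-error loss built from scalar Values
    preds = [model(x) for x in xs]
    loss = sum(((p - y) ** 2 for p, y in zip(preds, ys)), Value(0.0))

    # backward pass: clear old gradients, then backpropagate from the loss
    model.zero_grad()
    loss.backward()

    # manual SGD step on every trainable parameter
    for p in model.parameters():
        p.data -= 0.05 * p.grad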

References

Andrej Karpathy, "The spelled-out intro to neural networks and backpropagation: building micrograd" (video lecture)