LLMs Roadmap by AhmadOsman
This is a log of my progress through the LLMs roadmap by AhmadOsman.
TL;DR: there are four phases with resources that I will follow to build a solid foundation in LLMs.
Phase 1: Transformers
Phase 2: Scaling Laws & Training at Scale
LLMs got good because people figured out what to scale, how to scale it, and then proved it actually works.
- Papers: “Scaling Laws for Neural Language Models” (Kaplan et al.), then “Chinchilla” (Hoffmann et al.). Learn the difference: Kaplan et al. said to grow parameters faster than data; Chinchilla showed a compute-optimal model wants roughly 20 training tokens per parameter (see the back-of-envelope sketch after this list).
- Distributed Training: Learn what Data, Tensor, and Pipeline Parallelism actually do. Then set up multi-GPU training with HuggingFace Accelerate. Yes, you’ll hate CUDA at some point. Such is life.
- Project: Pick a model, run a small distributed job. Play with batch sizes and gradient accumulation. Notice how easy it is to run out of VRAM? Good. Welcome to my world (a minimal Accelerate sketch follows this list).
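To make the Kaplan-vs-Chinchilla difference concrete, here is a back-of-envelope calculation. It assumes the common approximations C ≈ 6ND for training FLOPs and D ≈ 20N for compute-optimal tokens; the papers fit more precise constants, so treat the numbers as illustrative only.

```python
# Back-of-envelope Chinchilla math, assuming C ~= 6 * N * D (training FLOPs)
# and D ~= 20 * N (compute-optimal tokens). Illustrative, not the papers'
# exact fitted constants.
def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    # C = 6 * N * (20 * N) = 120 * N^2  =>  N = sqrt(C / 120)
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# e.g. a ~1e23 FLOP budget (roughly GPT-3-scale compute)
n, d = chinchilla_optimal(1e23)
print(f"~{n / 1e9:.1f}B params trained on ~{d / 1e9:.0f}B tokens")
# -> roughly 29B params on ~577B tokens
```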
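And here is a minimal sketch of the distributed project, assuming `accelerate`, `transformers`, and `datasets` are installed. The model, dataset, and hyperparameters are placeholders; run `accelerate config` once, then launch with `accelerate launch train.py`.

```python
# Minimal multi-GPU fine-tuning loop with HuggingFace Accelerate.
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Trade VRAM for effective batch size: step the optimizer every 8 micro-batches.
accelerator = Accelerator(gradient_accumulation_steps=8)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
dataset.set_format("torch")
loader = DataLoader(dataset, batch_size=4, shuffle=True)  # per-GPU batch size

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# prepare() wraps everything for whatever parallelism `accelerate config` chose.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for batch in loader:
    with accelerator.accumulate(model):  # only syncs/steps when accumulation is done
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["input_ids"])
        accelerator.backward(outputs.loss)  # handles gradient sync across GPUs
        optimizer.step()
        optimizer.zero_grad()
```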
Phase 3: Alignment & PEFT
Fine-tuning is not just a cheap trick. RLHF and PEFT are the reason you can actually use LLMs for real-world use cases.
- RLHF: OpenAI’s “Aligning language models to follow instructions” blog post, then Ouyang et al.’s paper. Grasp the SFT ➡️ Reward Model ➡️ RL pipeline (a reward-model sketch follows this list). Don’t get too lost in the PPO math.
- CAI/RLAIF: Read Anthropic’s “Constitutional AI”.
- LoRA/QLoRA: Read both papers, then actually implement LoRA in PyTorch. If you can’t replace a Linear layer with a LoRA-adapted version, try again (a from-scratch sketch follows this list).
- Project: Fine-tune an open model (e.g. gpt2, distilbert) with your own LoRA adapters. Do it for a real dataset, not toy text.
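Here is what the reward-model stage looks like, as a minimal sketch: an LM backbone with a scalar value head, trained with the pairwise preference loss from Ouyang et al. The class name `RewardModel` and the gpt2 backbone are my own illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class RewardModel(nn.Module):
    def __init__(self, backbone_name: str = "gpt2"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        # Scalar head: one reward per sequence.
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Score the last non-padding token (assumes right padding).
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(last_hidden).squeeze(-1)

# Bradley-Terry-style pairwise loss: push r(chosen) above r(rejected).
def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```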
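And a from-scratch LoRA layer, as a minimal sketch of the paper’s formulation: output = Wx + (alpha/r) · B(Ax), with A Gaussian-initialized and B zeroed so training starts from the frozen model. `LoRALinear` and its parameter names are my own illustrative choices.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: delta_W = B @ A, rank r.
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init => no change at step 0
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Swap an existing Linear for its LoRA-adapted version:
layer = nn.Linear(768, 768)
adapted = LoRALinear(layer, r=8, alpha=16)
out = adapted(torch.randn(2, 10, 768))
print(out.shape)  # torch.Size([2, 10, 768])
```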
Phase 4: Production
You made it to the only part that most people ever see: the actual app.
- Inference Optimization: Read the FlashAttention paper. Understand why it works, then try it with a quantized model (sketch below).
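A hedged way to try it without installing the standalone flash-attn package: PyTorch’s built-in `scaled_dot_product_attention` dispatches to a FlashAttention kernel on CUDA with fp16/bf16 inputs, and recent versions (≥ 2.3) let you force that backend so it errors instead of silently falling back. Shapes here are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend  # PyTorch >= 2.3

# (batch, heads, seq_len, head_dim); long seq_len is where flash pays off.
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The fused kernel never materializes the (seq_len x seq_len) score matrix,
# which is exactly the memory win the paper is about.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 4096, 64])
```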
Where To Learn Them
Below is what to read and watch for this learning plan.
- Math/CS Pre-Reqs
- PyTorch Fundamentals
- Transformers & LLMs
- Scaling & Distributed Training
- Alignment & PEFT
- Inference