LLMs Roadmap by AhmadOsman
This is a log of my progress through the LLMs roadmap by AhmadOsman.
TL;DR: there are four phases with resources that I will follow to build a solid foundation in LLMs.
Phase 1: Transformers
Phase 2: Scaling Laws & Training at Scale
LLMs got good because people figured out what to scale, how to scale it, and then proved it actually works.
- Papers: “Scaling Laws for Neural Language Models” (Kaplan et al.), then “Chinchilla” (Hoffmann et al.). Learn the difference: Kaplan et al. said to grow parameters faster than data; Chinchilla showed a compute-optimal model wants roughly 20 training tokens per parameter (see the back-of-envelope sketch after this list).
- Distributed Training: Learn what Data, Tensor, and Pipeline Parallelism actually do. Then set up multi-GPU training with HuggingFace Accelerate. Yes, you’ll hate CUDA at some point. Such is life.
- Project: Pick a model, run a small distributed job. Play with batch sizes and gradient accumulation. Notice how easy it is to run out of VRAM? Good. Welcome to my world (a minimal Accelerate sketch follows this list).
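To make the Kaplan-vs-Chinchilla difference concrete, here is a back-of-envelope calculation. It assumes the common approximations C ≈ 6ND for training FLOPs and D ≈ 20N for compute-optimal tokens; the papers fit more precise constants, so treat the numbers as illustrative only.

```python
# Back-of-envelope Chinchilla math, assuming C ~= 6 * N * D (training FLOPs)
# and D ~= 20 * N (compute-optimal tokens). Illustrative, not the papers'
# exact fitted constants.
def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    # C = 6 * N * (20 * N) = 120 * N^2  =>  N = sqrt(C / 120)
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# e.g. a ~1e23 FLOP budget (roughly GPT-3-scale compute)
n, d = chinchilla_optimal(1e23)
print(f"~{n / 1e9:.1f}B params trained on ~{d / 1e9:.0f}B tokens")
# -> roughly 29B params on ~577B tokens
```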
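And here is a minimal sketch of the distributed project, assuming `accelerate`, `transformers`, and `datasets` are installed. The model, dataset, and hyperparameters are placeholders; run `accelerate config` once, then launch with `accelerate launch train.py`.

```python
# Minimal multi-GPU fine-tuning loop with HuggingFace Accelerate.
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Trade VRAM for effective batch size: step the optimizer every 8 micro-batches.
accelerator = Accelerator(gradient_accumulation_steps=8)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
dataset.set_format("torch")
loader = DataLoader(dataset, batch_size=4, shuffle=True)  # per-GPU batch size

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# prepare() wraps everything for whatever parallelism `accelerate config` chose.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for batch in loader:
    with accelerator.accumulate(model):  # only syncs/steps when accumulation is done
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["input_ids"])
        accelerator.backward(outputs.loss)  # handles gradient sync across GPUs
        optimizer.step()
        optimizer.zero_grad()
```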
Phase 3: Alignment & PEFT
Fine-tuning is not just a cheap trick. RLHF and PEFT are the reason you can actually use LLMs for real-world use cases.
- RLHF: OpenAI’s “Aligning language models to follow instructions” blog post, then Ouyang et al.’s paper. Grasp the SFT ➡️ Reward Model ➡️ RL pipeline (a reward-model sketch follows this list). Don’t get too lost in the PPO math.
- CAI/RLAIF: Read Anthropic’s “Constitutional AI”.
- LoRA/QLoRA: Read both papers, then actually implement LoRA in PyTorch. If you can’t replace a Linear layer with a LoRA-adapted version, try again (a from-scratch sketch follows this list).
- Project: Fine-tune an open model (e.g. gpt2, distilbert) with your own LoRA adapters. Do it for a real dataset, not toy text.
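Here is what the reward-model stage looks like, as a minimal sketch: an LM backbone with a scalar value head, trained with the pairwise preference loss from Ouyang et al. The class name `RewardModel` and the gpt2 backbone are my own illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class RewardModel(nn.Module):
    def __init__(self, backbone_name: str = "gpt2"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        # Scalar head: one reward per sequence.
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Score the last non-padding token (assumes right padding).
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(last_hidden).squeeze(-1)

# Bradley-Terry-style pairwise loss: push r(chosen) above r(rejected).
def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```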
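And a from-scratch LoRA layer, as a minimal sketch of the paper’s formulation: output = Wx + (alpha/r) · B(Ax), with A Gaussian-initialized and B zeroed so training starts from the frozen model. `LoRALinear` and its parameter names are my own illustrative choices.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: delta_W = B @ A, rank r.
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init => no change at step 0
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Swap an existing Linear for its LoRA-adapted version:
layer = nn.Linear(768, 768)
adapted = LoRALinear(layer, r=8, alpha=16)
out = adapted(torch.randn(2, 10, 768))
print(out.shape)  # torch.Size([2, 10, 768])
```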
Phase 4: Production
You made it to the only part that most people ever see: the actual app.
- Inference Optimization: Read the FlashAttention paper. Understand why it works, then try it with a quantized model (sketch below).
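A hedged way to try it without installing the standalone flash-attn package: PyTorch’s built-in `scaled_dot_product_attention` dispatches to a FlashAttention kernel on CUDA with fp16/bf16 inputs, and recent versions (≥ 2.3) let you force that backend so it errors instead of silently falling back. Shapes here are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend  # PyTorch >= 2.3

# (batch, heads, seq_len, head_dim); long seq_len is where flash pays off.
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The fused kernel never materializes the (seq_len x seq_len) score matrix,
# which is exactly the memory win the paper is about.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 4096, 64])
```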
Where To Learn Them
Below is what to read and watch for this learning plan.
- Math/CS Pre-Reqs
- PyTorch Fundamentals
- Transformers & LLMs
- Scaling & Distributed Training
- Alignment & PEFT
- Inference