Scalable AI: Bridging Theory, Understanding, and Practice
Spring 2026 | UC Berkeley
Large-Scale AI as an End-to-End Engineering Discipline
Central inquiry: How do we build, train, and deploy large-scale AI systems by treating them as full-stack engineered artifacts, where hardware constraints, software stacks, and optimization dynamics jointly determine model behavior and performance?
This course examines the principles required to build, train, and deploy large-scale AI models. We treat large-scale AI as an end-to-end engineering discipline, where a model is a computational graph that must be trained, specialized, evaluated, deployed, monitored, and iterated on, under hard constraints from hardware, data, and serving economics.
The course follows this lifecycle end to end, from architecture and pre-training through post-training, efficient inference, and real-world applications.
NVIDIA is the compute sponsor for Scalable AI: Bridging Theory, Understanding, and Practice. Their support provides the GPU infrastructure that makes the course labs and projects possible.
Ed only. Please post all course questions on Ed (https://edstem.org/us/courses/94200/). We do not use Slack or other channels for course Q&A.
If you have not been added automatically, please use this link to add yourself with code BCQ8Mj. Once enrollment is finalized, we will remove all non-students from the course forum to encourage free discussion among students.
Groups of 4–6. Each group receives one H100 node (8×H100) for the semester (8 nodes total, on GCP). Each group is assigned a single static external IP address for SSH access.
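Access over SSH typically uses a standard client config entry; the host alias, IP address, username, and key path below are placeholders, not course-issued values:

```
# ~/.ssh/config (illustrative entry; substitute your group's assigned values)
Host h100-node
    HostName 203.0.113.10          # placeholder: your group's static external IP
    User your-login                # placeholder: your username on the node
    IdentityFile ~/.ssh/id_ed25519
```

With an entry like this in place, `ssh h100-node` connects directly to your group's node.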
Course questionnaire (Week 1): a short onboarding questionnaire was released in the first lecture (due Friday, Jan 23), and enrollment/compute allocations will be finalized by the end of the second week.
Students will gain hands-on experience with industry-standard tools and frameworks throughout the course.
Everything in one place: lecture plan, dated timeline, deadlines/milestones, and recommended readings. Dates below are for Spring 2026. Lecture plan subject to change.
| Date | Lecture / Events | Recommended Readings | Notes |
|---|---|---|---|
| **Part 1: Architecture – Defining the Computational Graph and Scaling Strategies** | | | |
| Jan 20 | L1. Course Overview and the Modern AI Stack | – | |
| Jan 22 | L2. All About Performance | – | |
| Jan 27 | L2. All About Performance (Lecture Extension) | – | |
| Jan 29 | L3. Architectures to Break Bottlenecks | – | |
| Jan 30 | [Special Lecture] L4. Parallelism Strategies (Guest: Venkat Srinivasan) | – | Guest |
| Feb 3 | [Guest Lecture] L5. NeMo AutoModel (Guests: Hemil Desai, Zhiyu Li). Key topics: practical instruction for performance profiling, arithmetic intensity, and optimization workflows. | – | Guest |
| **Part 2: Pre-Training of Language Models – The Engineering of "Base Models"** | | | |
| Feb 5 | L6. Introduction to Pre-Training | – | |
| Feb 10 | [Guest Lecture] L7. Powering Pre-Training: NeMo Curator (Guests: Sarah Yurick, Abhinav Garg). Key topics: hands-on curation with NeMo Curator: building usable pre-training datasets. | – | Guest |
| Feb 12 | [Guest Lecture] L8. Case Study: The Pre-Training of Nano-V3 (Guest: Roger Waleffe, NVIDIA). Key topics: a deep dive into pre-training data, evaluation, ablations, and the final "hero run" design choices. | – | Guest |
| Feb 17 | L9. Optimizer Fundamentals | – | |
| Feb 19 | [Guest Lecture] L10. Looking to the Future: Emerging Optimizers (Guest: Boxiang Wang). Key topics: moving beyond AdamW to emerging approaches (e.g., Muon, SOAP), and how to reason about their tradeoffs at scale. | – | Guest |
| Feb 24 | [Guest Lecture] L11. Looking to the Future: Synthetic Data Powering Pre-Training (Guests: Eric Tramel, Maarten Van Segbroeck). Key topics: patterns and tooling for scalable synthetic data generation pipelines that can power pre-training, including quality controls and reproducibility. | – | Guest |
| **Part 3: Post-Training of Language Models – Specializing Models Through SFT and RL** | | | |
| Feb 26 | L12. Intro to the LLM Post-Training Lifecycle and Evaluation | – | |
| Mar 3 | L13. The Data Powering Post-Training: SFT Data Engineering and RL Environments | – | |
| Mar 5 | [Guest Lecture] L14. NeMo Data Designer Deep Dive (Guests: Dhruv Nathawani, Yev Meyer). Key topics: a practical walkthrough of NeMo Data Designer for building and iterating on high-quality post-training datasets. | – | Guest |
| Mar 10 | L15. Reinforcement Learning Algorithms Powering Near-Frontier Post-Training. Key topics: RL algorithms and their application to post-training. | – | |
| Mar 12 | [Guest Lecture] L16. Case Study: Post-Training of Nemotron-NanoV3 and NeMo RL (Guests: Venkat Srinivasan, Gerald Shen). Key topics: a detailed case study of post-training decisions, data strategy, and evaluation tradeoffs in practice. | – | Guest |
| **Part 4: Efficient Inference – Deployment Preparation and High-Performance Frameworks** | | | |
| Mar 17 | [Guest Lecture] L17. Deployment Preparation: Speculative Decoding, Quantization, Pruning, and NAS | – | Guest |
| Mar 19 | L18. Fundamentals and Overview of High-Performance Inference Frameworks. Key topics: a systems overview of modern inference engines, batching/scheduling, memory management, and throughput/latency tradeoffs. | – | |
| Mar 24 | Spring Recess (no class) | – | |
| Mar 26 | Spring Recess (no class) | – | |
| Mar 31 | [Guest Lecture] L19. High-Performance Inference using Dynamo and TRT-LLM. Key topics: compilation and runtime strategies for production inference, including graph capture, kernel selection, and end-to-end profiling. | – | Guest |
| Apr 2 | L20. High-Performance Inference using vLLM and SGLang. Key topics: serving engines and control runtimes for LLM applications: throughput optimization, KV cache management, and efficient scheduling. | – | |
| **Part 5: LLM Applications and Use Cases – Building Real-World AI Systems** | | | |
| Apr 7 | L21. Fundamentals of Context Engineering. Key topics: designing prompts, retrieval, memory, and tool usage under latency/cost constraints and reliability targets. | – | |
| Apr 9 | [Guest Lecture] L22. Agentic Applications. Key topics: building agentic systems: planning, tool use, orchestration, evaluation, and failure modes. | – | Guest |
| Apr 14 | [Guest Lecture] L23. Safety Guardrails. Key topics: deployment-time safety: guardrails, red teaming, monitoring, and risk mitigation in real systems. | – | Guest |
| **Part 6: Research Greenfields – Emerging Research Directions** | | | |
| Apr 16 | L24. Diffusion Language Models. Key topics: alternative generative paradigms beyond autoregressive decoding and how they change training/inference tradeoffs. | – | |
| Apr 21 | L25. Harness Engineering. Key topics: methods for engineering software harnesses for modern LLMs. | – | |
| Apr 23 | L26. Multi-Agent Systems and Architecture. Key topics: multi-agent coordination, communication, and architecture patterns for scalable AI systems. | – | |
| Apr 26 | Draft project report due (Milestone). Submit on Gradescope. | – | Peer review cycle |
| Apr 28 | Project Presentations, Session A (Milestone). Week prior to RRR week. | – | |
| Apr 30 | Project Presentations, Session B (Milestone). Week prior to RRR week. | – | |
| May 1 | Peer review due (Milestone). Submit on Gradescope. | – | |
| May 5 | Poster session, RRR Week (Milestone). No formal lecture. | – | |
| May 10 | Final deliverables due (Milestone). Code + final report. | – | |
| May 15 | Impact update (Milestone). Proof of impact document. | – | |
This course has no exams. Evaluation is based on assignments, a semester-long research project, scribing, and participation.
| Component | Weight |
|---|---|
| Research project | 50% |
| Assignments | 35% |
| Scribing (2 lectures per student) | 5% |
| Attendance & participation | 10% |
This course requires sustained weekly engagement (group work, compute usage, and in-person checkoffs). Letter grade only (no Satisfactory/Non-Satisfactory).
There are four group-based assignments, each spanning roughly 2–3 weeks. Assignments include conceptual questions and hands-on experiments (implementation, profiling, scaling, analysis) with an in-person oral checkoff/presentation per assignment.
All assignment release/due dates are listed in the Consolidated Schedule.
You have 3 total slack days across the semester for assignments. At most one slack day may be applied to any single assignment (extends deadline by 24 hours). No other late work is accepted without an approved exception.
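To make the slack-day policy above concrete, here is a minimal sketch of how the two caps compose; the function name and interface are illustrative, not a course tool:

```python
# Illustrative model of the slack-day policy: 3 slack days total per
# semester, and at most 1 may be applied to any single assignment.
TOTAL_SLACK_DAYS = 3
MAX_PER_ASSIGNMENT = 1

def apply_slack(used_so_far: int, requested: int) -> int:
    """Return how many slack days may be applied to one assignment.

    `used_so_far` counts slack days already spent this semester.
    Each granted day extends that assignment's deadline by 24 hours.
    """
    remaining = TOTAL_SLACK_DAYS - used_so_far
    return max(0, min(requested, MAX_PER_ASSIGNMENT, remaining))

# After spending 2 slack days, asking for 2 more on one assignment
# still yields only 1 (per-assignment cap); with all 3 spent, 0.
print(apply_slack(used_so_far=2, requested=2))  # -> 1
print(apply_slack(used_so_far=3, requested=1))  # -> 0
```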
The research project is group-based and open-ended, with an emphasis on producing something useful to the community.
All project milestones and dates are listed in the Consolidated Schedule.
Each student is expected to scribe 2 lectures. Scribe assignments will be posted on Ed. LLM assistance may be used for drafting, but the final notes must be accurate, edited, and clearly written. If you use external sources or AI tools, cite them clearly.
Attendance is required. Participation includes contributing to in-class discussions and project check-ins, posting/answering on Ed, and providing constructive comments on lecture notes and slides.
Collaboration is encouraged within your group. Do not copy code or reports from other groups. If you use external code or AI assistants, cite the source clearly and ensure you understand the work you submit.
This course uses shared GPU infrastructure. For access/quota/networking issues, post on Ed and include timestamps, job IDs, logs, and a short repro description. Multi-node experiments require coordinating with another team to share nodes for a bounded window of time.
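A lightweight way to gather those details before posting is a small helper like the sketch below; the field names and example values are illustrative, not a course-provided tool:

```python
# Sketch: assemble the debugging context an Ed infrastructure post should
# include (timestamp, host, job ID, log tail, short repro description).
import datetime
import socket

def repro_report(job_id: str, log_tail: str, description: str) -> str:
    """Format a timestamped report for an infrastructure question."""
    lines = [
        f"timestamp: {datetime.datetime.now(datetime.timezone.utc).isoformat()}",
        f"host: {socket.gethostname()}",
        f"job_id: {job_id}",
        f"description: {description}",
        "--- last log lines ---",
        log_tail,
    ]
    return "\n".join(lines)

# Example usage with placeholder values:
print(repro_report(
    job_id="slurm-12345",           # hypothetical job ID
    log_tail="CUDA out of memory",  # paste the last lines of your job log
    description="OOM on the 8xH100 node at step 1200 of a pretraining run",
))
```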
Submission platform (Gradescope): all written and code submissions are via Gradescope (4DDRWY), unless explicitly stated otherwise. Oral presentations/checkoffs require sign-up and in-person attendance.