Scalable AI: Bridging Theory, Understanding, and Practice
Spring 2026 | UC Berkeley
Large-Scale AI as an End-to-End Engineering Discipline
Central inquiry: How do we build, train, and deploy large-scale AI systems by treating them as full-stack engineered artifacts, where hardware constraints, software stacks, and optimization dynamics jointly determine model behavior and performance?
This course examines the principles required to build, train, and deploy large-scale AI models. We treat large-scale AI as an end-to-end engineering discipline, where a model is a computational graph that must be trained, specialized, evaluated, deployed, monitored, and iterated on, under hard constraints from hardware, data, and serving economics.
The course follows this lifecycle end to end, from architecture and pre-training through post-training, efficient inference, and real-world applications.
NVIDIA is the compute sponsor for Scalable AI: Bridging Theory, Understanding, and Practice. Their support provides the GPU infrastructure that makes the course labs and projects possible.
Ed only. Please post all course questions on Ed (https://edstem.org/us/courses/94200/). We do not use Slack or other channels for course Q&A.
If you have not been added automatically, please use this link to add yourself with code BCQ8Mj. Once enrollment is finalized, we will remove all non-students from the course forum to encourage free discussion among students.
Groups of 4–6. Each group receives one H100 node (8×H100) for the semester (8 nodes total, on GCP). Each group is assigned a single static external IP address for SSH access.
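Access over SSH typically uses a standard client config entry; the host alias, IP address, username, and key path below are placeholders, not course-issued values:

```
# ~/.ssh/config (illustrative entry; substitute your group's assigned values)
Host h100-node
    HostName 203.0.113.10          # placeholder: your group's static external IP
    User your-login                # placeholder: your username on the node
    IdentityFile ~/.ssh/id_ed25519
```

With an entry like this in place, `ssh h100-node` connects directly to your group's node.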
Course questionnaire (Week 1): a short onboarding questionnaire was released in the first lecture (due Friday, Jan 23), and enrollment/compute allocations will be finalized by the end of the second week.
Students will gain hands-on experience with industry-standard tools and frameworks throughout the course.
Everything in one place: lecture plan, dated timeline, deadlines/milestones, and recommended readings. Dates below are for Spring 2026. Lecture plan subject to change.
| Date | Lecture / Events | Recommended Readings | Notes |
|---|---|---|---|
| **Part 1: Architecture – Defining the Computational Graph and Scaling Strategies** | | | |
| Jan 20 | L1. Course Overview and the Modern AI Stack | – | |
| Jan 22 | L2. All About Performance | – | |
| Jan 27 | L2. All About Performance (Lecture Extension) | – | |
| Jan 29 | L3. Architectures to Break Bottlenecks | – | |
| Jan 30 | [Special Lecture] L4. Parallelism Strategies (Guest: Venkat Srinivasan) | – | Guest |
| Feb 3 | [Guest Lecture] L5. NeMo AutoModel (Guests: Hemil Desai, Zhiyu Li). Key topics: practical instruction for performance profiling, arithmetic intensity, and optimization workflows. | – | Guest |
| **Part 2: Pre-Training of Language Models – The Engineering of "Base Models"** | | | |
| Feb 5 | L6. Introduction to Pre-Training | – | |
| Feb 10 | [Guest Lecture] L7. Powering Pre-Training: NeMo Curator (Guests: Sarah Yurick, Abhinav Garg). Key topics: hands-on curation with NeMo Curator: building usable pre-training datasets. | – | Guest |
| Feb 12 | [Guest Lecture] L8. Case Study: The Pre-Training of Nano-V3 (Guest: Roger Waleffe, NVIDIA). Key topics: a deep dive into pre-training data, evaluation, ablations, and the final "hero run" design choices. | – | Guest |
| Feb 17 | L9. Optimizer Fundamentals | – | |
| Feb 19 | [Guest Lecture] L10. Looking to the Future: Emerging Optimizers (Guest: Boxiang Wang). Key topics: moving beyond AdamW to emerging approaches (e.g., Muon, SOAP), and how to reason about their tradeoffs at scale. | – | Guest |
| Feb 24 | [Guest Lecture] L11. Looking to the Future: Synthetic Data Powering Pre-Training (Guests: Eric Tramel, Maarten Van Segbroeck). Key topics: patterns and tooling for scalable synthetic data generation pipelines that can power pre-training, including quality controls and reproducibility. | – | Guest |
| **Part 3: Post-Training of Language Models – Specializing Models Through SFT and RL** | | | |
| Feb 26 | L12. Intro to the LLM Post-Training Lifecycle and Evaluation | – | |
| Mar 3 | L13. The Data Powering Post-Training: SFT Data Engineering and RL Environments | – | |
| Mar 5 | [Guest Lecture] L14. NeMo Data Designer Deep Dive (Guests: Dhruv Nathawani, Yev Meyer). Key topics: a practical walkthrough of NeMo Data Designer for building and iterating on high-quality post-training datasets. | – | Guest |
| Mar 10 | L15. Reinforcement Learning Algorithms Powering Near-Frontier Post-Training. Key topics: RL algorithms and their application to post-training. | – | |
| Mar 12 | [Guest Lecture] L16. Case Study: Post-Training of Nemotron-NanoV3 and NeMo RL (Guests: Venkat Srinivasan, Gerald Shen). Key topics: a detailed case study of post-training decisions, data strategy, and evaluation tradeoffs in practice. | – | Guest |
| **Part 4: Efficient Inference – Deployment Preparation and High-Performance Frameworks** | | | |
| Mar 17 | [Guest Lecture] L17. Deployment Preparation: Speculative Decoding, Quantization, Pruning, and NAS | – | Guest |
| Mar 19 | L18. Fundamentals and Overview of High-Performance Inference Frameworks. Key topics: a systems overview of modern inference engines, batching/scheduling, memory management, and throughput/latency tradeoffs. | – | |
| Mar 24 | Spring Recess (no class) | – | |
| Mar 26 | Spring Recess (no class) | – | |
| Mar 31 | [Guest Lecture] L19. High-Performance Inference using Dynamo and TRT-LLM. Key topics: compilation and runtime strategies for production inference, including graph capture, kernel selection, and end-to-end profiling. | – | Guest |
| Apr 2 | L20. High-Performance Inference using vLLM and SGLang. Key topics: serving engines and control runtimes for LLM applications: throughput optimization, KV cache management, and efficient scheduling. | – | |
| **Part 5: LLM Applications and Use Cases – Building Real-World AI Systems** | | | |
| Apr 7 | L21. Fundamentals of Context Engineering. Key topics: designing prompts, retrieval, memory, and tool usage under latency/cost constraints and reliability targets. | – | |
| Apr 9 | [Guest Lecture] L22. Agentic Applications. Key topics: building agentic systems: planning, tool use, orchestration, evaluation, and failure modes. | – | Guest |
| Apr 14 | [Guest Lecture] L23. Safety Guardrails. Key topics: deployment-time safety: guardrails, red teaming, monitoring, and risk mitigation in real systems. | – | Guest |
| **Part 6: Research Greenfields – Emerging Research Directions** | | | |
| Apr 16 | L24. Diffusion Language Models. Key topics: alternative generative paradigms beyond autoregressive decoding and how they change training/inference tradeoffs. | – | |
| Apr 21 | L25. Harness Engineering. Key topics: methods for engineering software harnesses for modern LLMs. | – | |
| Apr 23 | L26. Multi-Agent Systems and Architecture. Key topics: multi-agent coordination, communication, and architecture patterns for scalable AI systems. | – | |
| Apr 26 | Draft project report due (Milestone). Submit on Gradescope. | – | Peer review cycle |
| Apr 28 | Project Presentations, Session A (Milestone). Week prior to RRR week. | – | |
| Apr 30 | Project Presentations, Session B (Milestone). Week prior to RRR week. | – | |
| May 1 | Peer review due (Milestone). Submit on Gradescope. | – | |
| May 5 | Poster session, RRR Week (Milestone). No formal lecture. | – | |
| May 10 | Final deliverables due (Milestone). Code + final report. | – | |
| May 15 | Impact update (Milestone). Proof of impact document. | – | |
This course has no exams. Evaluation is based on assignments, a semester-long research project, scribing, and participation.
| Component | Weight |
|---|---|
| Research project | 50% |
| Assignments | 35% |
| Scribing (2 lectures per student) | 5% |
| Attendance & participation | 10% |
This course requires sustained weekly engagement (group work, compute usage, and in-person checkoffs). Letter grade only (no Satisfactory/Non-Satisfactory).
There are four group-based assignments, each spanning roughly 2–3 weeks. Assignments include conceptual questions and hands-on experiments (implementation, profiling, scaling, analysis) with an in-person oral checkoff/presentation per assignment.
All assignment release/due dates are listed in the Consolidated Schedule.
You have 3 total slack days across the semester for assignments. At most one slack day may be applied to any single assignment (extends deadline by 24 hours). No other late work is accepted without an approved exception.
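To make the slack-day policy above concrete, here is a minimal sketch of how the two caps compose; the function name and interface are illustrative, not a course tool:

```python
# Illustrative model of the slack-day policy: 3 slack days total per
# semester, and at most 1 may be applied to any single assignment.
TOTAL_SLACK_DAYS = 3
MAX_PER_ASSIGNMENT = 1

def apply_slack(used_so_far: int, requested: int) -> int:
    """Return how many slack days may be applied to one assignment.

    `used_so_far` counts slack days already spent this semester.
    Each granted day extends that assignment's deadline by 24 hours.
    """
    remaining = TOTAL_SLACK_DAYS - used_so_far
    return max(0, min(requested, MAX_PER_ASSIGNMENT, remaining))

# After spending 2 slack days, asking for 2 more on one assignment
# still yields only 1 (per-assignment cap); with all 3 spent, 0.
print(apply_slack(used_so_far=2, requested=2))  # -> 1
print(apply_slack(used_so_far=3, requested=1))  # -> 0
```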
The research project is group-based and open-ended, with an emphasis on producing something useful to the community.
All project milestones and dates are listed in the Consolidated Schedule.
Each student is expected to scribe 2 lectures. Scribe assignments will be posted on Ed. LLM assistance may be used for drafting, but the final notes must be accurate, edited, and clearly written. If you use external sources or AI tools, cite them clearly.
Attendance is required. Participation includes contributing to in-class discussions and project check-ins, posting/answering on Ed, and providing constructive comments on lecture notes and slides.
Collaboration is encouraged within your group. Do not copy code or reports from other groups. If you use external code or AI assistants, cite the source clearly and ensure you understand the work you submit.
This course uses shared GPU infrastructure. For access/quota/networking issues, post on Ed and include timestamps, job IDs, logs, and a short repro description. Multi-node experiments require coordinating with another team to share nodes for a bounded window of time.
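A lightweight way to gather those details before posting is a small helper like the sketch below; the field names and example values are illustrative, not a course-provided tool:

```python
# Sketch: assemble the debugging context an Ed infrastructure post should
# include (timestamp, host, job ID, log tail, short repro description).
import datetime
import socket

def repro_report(job_id: str, log_tail: str, description: str) -> str:
    """Format a timestamped report for an infrastructure question."""
    lines = [
        f"timestamp: {datetime.datetime.now(datetime.timezone.utc).isoformat()}",
        f"host: {socket.gethostname()}",
        f"job_id: {job_id}",
        f"description: {description}",
        "--- last log lines ---",
        log_tail,
    ]
    return "\n".join(lines)

# Example usage with placeholder values:
print(repro_report(
    job_id="slurm-12345",           # hypothetical job ID
    log_tail="CUDA out of memory",  # paste the last lines of your job log
    description="OOM on the 8xH100 node at step 1200 of a pretraining run",
))
```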
Submission platform (Gradescope): all written and code submissions are via Gradescope (4DDRWY), unless explicitly stated otherwise. Oral presentations/checkoffs require sign-up and in-person attendance.