Scalable AI

Bridging Theory, Understanding, and Practice

Spring 2026 | UC Berkeley

Large-Scale AI as an End-to-End Engineering Discipline

Announcements: Course information will be updated regularly. Check back for enrollment details, assignment deadlines, and lecture materials.

Course Overview

Central inquiry: How do we build, train, and deploy large-scale AI systems by treating them as full-stack engineered artifacts, where hardware constraints, software stacks, and optimization dynamics jointly determine model behavior and performance?

This course examines the principles required to build, train, and deploy large-scale AI models. We treat large-scale AI as an end-to-end engineering discipline, where a model is a computational graph that must be trained, specialized, evaluated, deployed, monitored, and iterated on, all under hard constraints from hardware, data, and serving economics.

The course follows the lifecycle end to end:

Architecture → Pre-training → Post-training → Efficient inference → Applications → Research greenfields

Resources

Syllabus (PDF)
Course policies, project expectations, assignment workflow, and course structure.
Tip: If your browser opens the PDF instead of downloading it, click "Download" in the PDF viewer.
Research Directions (PDF)
Suggested research directions and project ideas for the course.
Lecture Videos
Lecture videos will be posted here after each lecture. A Berkeley email is required for access.

Course Team

Logistics

Class Logistics

Course Number
EE 290 / 194
Instructors
Teaching and Support Staff
Haocheng Xi; Paul Zhou; Venkat Srinivasan (NVIDIA infrastructure and technology)
Full staff cards + photos in Course Team.
Lecture Time
Tuesdays & Thursdays, 9:30 AM – 11:00 AM
Location
521 Cory Hall
Office Hours
  • Instructor office hours: Tues & Thurs, 11:00 AM – 12:00 PM, Cory 258
  • Haocheng's TA hours: Wed, 4:30 – 5:30 PM, Soda 373
  • Paul's TA hours: Thurs, 1:30 – 2:30 PM, BWW 1207
  • Venkat's office hours (remote, link on Ed): Fri, 3:00 – 4:00 PM
Assignment Checkoffs
Checkoff slots by sign-up
Resources
Gradescope entry code: 4DDRWY

Communication

Ed only. Please post all course questions on Ed (https://edstem.org/us/courses/94200/). We do not use Slack or other channels for course Q&A.

If you have not been automatically added, please use this link to add yourself with code BCQ8Mj. Once enrollment is finalized, we will remove all non-students from the course forum to encourage free discussion among students.

Compute

Groups of 4–6. Each group receives 1 H100 node (8×H100) for the semester (8 total nodes on GCP). Each group is assigned a single static external IP address for SSH access.

Course questionnaire (Week 1): a short onboarding questionnaire was released in the first lecture (due Friday, Jan 23), and enrollment and compute allocations will be finalized by the end of the second week.

Practical Tools

Students will gain hands-on experience with industry-standard tools and frameworks throughout the course, including NeMo AutoModel, NeMo Curator, NeMo Data Designer, Dynamo, TRT-LLM, vLLM, and SGLang.

Consolidated Schedule

Everything in one place: lecture plan, dated timeline, deadlines/milestones, and recommended readings. Dates below are for Spring 2026. Lecture plan subject to change.

Date Lecture / Events Recommended Readings Notes
Part 1: Architecture — Defining the Computational Graph and Scaling Strategies
Jan 20
L1. Course Overview and the Modern AI Stack
Key topics
  • Course goals, structure, and expectations
  • The full-stack view: hardware ↔ software ↔ optimization
  • Lifecycle map: architecture → training → post-training → inference → applications
Also:
  • Questionnaire released Milestone
Jan 22
L2. All About Performance
Key topics
  • Matrix multiplication as the basis of deep learning
  • The Roofline model and memory bandwidth bottlenecks
  • Component-level cost analysis (compute vs. memory vs. communication)
  • The economics of tokens: training vs. serving tradeoffs
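The roofline analysis covered in this lecture can be made concrete with a back-of-the-envelope Python sketch; the accelerator figures (100 TFLOP/s peak, 2 TB/s bandwidth) are illustrative placeholders, not real hardware specs:

```python
# Sketch: arithmetic intensity of a GEMM vs. a machine's balance point.
# The peak/bandwidth figures below are illustrative, not real hardware specs.

def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n] (fp16, no tiling/reuse model)."""
    flops = 2 * m * n * k                                   # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

def attainable_tflops(intensity: float, peak_tflops: float, bw_tb_s: float) -> float:
    """Roofline: attainable throughput = min(peak compute, bandwidth x intensity)."""
    return min(peak_tflops, bw_tb_s * intensity)

peak, bw = 100.0, 2.0        # TFLOP/s and TB/s (illustrative)
balance = peak / bw          # FLOPs/byte needed to become compute-bound

big = gemm_intensity(4096, 4096, 4096)  # large training GEMM: compute-bound
thin = gemm_intensity(1, 4096, 4096)    # GEMV-like decode step: memory-bound
```

Plugging in real chip numbers shifts the balance point but not the conclusion: large batched matmuls sit on the compute roof, while single-token decode is bandwidth-limited.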
Jan 27
L2. All About Performance (Lecture Extension)
Key topics
  • Continuation of the Jan 22 topic list (matrix multiplication, the roofline model, cost analysis, token economics)
Also:
  • Research directions released Release
  • A1 released (Parallelism Assignment) Release
Jan 29
L3. Architectures to Break Bottlenecks
Key topics
  • Parallelism mechanics (e.g., multi-dimensional parallelism)
  • Interconnect topology and communication costs
  • Automated orchestration and practical scaling patterns
Also:
  • Group formation due; questionnaire decisions finalized Milestone
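One parallelism mechanic from this lecture, tensor (column-parallel) parallelism, can be simulated in a few lines of NumPy; the "devices" here are just array shards, and the concatenation stands in for the all-gather a real system would perform:

```python
import numpy as np

# Toy tensor parallelism: split a linear layer's output dimension across
# four simulated "devices"; each computes a local matmul on its weight shard.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # activations: [batch, d_in]
W = rng.standard_normal((16, 32))  # weights:     [d_in, d_out]

shards = np.split(W, 4, axis=1)            # column-wise shards, one per device
partials = [x @ w for w in shards]         # independent local matmuls
y_tp = np.concatenate(partials, axis=1)    # stands in for an all-gather
y_ref = x @ W                              # single-device reference
```

The math is exact; what differs at scale is the communication cost of the gather, which is where interconnect topology enters.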
Jan 30
[Special Lecture]
L4. Parallelism Strategies
Guest: Venkat Srinivasan
Guest
Feb 3
[Guest Lecture]
L5. NeMo AutoModel
Key topics

Practical instruction for performance profiling, arithmetic intensity, and optimization workflows.

Guest
Part 2: Pre-Training of Language Models — The Engineering of "Base Models"
Feb 5
L6. Introduction to Pre-Training
Key topics
  • Tokenization fundamentals
  • The causal language modeling (CLM) task
  • Pre-training fundamentals and scaling considerations
  • How to evaluate pre-trained models
  • Data curation: deduplication, quality filtering, and mixture design
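The causal language modeling (CLM) objective listed for this lecture fits in a few lines; this NumPy sketch computes next-token cross-entropy (the random logits and vocabulary size are placeholders):

```python
import numpy as np

def clm_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Causal LM loss: position t's logits predict token t+1.
    logits: [seq, vocab]; tokens: [seq] integer ids."""
    shift_logits = logits[:-1]   # positions 0..T-2 make the predictions
    targets = tokens[1:]         # tokens    1..T-1 are the labels
    z = shift_logits - shift_logits.max(axis=-1, keepdims=True)   # stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll.mean())

rng = np.random.default_rng(0)
loss = clm_loss(rng.standard_normal((8, 100)), rng.integers(0, 100, size=8))
```

The one-position shift between logits and labels is the whole "causal" trick; everything else is standard cross-entropy.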
Feb 10
[Guest Lecture]
L7. Powering Pre-Training: NeMo Curator
Key topics

Hands-on curating work with NeMo Curator: building usable pre-training datasets.

Guest
Feb 12
[Guest Lecture]
L8. Case Study: The Pre-Training of Nano-V3
Guest: Roger Waleffe (NVIDIA)
Key topics

A deep dive into pre-training data, evaluation, ablations, and the final "hero run" design choices.

Also:
  • Hypothesis statement v1 due Milestone
Guest
Feb 17
L9. Optimizer Fundamentals
Key topics
  • SGD-family methods and adaptive optimizers (Adam/AdamW)
  • Learning rate schedules, warmup, and stability
  • Gradient clipping, normalization, and numerics
  • Batch size, effective step size, and scaling behavior
Also:
  • A1 due (Feb 16) Deadline
  • A2 released (Pre-Training Assignment) Release
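A minimal sketch of the AdamW mechanics covered in this lecture (moment EMAs, bias correction, decoupled weight decay); the hyperparameters are common defaults, and the scalar toy problem is purely illustrative:

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update; returns the new (params, first moment, second moment)."""
    m = b1 * m + (1 - b1) * g       # EMA of gradients
    v = b2 * v + (1 - b2) * g * g   # EMA of squared gradients
    m_hat = m / (1 - b1 ** t)       # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    # weight decay applied directly to p, decoupled from the adaptive step
    p = p - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

# Toy run: minimize f(p) = p^2 (gradient 2p) starting from p = 1.
p = np.array([1.0])
m, v = np.zeros_like(p), np.zeros_like(p)
for t in range(1, 201):
    p, m, v = adamw_step(p, 2 * p, m, v, t)
```

Note that the normalized step is roughly lr per iteration regardless of gradient magnitude, which is why learning rate schedules and warmup matter so much.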
Feb 19
[Guest Lecture]
L10. Looking To The Future: Emerging Optimizers
Key topics

Moving beyond AdamW to emerging approaches (e.g., Muon, SOAP), and how to reason about their tradeoffs at scale.

Guest
Feb 24
[Guest Lecture]
L11. Looking To The Future: Synthetic Data Powering Pre-Training
Key topics

Patterns and tooling for scalable synthetic data generation pipelines that can power pre-training, including quality controls and reproducibility.

Guest
Part 3: Post-Training of Language Models — Specializing Models Through SFT and RL
Feb 26
L12. Intro To the LLM Post-Training Lifecycle and Evaluation
Key topics
  • Differences between pre-training and post-training data
  • Chat templating and training fundamentals
  • Post-training benchmarks and evaluation practices
Also:
  • Project proposal v2 due Milestone
Mar 3
L13. The Data Powering Post-Training: SFT Data Engineering and RL Environments
Key topics
  • Synthetic data generation pipelines
  • Rejection sampling and quality scoring
  • Skills mixtures and data balancing
  • RL environment construction and reward design basics
Mar 5
[Guest Lecture]
L14. NeMo Data Designer Deep Dive
Key topics

A practical walkthrough of NeMo Data Designer for building and iterating on high-quality post-training datasets.

Guest
Mar 10
L15. Reinforcement Learning Algorithms Powering Near-Frontier Post-Training
Key topics

RL algorithms and their application to post-training.

Mar 12
[Guest Lecture]
L16. Case Study: Post-Training of Nemotron-NanoV3 and NeMoRL
Key topics

A detailed case study of post-training decisions, data strategy, and evaluation tradeoffs in practice.

Guest
Part 4: Efficient Inference — Deployment Preparation and High-Performance Frameworks
Mar 17
[Guest Lecture]
L17. Deployment Preparation: Speculative Decoding, Quantization, Pruning, and NAS
Key topics
  • Speculative decoding: concepts and systems implications
  • Advanced data types and quantization paradigms
  • Calibration strategies and accuracy tradeoffs
  • Pruning and neural architecture search (NAS) for inference efficiency
Guest
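One quantization paradigm from this lecture, per-tensor symmetric int8 with round-to-nearest, can be sketched in a few lines; real pipelines add calibration, per-channel scales, and finer-grained formats:

```python
import numpy as np

def quantize_sym_int8(w: np.ndarray):
    """Per-tensor symmetric int8: w ~= scale * q, with q in [-127, 127]."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:                 # all-zero tensor edge case
        scale = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_sym_int8(w)
err = float(np.max(np.abs(dequantize(q, s) - w)))   # bounded by scale / 2
```

The worst-case round-trip error is half a quantization step, which is why calibration (choosing the scale well) dominates accuracy outcomes.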
Mar 19
L18. Fundamentals and Overview of High-Performance Inference Frameworks
Key topics

A systems overview of modern inference engines, batching/scheduling, memory management, and throughput/latency tradeoffs.

Mar 24
Spring Recess No class
Mar 26
Spring Recess No class
Mar 31
[Guest Lecture]
L19. High-Performance Inference using Dynamo and TRT-LLM
Key topics

Compilation and runtime strategies for production inference, including graph capture, kernel selection, and end-to-end profiling.

Guest
Apr 2
L20. High-Performance Inference using vLLM and SGLang
Key topics

Serving engines and control runtimes for LLM applications: throughput optimization, KV cache management, and efficient scheduling.

Also:
  • A3 due April 4 (Post-Training Assignment) Deadline
  • A4 released (Inference and Serving Assignment) Release
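KV cache management dominates serving memory budgets, and a quick calculator makes the scale concrete. The model shape below is a hypothetical 7B-class configuration chosen for illustration, not any specific model:

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: K and V tensors of shape [batch, heads, seq, head_dim] per layer."""
    return 2 * layers * batch * heads * seq_len * head_dim * bytes_per_elem

# Hypothetical 7B-class shape: 32 layers, 32 heads of dim 128, fp16 cache.
gib = kv_cache_bytes(layers=32, heads=32, head_dim=128,
                     seq_len=4096, batch=8) / 2**30   # 16.0 GiB
```

At batch 8 and 4K context this cache alone is 16 GiB, which is why paged/blocked KV allocation and careful scheduling are central to engines like vLLM and SGLang.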
Part 5: LLM Applications and Use Cases — Building Real-World AI Systems
Apr 7
L21. Fundamentals of Context Engineering
Key topics

Designing prompts, retrieval, memory, and tool usage under latency/cost constraints and reliability targets.

Also:
  • Midterm check-in report due Milestone
Apr 9
[Guest Lecture]
L22. Agentic Applications
Key topics

Building agentic systems: planning, tool use, orchestration, evaluation, and failure modes.

Guest
Apr 14
[Guest Lecture]
L23. Safety Guardrails
Key topics

Deployment-time safety: guardrails, red teaming, monitoring, and risk mitigation in real systems.

Guest
Part 6: Research Greenfields — Emerging Research Directions
Apr 16
L24. Diffusion Language Models
Key topics

Alternative generative paradigms beyond autoregressive decoding and how they change training/inference tradeoffs.

Apr 21
L25. Harness Engineering
Key topics

Methods for engineering software harnesses for modern LLMs.

Apr 23
L26. Multi-Agent Systems and Architecture
Key topics

Multi-agent coordination, communication, and architecture patterns for scalable AI systems.

Also:
  • A4 due (Inference and Serving Assignment) Deadline
Apr 26
Draft project report due Milestone
Submit on Gradescope
Peer review cycle
Apr 28
Project Presentations (Session A) Milestone
Week prior to RRR week
Apr 30
Project Presentations (Session B) Milestone
Week prior to RRR week
May 1
Peer review due Milestone
Submit on Gradescope
May 5
Poster session (RRR Week) Milestone
No formal lecture
May 10
Final deliverables due Milestone
Code + final report
May 15
Impact update Milestone
Proof of impact document

Grading & Policies

Grading Breakdown

This course has no exams. Evaluation is based on assignments, a semester-long research project, scribing, and participation.

Component Weight
Research project 50%
Assignments 35%
Scribing (2 lectures per student) 5%
Attendance & participation 10%
Important: Failing any one component fails the class. You must complete every component satisfactorily.

Commitment & Letter Grading

This course requires sustained weekly engagement (group work, compute usage, and in-person checkoffs). Letter grade only (no Satisfactory/Non-Satisfactory).

Assignments

There are four group-based assignments, each spanning roughly 2–3 weeks. Assignments include conceptual questions and hands-on experiments (implementation, profiling, scaling, analysis) with an in-person oral checkoff/presentation per assignment.

All assignment release/due dates are listed in the Consolidated Schedule.

  • A1: Parallelism Assignment
  • A2: Pre-Training Assignment
  • A3: Post-Training Assignment
  • A4: Inference and Serving Assignment

Late Policy

You have 3 total slack days across the semester for assignments. At most one slack day may be applied to any single assignment (extends deadline by 24 hours). No other late work is accepted without an approved exception.

Enhancements

Your assignments will also be graded on enhancements: contribute meaningfully by adding questions or by reframing the programmatic part to be more challenging. Each group only needs to enhance one assignment. Of the 35% of your grade reserved for assignments, 10% is graded on enhancements and the remaining 25% on the core assignment.

Research Project

The research project is group-based and open-ended, with an emphasis on producing something useful to the community.

All project milestones and dates are listed in the Consolidated Schedule.

  • Technical quality (40%): hypothesis clarity, depth, correctness, rigor, and artifact quality (code + report + other deliverables).
  • Impact (10%): evidence of external usefulness (e.g., PR merged into open source, reproducible benchmark, public report, arXiv preprint).
  • Peer review: each group provides a structured review of another group’s report.

Scribing

Each student is expected to scribe 2 lectures. Scribe assignments will be posted on Ed. LLM assistance may be used for drafting, but the final scribe notes must be accurate, edited, and clearly written. If you use external sources or AI tools, cite them clearly.

Attendance & Participation

Attendance is required. Participation includes contributing to in-class discussions and project check-ins, posting/answering on Ed, and providing constructive comments on lecture notes and slides.

Academic Integrity

Collaboration is encouraged within your group. Do not copy code or reports from other groups. If you use external code or AI assistants, cite the source clearly and ensure you understand the work you submit.

Infrastructure Notes

This course uses shared GPU infrastructure. For access/quota/networking issues, post on Ed and include timestamps, job IDs, logs, and a short repro description. Multi-node experiments require coordinating with another team to share nodes for a bounded window of time.

Submission platform (Gradescope): all written and code submissions are via Gradescope (4DDRWY), unless explicitly stated otherwise. Oral presentations/checkoffs require sign-up and in-person attendance.