STA 250 - Theoretical Foundations of Modern AI, Winter 2024

Instructor: Spencer Frei

Time: 12:10pm-1:50pm Mondays and Wednesdays

Location: Storer 1342

Office hours: 2:30pm-3:30pm Fridays, MSB 4220

Information

This course serves as a survey of topics on the mathematical and statistical foundations of deep learning, the technology that underlies modern artificial intelligence. Topics covered include approximation theory; convex, non-convex, and constrained optimization; implicit regularization; uniform convergence and generalization; benign overfitting; and transformers (see the week-by-week schedule below).

The course will be primarily proof-based but will also involve programming in Python (PyTorch/Jax/TensorFlow). Prerequisites include familiarity with programming in Python, basic machine learning, proof-based linear algebra, and probability theory. The course is ideally suited for motivated PhD students in Statistics, Mathematics, and Computer Science. It will be a fairly intensive course.

Course structure and evaluation

The first ~8 weeks of the course will consist of lectures, in-class quizzes, homeworks, and in-class reading group discussions. The last ~2 weeks of the course will be for presenting final projects.

A more detailed week-by-week schedule can be found at the bottom of this page.

Homework

Before the course begins, students should complete the attached Homework 0; it will not be collected and no solutions will be provided, but it should serve as a self-assessment. If you struggle to complete any of the non-programming questions, the course will likely be quite difficult without a significant investment on your part.

If you are unfamiliar with software packages like PyTorch/Jax/TensorFlow, I would recommend getting started with PyTorch via the following tutorials.

All of the coding you will need to do for the course (and the tutorials) can be done on your own laptop or on Google Colaboratory; a GPU is not needed, although using the Google Colab GPU may help if you want to do an experiment-focused project.
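As a rough illustration of the level of PyTorch familiarity the course assumes (this snippet is a minimal sketch with synthetic data, not part of any tutorial or assignment), the following fits a small linear model by gradient descent; it runs on a laptop CPU, and the device check shows how a Colab GPU would be used if one happened to be available:

```python
# Minimal PyTorch sketch: fit a linear model to synthetic data by gradient descent.
import torch

# Synthetic regression data: y = X @ w_true + small noise (illustrative only).
torch.manual_seed(0)
X = torch.randn(128, 5)
w_true = torch.randn(5, 1)
y = X @ w_true + 0.01 * torch.randn(128, 1)

# Use a GPU if one is available (e.g. on Google Colab); otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
X, y = X.to(device), y.to(device)

model = torch.nn.Linear(5, 1, bias=False).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()   # autograd computes gradients of the loss w.r.t. the weights
    optimizer.step()  # one gradient-descent update

print(f"final training loss: {loss.item():.6f}")
```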

You are welcome to collaborate with other students on your homework, but you must acknowledge this by writing the names of those with whom you collaborated at the top of your homework assignment. All homeworks must be written individually. You must submit your homeworks as LaTeX'd PDFs on Gradescope (accessible via Canvas).

Each homework will be graded on a random subset of the problems assigned. Homeworks will be graded for clarity of writing in addition to correctness.

In-class quizzes

There will be six short in-class quizzes. The lowest two quiz scores will be dropped, and the remaining four quizzes will constitute 10% of the grade. The quizzes will be open book (but no internet). It is highly recommended that you consistently review the material after it is presented in lecture.

Reading group discussions

Throughout the course, we will spend three or four sessions thoroughly reading and discussing influential papers related to the course material. The format for these discussions will be structured along the lines of Colin Raffel and Alec Jacobson's role-playing student seminars, also used by Aditi Raghunathan. For more details on the reading group, please see this page.

Project

There will be a final project for the course. For more details, see this page.

Class Schedule

References:

[T] - Telgarsky, Deep Learning Theory (lecture notes, 2021-10-27, v0.0-e7150f2d, PDF)

[SSBD] - Shalev-Shwartz and Ben-David, Understanding Machine Learning

[AEP1] - Andréasson, Evgrafov, and Patriksson, An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms, 1st ed.

| Lecture Day | Topics | Lecture notes | Additional references | HW / Project / Reading Group / Other |
| --- | --- | --- | --- | --- |
| Jan 8 (M) | Overview, approximation theory | PDF | [T] Ch2 | |
| Jan 10 (W) | Convex optimization | PDF | [T] Ch7 | Paper discussion groups assigned |
| Jan 15 (M) | MLK Day, no class | | | |
| Jan 17 (W) | Non-convex optimization | PDF | Karimi et al. '16 | HW 1 released [PDF] |
| Jan 22 (M) | Constrained optimization; KKT conditions | PDF | [AEP1] Ch5, Lyu-Li '19 | Paper discussion report due 1/23, noon |
| Jan 24 (W) | Paper discussion: Edge of Stability | | | |
| Jan 29 (M) | Implicit regularization I | PDF | [T] Ch10 | |
| Jan 31 (W) | Implicit regularization II | PDF | [T] Ch10 | Project milestone #1 due |
| Feb 5 (M) | Uniform convergence I | PDF | [SSBD] Ch 26; [T] Ch13 | |
| Feb 7 (W) | Uniform convergence II | PDF | [SSBD] Ch 26; [T] Ch13; McDiarmid Ineq. proof | |
| Feb 12 (M) | Uniform convergence III | PDF | [SSBD] Ch 26; [T] Ch13 | Project milestone #2 due; Paper discussion report due 2/13, noon |
| Feb 14 (W) | Paper discussion: Understanding DL requires rethinking generalization | | | |
| Feb 19 (M) | Presidents' Day, no class | | | HW 1 due, 11:59pm |
| Feb 21 (W) | Benign overfitting | PDF | | HW 2 released [PDF] |
| Feb 26 (M) | No class | | | Paper discussion report due 2/27, noon |
| Feb 28 (W) | Paper discussion: Adversarial examples: bugs or features? | | | |
| Mar 4 (M) | Benign overfitting II; Transformers I | PDF | | |
| Mar 6 (W) | Transformers II | PDF | | Paper discussion report due 3/10, noon |
| Mar 11 (M) | Paper discussion: In-context learning | | | |
| Mar 13 (W) | Final projects | | | Project reports due @ noon |
| Mar 20 (W) | No class | | | HW 2 due, 11:59pm |

This course is inspired by a number of other courses, including:

Quanquan Gu, UCLA - Foundations of Deep Learning

Tengyu Ma, Stanford - Machine Learning Theory

Aditi Raghunathan, CMU - Theoretical and Empirical Foundations of Modern Machine Learning

Matus Telgarsky, NYU - Deep Learning Theory

Fanny Yang, ETH Zurich - Guarantees in Machine Learning