Instructor: Spencer Frei
Time: 12:10pm-1:50pm Mondays and Wednesdays
Location: Storer 1342
Office hours: 2:30pm-3:30pm Fridays, MSB 4220
This course serves as a survey of topics on the mathematical and statistical foundations of deep learning, the technology that underlies modern artificial intelligence. Topics covered include:
- approximation theory
- convex, non-convex, and constrained optimization
- implicit regularization
- uniform convergence and generalization
- benign overfitting
- transformers and in-context learning
The course will be primarily proof-based but will also involve programming in Python (PyTorch/Jax/TensorFlow). Prerequisites include familiarity with programming in Python, basic machine learning, proof-based linear algebra, and probability theory. The course is ideally suited for motivated PhD students in Statistics, Mathematics, and Computer Science. It will be a fairly intensive course.
The first ~8 weeks of the course will consist of lectures, in-class quizzes, homeworks, and in-class reading group discussions. The last ~2 weeks of the course will be for presenting final projects.
A more detailed week-by-week schedule can be found at the bottom of this page.
Before the course begins, students should complete the attached Homework 0; it will not be collected and no solutions will be provided, but it should serve as a self-assessment. If you struggle to complete any of the non-programming questions, the course will likely be quite difficult without a significant investment of effort on your part.
If you are unfamiliar with software packages like PyTorch/Jax/TensorFlow, I recommend getting started with PyTorch via the following tutorials.
All of the coding you will need to do for the course (and the tutorials) can be done on your own laptop or on Google Colaboratory; a GPU will not be needed, although using the Google Colab GPU may help if you want to do an experiment-focused project.
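If you would like a quick sanity check that your environment is ready before the course starts, a minimal sketch along the following lines is enough to confirm that PyTorch is installed and can train a small model on CPU (the model, data, and hyperparameters here are purely illustrative, not part of any assignment):

```python
import torch
import torch.nn as nn

# Purely illustrative setup: a tiny two-layer network fit to random data.
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU is fine for this course

X = torch.randn(128, 10, device=device)
y = torch.randn(128, 1, device=device)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# A few steps of gradient descent; the loss should decrease steadily.
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"device: {device}, final loss: {loss.item():.4f}")
```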
You are welcome to collaborate with other students on your homework, but you must acknowledge this by writing the names of those with whom you collaborated at the top of your homework assignment. All homeworks must be written individually. You must submit your homeworks as LaTeX'd PDFs on Gradescope (accessible via Canvas).
Each homework will be graded on a random subset of the problems assigned. Homeworks will be graded for clarity of writing in addition to correctness.
There will be 6 short in-class quizzes. The 2 lowest quiz scores will be dropped, and the remaining 4 quizzes will constitute 10% of the grade. The quizzes will be open book (but no internet). It is highly recommended that you consistently review the material after it is presented in lecture.
Throughout the course, we will spend 3 or 4 sessions thoroughly reading and discussing an influential paper related to the course material. The format for these discussions will be structured along the lines of Colin Raffel and Alec Jacobson's role-playing student seminars, also used by Aditi Raghunathan. For more details on the reading group, please see this page.
There will be a final project for the course. For more details, see this page.
References:
[T] - Telgarsky, Deep Learning Theory lecture notes (v0.0-e7150f2d, 2021-10-27), PDF
[SSBD] - Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms
[AEP1] - Andréasson, Evgrafov, and Patriksson, An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms, 1st ed.
| Lecture Day | Topics | Lecture notes | Additional references | HW/Project/Reading Group | Other |
|---|---|---|---|---|---|
| Jan 8 (M) | Overview, approximation theory | | [T] Ch2 | | |
| Jan 10 (W) | Convex optimization | | [T] Ch7 | Paper discussion groups assigned | |
| Jan 15 (M) | MLK Day, no class | | | | |
| Jan 17 (W) | Non-convex optimization | | Karimi et al '16 | HW 1 released [PDF] | |
| Jan 22 (M) | Constrained optimization; KKT conditions | | [AEP1] Ch5, Lyu-Li '19 | Paper discussion report due 1/23, noon | |
| Jan 24 (W) | Paper discussion: Edge of Stability | | | | |
| Jan 29 (M) | Implicit regularization I | | [T] Ch10 | | |
| Jan 31 (W) | Implicit regularization II | | [T] Ch10 | Project milestone #1 due | |
| Feb 5 (M) | Uniform convergence I | | [SSBD] Ch 26; [T] Ch13 | | |
| Feb 7 (W) | Uniform convergence II | | [SSBD] Ch 26; [T] Ch13; McDiarmid Ineq. proof | | |
| Feb 12 (M) | Uniform convergence III | | [SSBD] Ch 26; [T] Ch13 | Project milestone #2 due; Paper discussion report due 2/13, noon | |
| Feb 14 (W) | Paper discussion: Understanding DL requires rethinking generalization | | | | |
| Feb 19 (M) | Presidents' Day, no class | | | HW 1 due, 11:59pm | |
| Feb 21 (W) | Benign overfitting | | | HW 2 released [PDF] | |
| Feb 26 (M) | No class | | | Paper discussion report due 2/27, noon | |
| Feb 28 (W) | Paper discussion: Adversarial examples: Bugs or features? | | | | |
| Mar 4 (M) | Benign overfitting II; Transformers I | | | | |
| Mar 6 (W) | Transformers II | | | Paper discussion report due 3/10, noon | |
| Mar 11 (M) | Paper discussion: In-context learning | | | | |
| Mar 13 (W) | Final projects | | | Project reports due @ noon | |
| Mar 20 (W) | No class | | | HW 2 due, 11:59pm | |
This course is inspired by a number of other courses, including:
Quanquan Gu, UCLA - Foundations of Deep Learning
Tengyu Ma, Stanford - Machine Learning Theory
Aditi Raghunathan, CMU - Theoretical and Empirical Foundations of Modern Machine Learning
Matus Telgarsky, NYU - Deep Learning Theory
Fanny Yang, ETH Zurich - Guarantees in Machine Learning