Description:
The goal of this workshop is to present new advances in statistical inference and learning. Learning in high dimensions, whether the object of interest is a function, a class structure, or a hidden signal seen through noisy observations, is a central problem in modern statistics and machine learning. We will have talks by experts presenting recent theoretical developments in understanding fundamental versions of these questions, interspersed with open discussion time.
Logistics:
- Date: May 20th-21st, 2024
Schedule:
The workshop will be held in Mudd 3514, in the Northwestern Computer Science Department.
Monday, May 20th
1:00pm-1:30pm | Welcome and open discussion
1:30pm-2:20pm | Murat Erdogdu
2:20pm-2:30pm | Break
2:30pm-3:20pm | Song Mei
3:20pm-4:00pm | Coffee break
4:00pm-4:50pm | Elizabeth Collins-Woodfin
Tuesday, May 21st
8:30am-9:00am | Breakfast
9:00am-9:50am | Brice Huang
9:50am-10:00am | Break
10:00am-10:50am | Open problem discussion
10:50am-11:10am | Subhabrata Sen
Speakers:
- Brice Huang (MIT)
- Murat Erdogdu (University of Toronto)
- Song Mei (UC Berkeley)
- Subhabrata Sen (Harvard)
- Elizabeth Collins-Woodfin (McGill)
- Theodor Misiakiewicz (TTIC and UC Berkeley)
Titles and Abstracts:
Song Mei
Title: Revisiting neural network approximation theory in the age of generative AI
Abstract: Textbooks on deep learning theory primarily perceive neural networks as universal function approximators. While this classical viewpoint is fundamental, it inadequately explains the impressive capabilities of modern generative AI models such as language models and diffusion models. This talk puts forth a refined perspective: neural networks often serve as algorithm approximators, going beyond mere function approximation. I will explain how this refined perspective offers a deeper insight into the success of modern generative AI models.
Brice Huang
Title: Capacity threshold for the Ising perceptron
Abstract: We show that the capacity of the Ising perceptron is with high probability upper bounded by the constant $\alpha \approx 0.833$ conjectured by Krauth and Mézard, under the condition that an explicit two-variable function $S(\lambda_1,\lambda_2)$ is maximized at $(1,0)$. The earlier work of Ding and Sun proves the matching lower bound subject to a similar numerical condition, and together these results give a conditional proof of the conjecture of Krauth and Mézard.
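For readers less familiar with the model, the following is a minimal statement of the setup in the zero-margin case studied by Krauth and Mézard; the notation here is ours, not necessarily the speaker's. With $N$ binary weights and $M = \lfloor \alpha N \rfloor$ i.i.d. standard Gaussian patterns $g_1, \dots, g_M \in \mathbb{R}^N$, the solution count is
$$Z_N(\alpha) = \#\left\{ \sigma \in \{-1,+1\}^N \,:\, \tfrac{1}{\sqrt{N}} \langle g_a, \sigma \rangle \ge 0 \ \text{ for all } a \le M \right\},$$
and the capacity is the critical density at which $Z_N(\alpha)$ passes from nonzero to zero with high probability as $N \to \infty$; the Krauth-Mézard prediction for this threshold is $\alpha \approx 0.833$.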
Theodor Misiakiewicz
Title: On the complexity of differentiable learning
Abstract: In this talk, we are interested in understanding the complexity of learning with gradient descent. We present some early attempts in that direction. We introduce differentiable learning queries (DLQ) as a subclass of statistical query algorithms, and consider learning the orbit of a distribution under a group of symmetry. In this setting, we can derive sharp upper and lower bounds for the query complexity of DLQ in terms of a “leap complexity”. We then illustrate how these results offer some insights on the training dynamics of neural networks.
Elizabeth Collins-Woodfin
Title: High-dimensional dynamics of SGD for generalized linear models
Abstract: We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models with general data covariance. We show that, when the number of parameters grows proportionally to the number of data points, SGD converges to a deterministic equivalent, characterized by a system of ordinary differential equations. This framework allows us to obtain learning rate thresholds for stability as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient, which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we extend this framework to SGD with adaptive learning rates (e.g., AdaGrad-Norm) and analyze the dynamics of these algorithms, including a phase transition in the learning rate for data with power-law covariance.
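As a concrete point of reference for the setting above, below is a minimal sketch of streaming (one-pass) SGD for a generalized linear model in the proportional regime, where each data point is used exactly once. The logistic link, the placeholder covariance, the step-size choice, and the tracked statistic are illustrative assumptions, not the specific model or scaling analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 500                    # number of parameters
n = 4 * d                  # number of samples, proportional to d
gamma = 0.5 / d            # illustrative step size; the correct scaling is part of the analysis

# Illustrative data model: logistic link with an anisotropic covariance.
cov_sqrt = np.diag(np.arange(1, d + 1, dtype=float) ** -0.25)  # placeholder covariance^(1/2)
theta_star = rng.standard_normal(d) / np.sqrt(d)               # ground-truth coefficients

def sample():
    """Draw one fresh (x, y) pair from the generalized linear model."""
    x = cov_sqrt @ rng.standard_normal(d)
    p = 1.0 / (1.0 + np.exp(-x @ theta_star))
    y = rng.binomial(1, p)
    return x, y

theta = np.zeros(d)
errors = []

for k in range(n):
    x, y = sample()                        # streaming: a fresh sample at every step
    pred = 1.0 / (1.0 + np.exp(-x @ theta))
    theta -= gamma * (pred - y) * x        # gradient of the logistic loss at (x, y)
    if k % d == 0:
        errors.append(np.sum((theta - theta_star) ** 2))  # a summary statistic of the iterate

print(errors)
```

The deterministic ODE description in the abstract concerns the limiting trajectories of summary statistics like the one tracked here, as the dimension and the number of samples grow proportionally.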
Subhabrata Sen
Title: A Mean-Field Approach to Empirical Bayes Estimation in High-dimensional Linear Regression
Abstract: We will discuss empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we will adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022). We will discuss asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we will develop a computationally efficient approximation to the oracle posterior distribution, and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference; e.g., construction of posterior credible intervals with an average coverage guarantee, Bayes optimal estimation for the regression coefficients, estimation of the proportion of non-nulls, etc.
Based on joint work with Sumit Mukherjee (Columbia) and Bodhisattva Sen (Columbia).
Murat Erdogdu
Title: Feature Learning in Two-layer Neural Networks under Structured Data
Abstract: We study the effect of gradient-based optimization on feature learning in two-layer neural networks. We consider a setting where the number of samples is of the same order as the input dimension and show that, when the input data is isotropic, gradient descent always improves upon the initial random features model in terms of prediction risk, for a certain class of targets. Further leveraging the practical observation that data often contains additional structure, i.e., the input covariance has non-trivial alignment with the target, we prove that the class of learnable targets can be significantly extended, demonstrating a clear separation between kernel methods and two-layer neural networks in this regime. We additionally consider sparse settings and show that pruning methods can lead to optimal sample complexity.
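To make the comparison in the abstract concrete, here is a small self-contained sketch contrasting the random features model (only the second layer is fit, with the first layer frozen at its random initialization) against a network whose first layer receives a few gradient steps before the second layer is refit. The target function, widths, step size, and number of steps are illustrative placeholders and do not reproduce the setting or the proof technique of the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n, m = 50, 500, 200           # input dimension, samples (of order d), hidden width
X = rng.standard_normal((n, d))  # isotropic inputs
beta = rng.standard_normal(d) / np.sqrt(d)
y = np.tanh(X @ beta) ** 2       # placeholder nonlinear target

W_init = rng.standard_normal((m, d)) / np.sqrt(d)  # random first-layer weights

def features(W, X):
    return np.maximum(X @ W.T, 0.0)  # ReLU feature map

def fit_second_layer(W, X, y, ridge=1e-3):
    """Ridge-regress the second layer on top of frozen first-layer features."""
    Phi = features(W, X)
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)

X_test = rng.standard_normal((2000, d))
y_test = np.tanh(X_test @ beta) ** 2

def test_risk(W, a):
    return np.mean((features(W, X_test) @ a - y_test) ** 2)

# Random features baseline: train only the second layer.
a_rf = fit_second_layer(W_init, X, y)

# Feature learning: a few gradient steps on the first layer with a fixed second layer,
# then refit the second layer on the trained features.
a0 = rng.standard_normal(m) / np.sqrt(m)
W = W_init.copy()
lr = 0.5
for _ in range(3):
    Phi = features(W, X)
    resid = Phi @ a0 - y                                     # shape (n,)
    grad_W = ((resid[:, None] * (Phi > 0)) * a0).T @ X / n   # squared-loss gradient (up to a constant)
    W -= lr * grad_W
a_fl = fit_second_layer(W, X, y)

print("random features test risk:", test_risk(W_init, a_rf))
print("trained features test risk:", test_risk(W, a_fl))
```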
Organizers:
- Antonio Auffinger (Northwestern University)
- Siddharth Bhandari (Toyota Technological Institute at Chicago)
- Reza Gheissari (Northwestern University)
- Vishesh Jain (University of Illinois Chicago)
- Marcus Michelen (University of Illinois Chicago)