Description:
The goal of this workshop is to present new advances in statistical inference and learning. Learning in high dimensions, whether the object of interest is a function, a class structure, or a hidden signal seen through noisy observations, is a central problem in modern statistics and machine learning. We will have talks by experts presenting recent theoretical developments in understanding fundamental versions of these questions, interspersed with open discussion time.
Logistics:
- Date: May 20th-21st, 2024
Schedule:
The workshop will be held in Mudd 3514, in the Northwestern Computer Science Department.
Monday, May 20th
1:00pm-1:30pm | Welcome and open discussion
1:30pm-2:20pm | Murat Erdogdu
2:20pm-2:30pm | Break
2:30pm-3:20pm | Song Mei
3:20pm-4:00pm | Coffee break
4:00pm-4:50pm | Elizabeth Collins-Woodfin
Tuesday, May 21st
8:30am-9:00am | Breakfast
9:00am-9:50am | Brice Huang
9:50am-10:00am | Break
10:00am-10:50am | Open problem discussion
10:50am-11:10am | Subhabrata Sen
Speakers:
- Brice Huang (MIT)
- Murat Erdogdu (University of Toronto)
- Song Mei (UC Berkeley)
- Subhabrata Sen (Harvard)
- Elizabeth Collins-Woodfin (McGill)
- Theodor Misiakiewicz (TTIC and UC Berkeley)
Titles and Abstracts:
Song Mei
Title: Revisiting neural network approximation theory in the age of generative AI
Abstract: Textbooks on deep learning theory primarily perceive neural networks as universal function approximators. While this classical viewpoint is fundamental, it inadequately explains the impressive capabilities of modern generative AI models such as language models and diffusion models. This talk puts forth a refined perspective: neural networks often serve as algorithm approximators, going beyond mere function approximation. I will explain how this refined perspective offers a deeper insight into the success of modern generative AI models.
Brice Huang
Title: Capacity threshold for the Ising perceptron
Abstract: We show that the capacity of the Ising perceptron is with high probability upper bounded by the constant $\alpha \approx 0.833$ conjectured by Krauth and Mézard, under the condition that an explicit two-variable function $S(\lambda_1,\lambda_2)$ is maximized at $(1,0)$. The earlier work of Ding and Sun proves the matching lower bound subject to a similar numerical condition, and together these results give a conditional proof of the conjecture of Krauth and Mézard.
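For readers less familiar with the model, the following is a minimal statement of the setup in the zero-margin case studied by Krauth and Mézard; the notation here is ours, not necessarily the speaker's. With $N$ binary weights and $M = \lfloor \alpha N \rfloor$ i.i.d. standard Gaussian patterns $g_1, \dots, g_M \in \mathbb{R}^N$, the solution count is
$$Z_N(\alpha) = \#\left\{ \sigma \in \{-1,+1\}^N \,:\, \tfrac{1}{\sqrt{N}} \langle g_a, \sigma \rangle \ge 0 \ \text{ for all } a \le M \right\},$$
and the capacity is the critical density at which $Z_N(\alpha)$ passes from nonzero to zero with high probability as $N \to \infty$; the Krauth-Mézard prediction for this threshold is $\alpha \approx 0.833$.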
Theodor Misiakiewicz
Title: On the complexity of differentiable learning
Abstract: In this talk, we are interested in understanding the complexity of learning with gradient descent. We present some early attempts in that direction. We introduce differentiable learning queries (DLQ) as a subclass of statistical query algorithms, and consider learning the orbit of a distribution under a group of symmetry. In this setting, we can derive sharp upper and lower bounds for the query complexity of DLQ in terms of a “leap complexity”. We then illustrate how these results offer some insights on the training dynamics of neural networks.
Elizabeth Collins-Woodfin
Title: High-dimensional dynamics of SGD for generalized linear models
Abstract: We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models with general data covariance. We show that, when the number of parameters grows proportionally to the number of data points, SGD converges to a deterministic equivalent, characterized by a system of ordinary differential equations. This framework allows us to obtain learning rate thresholds for stability as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient, which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we extend this framework to SGD with adaptive learning rates (e.g., AdaGrad-Norm) and analyze the dynamics of these algorithms, including a phase transition in the learning rate for data with power-law covariance.
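As a concrete point of reference for the setting above, below is a minimal sketch of streaming (one-pass) SGD for a generalized linear model in the proportional regime, where each data point is used exactly once. The logistic link, the placeholder covariance, the step-size choice, and the tracked statistic are illustrative assumptions, not the specific model or scaling analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 500                    # number of parameters
n = 4 * d                  # number of samples, proportional to d
gamma = 0.5 / d            # illustrative step size; the correct scaling is part of the analysis

# Illustrative data model: logistic link with an anisotropic covariance.
cov_sqrt = np.diag(np.arange(1, d + 1, dtype=float) ** -0.25)  # placeholder covariance^(1/2)
theta_star = rng.standard_normal(d) / np.sqrt(d)               # ground-truth coefficients

def sample():
    """Draw one fresh (x, y) pair from the generalized linear model."""
    x = cov_sqrt @ rng.standard_normal(d)
    p = 1.0 / (1.0 + np.exp(-x @ theta_star))
    y = rng.binomial(1, p)
    return x, y

theta = np.zeros(d)
errors = []

for k in range(n):
    x, y = sample()                        # streaming: a fresh sample at every step
    pred = 1.0 / (1.0 + np.exp(-x @ theta))
    theta -= gamma * (pred - y) * x        # gradient of the logistic loss at (x, y)
    if k % d == 0:
        errors.append(np.sum((theta - theta_star) ** 2))  # a summary statistic of the iterate

print(errors)
```

The deterministic ODE description in the abstract concerns the limiting trajectories of summary statistics like the one tracked here, as the dimension and the number of samples grow proportionally.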
Subhabrata Sen
Title: A Mean-Field Approach to Empirical Bayes Estimation in High-dimensional Linear Regression
Abstract: We will discuss empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we will adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022). We will discuss asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we will develop a computationally efficient approximation to the oracle posterior distribution, and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference; e.g., construction of posterior credible intervals with an average coverage guarantee, Bayes optimal estimation for the regression coefficients, estimation of the proportion of non-nulls, etc.
Based on joint work with Sumit Mukherjee (Columbia) and Bodhisattva Sen (Columbia).
Murat Erdogdu
Title: Feature Learning in Two-layer Neural Networks under Structured Data
Abstract: We study the effect of gradient-based optimization on feature learning in two-layer neural networks. We consider a setting where the number of samples is of the same order as the input dimension and show that, when the input data is isotropic, gradient descent always improves upon the initial random features model in terms of prediction risk, for a certain class of targets. Further leveraging the practical observation that data often contains additional structure, i.e., the input covariance has non-trivial alignment with the target, we prove that the class of learnable targets can be significantly extended, demonstrating a clear separation between kernel methods and two-layer neural networks in this regime. We additionally consider sparse settings and show that pruning methods can lead to optimal sample complexity.
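To make the comparison in the abstract concrete, here is a small self-contained sketch contrasting the random features model (only the second layer is fit, with the first layer frozen at its random initialization) against a network whose first layer receives a few gradient steps before the second layer is refit. The target function, widths, step size, and number of steps are illustrative placeholders and do not reproduce the setting or the proof technique of the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n, m = 50, 500, 200           # input dimension, samples (of order d), hidden width
X = rng.standard_normal((n, d))  # isotropic inputs
beta = rng.standard_normal(d) / np.sqrt(d)
y = np.tanh(X @ beta) ** 2       # placeholder nonlinear target

W_init = rng.standard_normal((m, d)) / np.sqrt(d)  # random first-layer weights

def features(W, X):
    return np.maximum(X @ W.T, 0.0)  # ReLU feature map

def fit_second_layer(W, X, y, ridge=1e-3):
    """Ridge-regress the second layer on top of frozen first-layer features."""
    Phi = features(W, X)
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)

X_test = rng.standard_normal((2000, d))
y_test = np.tanh(X_test @ beta) ** 2

def test_risk(W, a):
    return np.mean((features(W, X_test) @ a - y_test) ** 2)

# Random features baseline: train only the second layer.
a_rf = fit_second_layer(W_init, X, y)

# Feature learning: a few gradient steps on the first layer with a fixed second layer,
# then refit the second layer on the trained features.
a0 = rng.standard_normal(m) / np.sqrt(m)
W = W_init.copy()
lr = 0.5
for _ in range(3):
    Phi = features(W, X)
    resid = Phi @ a0 - y                                     # shape (n,)
    grad_W = ((resid[:, None] * (Phi > 0)) * a0).T @ X / n   # squared-loss gradient (up to a constant)
    W -= lr * grad_W
a_fl = fit_second_layer(W, X, y)

print("random features test risk:", test_risk(W_init, a_rf))
print("trained features test risk:", test_risk(W, a_fl))
```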
Organizers:
- Antonio Auffinger (Northwestern University)
- Siddharth Bhandari (Toyota Technological Institute at Chicago)
- Reza Gheissari (Northwestern University)
- Vishesh Jain (University of Illinois Chicago)
- Marcus Michelen (University of Illinois Chicago)