Logistics

Date: May 31, 2024

Location: University of Illinois Chicago, Academic and Residential Complex (ARC) 241, 940 W. Harrison St.

Parking: For those driving to the workshop, attendees can park in any lot with visitor access. The closest options to the ARC are Lot 1B and the Harrison Street lot. Please refer to the image at the end of the page for marked parking structures.

Parking passes will be provided at the workshop for free parking in designated UIC parking buildings. Please remember to ask for a pass before leaving the workshop.

 

Registration: Click here to register

Description: Graph-structured data presents unique challenges to learning and inference tasks. In the last decade, deep learning methods have revolutionized this field, starting with deep embeddings of graphs, continuing with the rise of the graph neural network paradigm, and culminating in current work on deep generative models of graph-structured data. At the same time, the causal inference community has independently developed its own representations of graph-structured data, capturing phenomena such as social interactions, spatial dependence, network interference, and peer effects. This workshop will explore topics at the intersection of network analysis, machine learning, and causal inference. By bringing together leading experts and practitioners from these areas, the workshop aims to share the latest advances and explore the potential for integration and cross-disciplinary collaboration.

 
Keynote Speakers:

Bryan Perozzi (Google)
Michelle Li (Harvard University)
Murat Kocaoglu (Purdue University)
Pantelis Loupos (University of California, Davis)

Organizers:

 

Schedule:

8:30-9:00am  Breakfast
9:00-9:45am

Bryan Perozzi, Google 

  Keynote: Giving a Voice to Your Graph: Data Representations in the LLM Age

9:45-10:30am

Pantelis Loupos, University of California Davis 

  Keynote: Graph Neural Networks for Causal Inference Under Network Confounding

10:30-10:45am Coffee break 
10:45-11:05am

Claire Donnat, University of Chicago

  Tuning the Geometry of Graph Neural Networks

11:05-11:25am 

Kaize Ding, Northwestern University

  Data-Efficient Graph Learning

11:25-11:45am

Jiawei (Joe) Zhou, TTIC

 Learning Graph Structures and Dynamics on Networking Data

11:45am-12:05pm

Federico Bugni, Northwestern

 Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance

12:05-1:30pm Lunch 
1:30-2:15pm

Michelle Li, Harvard University

Keynote: Contextual Learning on Graphs for Precision Medicine

2:15-3:00pm

Murat Kocaoglu, Purdue University

  Keynote: Causal Machine Learning: Fundamentals and Applications

3:00-3:15pm  Coffee break
3:15-3:35pm

Arvind Ramanathan, Argonne National Lab

  Learning Useful Graph Representations for Drug Discovery

3:35-3:55pm

Lu Cheng, University of Illinois Chicago

 Conformal Methods for Reliable and Fair Machine Learning

3:55-4:10pm Poster spotlights 
4:10-5:30pm Poster reception

 

Abstracts:

 

Speaker: Bryan Perozzi, Google
 
Title: Giving a Voice to Your Graph: Data Representations in the LLM Age
Abstract: Graphs are powerful tools for representing complex real-world relationships, essential for tasks like analyzing social networks or identifying financial trends. While large language models (LLMs) have revolutionized natural text reasoning, their application to graphs remains an understudied frontier. To bridge this gap, we need to transform structured graph data into representations LLMs can process. This talk delves into our work on finding the correct graph inductive bias for Graph ML and developing strategies to convert graphs into language-like formats for LLMs. I’ll explore our work on “Talking Like a Graph”, and our parameter-efficient method, GraphToken, which learns an encoding function to extend prompts with explicit structured information.
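To make the idea of "language-like formats" concrete, here is a minimal, hypothetical sketch of serializing a small graph into a text prompt, in the spirit of the talk; the function name and output format are illustrative assumptions, not the encoders used in the actual work.

```python
# Hypothetical sketch: render a node/edge list as plain English so it can be
# prepended to an LLM prompt. Not the paper's actual encoding scheme.

def graph_to_text(nodes, edges):
    """Serialize nodes and undirected edges into a simple English description."""
    node_part = "Nodes: " + ", ".join(str(n) for n in nodes) + "."
    edge_part = " ".join(f"Node {u} is connected to node {v}." for u, v in edges)
    return node_part + " " + edge_part

# The resulting string can precede a question such as "Is there a path from A to C?"
prompt = graph_to_text(["A", "B", "C"], [("A", "B"), ("B", "C")])
```

Approaches like GraphToken go further by learning a continuous encoding of the structure rather than relying on a fixed textual template.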
 
Speaker: Pantelis Loupos, UC Davis
 
Title: Graph Neural Networks for Causal Inference Under Network Confounding
Abstract: We study causal inference with observational network data. A challenging aspect of this setting is the possibility of interference in both potential outcomes and selection into treatment, for example due to peer effects in either stage. We therefore consider a nonparametric setup in which both stages are reduced forms of simultaneous-equations models. This results in high-dimensional network confounding, where the network and covariates of all units constitute sources of selection bias. The literature predominantly assumes that confounding can be summarized by a known, low-dimensional function of these objects, and it is unclear what selection models justify common choices of functions. We show that graph neural networks (GNNs) are well suited to adjust for high-dimensional network confounding. We establish a network analog of approximate sparsity under primitive conditions on interference and demonstrate that the model has low-dimensional structure that makes estimation feasible and justifies the use of shallow GNN architectures.
 
Speaker: Claire Donnat, University of Chicago
 
Title: Tuning the Geometry of Graph Neural Networks
Abstract: By recursively summing node features over entire neighborhoods, spatial graph convolution operators have been heralded as the key to the success of Graph Neural Networks (GNNs). Yet, despite the multiplication of GNN methods across tasks and applications, the effect of this aggregation operation has yet to be analyzed. In fact, while most recent efforts in the GNN community have focused on optimizing the architecture of the neural network, fewer works have attempted to characterize (a) the different classes of spatial convolution operators, (b) their impact on the geometry of the embedding space, and (c) how the choice of a particular convolution should relate to properties of the data. In this talk, we propose to begin answering all three questions by dividing existing operators into two main classes (symmetric vs. row-normalized spatial convolutions) and showing how these correspond to different implicit biases on the data. Finally, we show that the convolution operator is in fact tunable, and exhibit regimes in which certain choices of convolution, and therefore embedding geometry, might be more appropriate.
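The two operator classes mentioned in the abstract can be illustrated with a toy sketch; the adjacency matrix and notation below are assumptions for illustration, not material from the talk.

```python
import numpy as np

# Toy contrast between the two convolution classes from the abstract:
# symmetric normalization D^{-1/2} A D^{-1/2} vs. row normalization D^{-1} A.

def convolutions(A):
    deg = A.sum(axis=1)
    d_inv = np.diag(1.0 / deg)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    sym = d_inv_sqrt @ A @ d_inv_sqrt   # symmetric operator
    rw = d_inv @ A                      # row-normalized (random-walk) operator
    return sym, rw

# A 3-node star graph: node 0 connected to nodes 1 and 2.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
sym, rw = convolutions(A)
# rw averages neighbor features (rows sum to 1); sym is symmetric but its
# rows need not sum to 1, so the two induce different embedding geometries.
```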
 
 
Speaker: Kaize Ding, Northwestern University
 
Title: Data-Efficient Graph Learning
Abstract: The world around us — and our understanding of it — is rich in relational structure: from atoms and their interactions to objects and entities in our environments. Graphs, with nodes representing entities and edges representing relationships between entities, serve as a common language to model complex, relational, and heterogeneous systems. Despite the success of recent deep graph learning, the efficacy of existing efforts heavily depends on the ideal data quality of the observed graphs and the sufficiency of the supervision signals provided by the human-annotated labels, leading to the fact that those carefully designed models easily fail in resource-constrained scenarios. In this talk, I will present my recent research contributions centered around data-efficient learning for relational and heterogeneous graph-structured data. I will introduce what data-efficient graph learning is and my contributions to different research problems under its umbrella, including graph few-shot learning, graph weakly-supervised learning, and graph self-supervised learning. Based on my recent work, I will elucidate how to push forward the performance boundary of graph learning models especially graph neural networks with low-cost human supervision signals.
 
Speaker: Jiawei Zhou, TTIC
 
Title: Learning Graph Structures and Dynamics on Networking Data
Abstract: Modern communication systems rely on complex and dynamic computer networks, where effective management and security are paramount to societal function. These networks are structured as graphs and exhibit rapid traffic changes, presenting unique challenges for machine learning applications on both data formulation and model development. This talk explores early attempts to automate the learning of network structures and dynamics using deep neural networks, focusing on critical areas such as network security and attack detection. We will discuss the utilization of real networking data despite the lack of high-quality annotations to support the learning process, as well as modeling design choices with different focuses. These learning methodologies include the integration of graph neural networks to detect anomalous structures within networks, or the adaptation of successful NLP techniques, specifically Transformer-based self-supervised learning, to capture dynamic traffic features. Our results demonstrate the potential of deep learning in this domain but also highlight significant obstacles in data collection, learning efficiency, and evaluation.
 
Speaker: Federico Bugni, Northwestern University
 
Title: Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance
Abstract: Randomized controlled trials (RCTs) frequently utilize covariate-adaptive randomization (CAR) (e.g., stratified block randomization) and commonly suffer from imperfect compliance. This paper studies the identification and inference for the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in such RCTs with a binary treatment. We first develop characterizations of the identified sets for both estimands. Since data are generally not i.i.d. under CAR, these characterizations do not follow from existing results. We then provide consistent estimators of the identified sets and asymptotically valid confidence intervals for the parameters. Our asymptotic analysis leads to concrete practical recommendations regarding how to estimate the treatment assignment probabilities that enter in estimated bounds. In the case of the ATE, using sample analog assignment frequencies is more efficient than using the true assignment probabilities. On the contrary, using the true assignment probabilities is preferable for the ATT.
 
Speaker: Michelle Li, Harvard
 
Title: Contextual Learning on Graphs for Precision Medicine
Abstract: Precision medicine requires reasoning over interconnected data across multiple modalities to tailor medical decisions based on the context of individual patients. Graphs are universal descriptors for systems of interacting elements, and deep learning on biomedical graphs has facilitated advancements in medicine, including accelerated disease gene prioritization and drug target identification. However, existing graph-based models are context-free: unable to adjust their outputs based on the contexts in which they operate. We introduce two fundamental contextual learning algorithms, SHEPHERD and PINNACLE, to tackle medical questions for which patient and cell type contexts, respectively, are important. SHEPHERD addresses the challenge of low sample sizes among rare diseases by infusing patient data with external biomedical knowledge. It considers individual patients as unique subgraphs in a rare disease knowledge graph to learn patient-specific contexts derived from relationships such as genotype-phenotype and disease-gene associations, phenotype ontology, and genetic pathways. SHEPHERD’s contextualized patient representations are optimized for multi-faceted rare disease diagnosis: performing causal gene discovery, retrieving “patients-like-me” with the same causal gene or disease, and providing interpretable characterizations of novel disease presentations. PINNACLE leverages cell-type-specific gene expression as well as cellular and tissue organization to resolve the role of a protein depending on the cell type context. It generates unique protein representations for every cell type context using cell-type-specific protein interaction networks constructed from single-cell transcriptomic atlases, and enforces the global organization of these representations with a metagraph of cell type communication and tissue hierarchy.
PINNACLE’s context-aware protein representations enable the analysis of drug effects across cell type contexts and the prediction of therapeutic targets in a cell-type-specific manner. Overall, we demonstrate the potential of contextualized models to empower precision medicine, from rare disease diagnosis to drug discovery.
 
Speaker: Murat Kocaoglu, Purdue
 
Title: Causal Machine Learning: Fundamentals and Applications
Abstract: Causal knowledge is central to solving complex decision-making problems in many fields, from engineering and medicine to cyber-physical systems. Causal inference has also recently been identified as a key capability for remedying some of the issues modern machine learning systems suffer from, from explainability and fairness to generalization. In this talk, we first provide a short introduction to probabilistic causal inference. Next, we discuss how deep neural networks can be used to obtain a representation of the causal system and help solve complex, high-dimensional causal inference problems with deep generative models. We will also discuss some machine learning applications of the proposed algorithms.
 
Speaker: Arvind Ramanathan, Argonne National Laboratory 
 
Title: Learning Useful Graph Representations for Drug Discovery
Abstract: We discuss the use of graph neural networks (GNNs) in the context of drug discovery workflows. We will share some vignettes of how GNNs can be leveraged to represent large molecular libraries using molecular building blocks (i.e., fragments, scaffold, linkers/decorations), and how GNNs can be used to navigate latent representations of molecular hypergraphs leveraging transformer networks to operate on molecular building blocks to generate new molecules. We will demonstrate that GNNs possess some unique representational advantages for molecular building blocks (compared to other techniques), while allowing intuitive discovery of novel molecules that can result in binding to and inhibiting SARS-CoV-2 viral protein targets. Further, we show that GNNs can be used to accelerate virtual screening protocols by at least an order of magnitude while spanning much larger chemical spaces than currently possible. We also discuss how incorporating human feedback within GNNs can potentially result in novel molecules with desirable functional properties in the context of drug discovery.
Collaboration with: Rick Stevens, Anima Anandkumar, Austin Clyde, Ryien Hosseini, Ashka Shah, Filipo Simini.
 
Speaker: Lu Cheng, UIC
 
Title: Conformal Methods for Reliable and Fair Machine Learning
Abstract: Machine learning has made remarkable strides over the past decade. As its applications become more widespread in real-world scenarios, we face the crucial challenge of addressing the biases and opacity of these models to make them fairer and more reliable. In this talk, I will discuss two recent works aimed at developing effective machine learning systems using conformal prediction. This framework is model-agnostic and independent of distribution, providing a solid base for uncertainty estimation. We explore both theoretical and practical aspects to leverage conformal prediction, focusing on two key areas: (1) coverage-based fairness, which guarantees consistent treatment and equivalent coverage across different groups; and (2) graph neural networks (GNNs) tailored for conformalized link prediction, which offer reliable coverage with the benefit of compact prediction intervals.
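As background for the conformal prediction framework the abstract builds on, here is a minimal split-conformal sketch on toy data; the data, the trivial zero predictor, and the function name are illustrative assumptions, not the methods presented in the talk.

```python
import numpy as np

# Illustrative split conformal prediction: calibrate absolute residuals on a
# held-out set, then form an interval with a finite-sample coverage guarantee.

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Return a (1 - alpha) prediction interval around y_pred."""
    n = len(cal_residuals)
    # Conformal quantile with the finite-sample (n + 1) correction.
    q = np.quantile(cal_residuals, np.ceil((n + 1) * (1 - alpha)) / n)
    return y_pred - q, y_pred + q

rng = np.random.default_rng(0)
y_cal = rng.normal(size=100)          # toy calibration labels
preds = np.zeros(100)                 # a trivial "model" that predicts 0
residuals = np.abs(y_cal - preds)
lo, hi = conformal_interval(residuals, 0.0, alpha=0.1)
# [lo, hi] covers a fresh label from the same distribution with
# probability at least 90%, regardless of the model or distribution.
```

The model-agnostic, distribution-free nature of this guarantee is what makes conformal methods attractive for the fairness and link-prediction settings described above.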

Parking visual for UIC:

 
