Logistics
Date: Friday, November 22, 2024
Location: Illinois Institute of Technology (IIT), MTCC Ballroom (McCormick Tribune Campus Center, 3201 S State St, Chicago, IL 60616)
Registration: https://forms.gle/VNHnw84aBnxaD3kH9
Description
To forge healthy and productive Human-AI ecosystems, researchers need to anticipate the nature of this interaction at every stage to stave off concerns of societal disruption and to usher in a harmonious future. A primary way in which AI is anticipated to become part of human life is through augmenting human capabilities instead of replacing them. What are the greatest potentials for this augmentation in various fields, and what ought to be its limits? In the short term, AI is expected to continue to rely on the vast recorded and demonstrated knowledge and experience of people. How can the contributors of this knowledge feel adequately protected in their rights and compensated for their role in ushering in AI? As these intelligent systems are woven into the lives and livelihoods of people, insight into how they operate and what they know becomes crucial to establish trust and regulate them. How can human privacy be maintained in such pervasive ecosystems, and is it possible to interpret the operations, thoughts, and actions of AI? IDEAL will address these critical questions in a 3-part workshop as part of its Fall 2024 Special Program on Interpretability, Privacy, and Fairness, which will span 3 days across 3 IDEAL campuses.
Friday, November 22, 2024: Privacy and Interpretability in Generative AI: Peering into the Black Box
The rapid advancement of Generative AI and large language models (LLMs), such as GPT-4, has raised critical concerns about privacy and interpretability. These models are trained on vast datasets, which may inadvertently include sensitive or personal information, creating the risk of unintentionally disclosing private data through their outputs. Consequently, privacy-preserving mechanisms have become essential to mitigate these risks. At the same time, the inherent complexity and opacity of LLMs make it difficult to understand their decision-making processes, undermining trust and accountability. Enhancing interpretability is key to ensuring that users and developers can comprehend how these models produce specific outputs, thereby improving transparency and fostering trust. Addressing these challenges is essential for building AI systems that are not only secure but also ethical and comprehensible.
Chirag Agarwal (UVA)
Sam Buchanan (TTIC)
Gregoire Fournier (University of Illinois at Chicago) & Daniel Linna (NW)
Jinyuan Jia (Pennsylvania State University)
Arvind Ramanathan (ANL)
Zhimei Ren (UPenn)
Filippo Simini (ANL)
Schedule:
8:00 – 8:55 Breakfast
8:55 – 9:00 Opening remarks
9:00 – 9:50 Chirag Agarwal (UVA)
9:50 – 10:40 Arvind Ramanathan (ANL)
10:40 – 11:00 Coffee break
11:00 – 11:50 Zhimei Ren (UPenn)
11:50 – 1:20 Lunch
1:20 – 2:10 Sam Buchanan (TTIC)
2:10 – 3:00 Jinyuan Jia (PSU)
3:00 – 3:20 Coffee break
3:20 – 4:10 Filippo Simini (ANL)
4:10 – 5:00 Daniel Linna (NW) & Gregoire Fournier (UIC)
5:00 – Discussion
Abstracts:
Speaker: Chirag Agarwal (UVA)
Title: The (Un)Reliability of Self-Explanation in Large Language Models
Abstract: Large Language Models (LLMs) have emerged as powerful tools that are effective at various natural language tasks. Recent works show that modern LLMs can generate self-explanations (SEs), which elicit their intermediate reasoning steps to explain their behavior. Self-explanations have seen widespread adoption owing to their conversational and plausible nature. However, there is little to no understanding of how i) faithful, ii) uncertain, or iii) hallucinated these explanations are. In this talk, we will first analyze the effectiveness of LLMs in explaining other complex predictive models and generating post hoc explanations. Next, we will discuss how, while LLMs are adept at generating plausible explanations (seemingly logical and coherent to human users), these explanations do not necessarily align with the LLMs' reasoning processes and are often unfaithful or hallucinated, raising concerns about their reliability. Finally, we highlight that the current trend toward increasing the plausibility of SEs, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness. We call upon the community to develop novel methods to enhance the trustworthiness of SEs and to better aid LLMs in reasoning, thereby enabling transparent deployment of LLMs in diverse high-stakes settings.
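To make the notion of (un)faithfulness concrete, here is a minimal sketch of one kind of probe: checking whether an answer actually depends on the evidence its self-explanation cites. The helper `query_llm` is a hypothetical placeholder, and this is an illustration of the general idea rather than the speaker's evaluation method.

```python
# Illustrative faithfulness probe for LLM self-explanations.
# `query_llm` is a hypothetical placeholder, not the speaker's tooling.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any LLM; returns the model's text output."""
    raise NotImplementedError

def explanation_depends_on_evidence(question: str, cited_evidence: str) -> bool:
    """Rough faithfulness proxy: does the answer change when the evidence
    the self-explanation cites is masked out of the question?"""
    original = query_llm(f"{question}\nAnswer briefly, then explain your reasoning.")
    masked_question = question.replace(cited_evidence, "[REDACTED]")
    counterfactual = query_llm(f"{masked_question}\nAnswer briefly.")
    # If masking the cited evidence leaves the answer unchanged, the
    # self-explanation may be plausible but unfaithful.
    return original.splitlines()[0] != counterfactual.splitlines()[0]
```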
Speaker: Sam Buchanan (TTIC)
Title: White-Box Transformers via Sparse Rate Reduction
Abstract: In this talk, we contend that a natural objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a low-dimensional Gaussian mixture supported on incoherent subspaces. The goodness of such a representation can be evaluated by a principled measure, called sparse rate reduction, that simultaneously maximizes the intrinsic information gain and extrinsic sparsity of the learned representation. From this perspective, popular deep network architectures, including transformers, can be viewed as realizing iterative schemes to optimize this measure. In particular, we derive a transformer block from alternating optimization on parts of this objective: the multi-head self-attention operator compresses the representation by implementing an approximate gradient descent step on the coding rate of the features, and the subsequent multi-layer perceptron sparsifies the features. This leads to a family of white-box, transformer-like deep network architectures, called CRATE, that are mathematically fully interpretable. Experiments show that these networks, despite their simplicity, indeed learn to compress and sparsify representations of large-scale real-world image and text datasets, and achieve performance close to highly engineered transformer-based models, including ViT and GPT-2.
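For readers unfamiliar with the objective, the following sketch spells out the quantities it combines. The notation follows the rate-reduction literature and may differ in detail from the speaker's formulation; epsilon is a coding precision parameter and lambda trades off sparsity.

```latex
% Sketch of the sparse rate reduction objective for token representations
% Z = [z_1, ..., z_N] in R^{d x N} with subspace bases U_1, ..., U_K.
% Notation may differ from the speaker's paper; illustration only.
\[
  R(Z) = \tfrac{1}{2}\log\det\!\Big(I + \tfrac{d}{N\varepsilon^{2}}\, Z Z^{\top}\Big),
  \qquad
  R^{c}(Z; U_{[K]}) = \sum_{k=1}^{K} \tfrac{1}{2}\log\det\!\Big(I + \tfrac{d}{N\varepsilon^{2}}\,(U_{k}^{\top} Z)(U_{k}^{\top} Z)^{\top}\Big)
\]
\[
  \max_{Z}\;\Big[\underbrace{R(Z) - R^{c}(Z; U_{[K]})}_{\text{information gain}}
  \;-\; \lambda\,\underbrace{\|Z\|_{0}}_{\text{sparsity}}\Big]
\]
```

The attention step described above corresponds to descending the compression term R^c, while the perceptron step promotes the sparsity term.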
Speaker: Gregoire Fournier (University of Illinois at Chicago) & Daniel Linna (NW)
Title: On Legal Applications of LLMs: The Example of Structured Legal Argumentation in Landlord-Tenant Law
Abstract: LLMs have been explored for their capabilities in law, such as extracting structured representations from legal texts, predicting rhetorical roles in legal cases, and assisting in thematic legal analysis. Methods of prompt engineering and contextual provision for the practical application of LLMs have been explored in the context of insolvency law, tax law, analyzing court opinions for the interpretation of legal concepts, and providing legal information.
We focus on the particular application of generating structured legal arguments with LLMs.
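As one illustration of what "structured" can mean here, a hypothetical prompt template along the following lines could elicit an argument with fixed fields from an LLM. This is a sketch only, not the speakers' method; the template and field names are assumptions.

```python
# Hypothetical prompt template for eliciting a structured legal argument
# in a landlord-tenant dispute; a sketch, not the speakers' method.
ARGUMENT_TEMPLATE = """You are assisting with landlord-tenant law.
Facts: {facts}
Relevant statute: {statute}

Produce a structured argument with exactly these numbered fields:
1. Claim
2. Rule (quote the relevant statutory language)
3. Application of the rule to the facts
4. Strongest counterargument
5. Conclusion
"""

def build_argument_prompt(facts: str, statute: str) -> str:
    """Fill the template with case-specific context before sending it to an LLM."""
    return ARGUMENT_TEMPLATE.format(facts=facts, statute=statute)
```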
Speaker: Jinyuan Jia (Pennsylvania State University)
Title: On the Security Risks of LLM Systems
Abstract: Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also face security and safety challenges when deployed in many real-world applications. In this talk, I will discuss potential attacks on LLM systems, such as prompt injection attacks and knowledge corruption attacks, where an attacker can inject malicious instructions/texts to induce an LLM to generate attacker-desired outputs. For instance, in retrieval-augmented generation (RAG) systems, knowledge databases can provide up-to-date and domain-specific knowledge that enhances the output of an LLM. However, knowledge databases also introduce a new attack surface. In particular, we show that an attacker can inject malicious texts into the knowledge database of a RAG system to make an LLM generate attacker-chosen answers for attacker-chosen questions. Our results show that a few injected texts can make the attack successful even against a knowledge database with millions of texts. Finally, I will discuss potential strategies and challenges in defending against these attacks.
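The retrieval step an attacker targets can be illustrated with a toy example: a passage crafted to resemble the target question wins retrieval and is handed to the LLM. Bag-of-words cosine similarity stands in for a real embedding model here; this is a sketch of the general attack surface, not the speaker's attack code.

```python
# Toy illustration of the retrieval step targeted by knowledge-corruption
# attacks on RAG (a sketch, not the speaker's code).
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity as a stand-in for an embedding model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    return dot / ((sqrt(sum(v * v for v in va.values())) *
                   sqrt(sum(v * v for v in vb.values()))) or 1.0)

knowledge_db = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
]
# Attacker-injected text: engineered to be retrieved for the target question
# and to steer the LLM toward an attacker-chosen answer.
knowledge_db.append(
    "Who designed the Eiffel Tower? The Eiffel Tower was designed by [attacker-chosen answer]."
)

question = "Who designed the Eiffel Tower?"
top = max(knowledge_db, key=lambda passage: cosine(question, passage))
print(top)  # the injected passage wins retrieval and is fed to the LLM
```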
Speaker: Arvind Ramanathan (ANL)
Title: Building interpretable AI systems for understanding complex biology
Abstract: We will describe some of our ongoing research on building artificial intelligence (AI) systems that can be used to probe complex biological phenomena. Our recent work has focused on building frontier AI models for genomic data to understand and model viral evolution. However, interpreting the outputs of such models in the context of mapping evolutionary trajectories can be particularly hard. We discuss how augmenting such models with basic knowledge about phylogenetic analyses enables building interpretable models that can not only reason about viral evolution but also provide insights into which variants of viruses one may need to track (as part of forecasting). We will illustrate this in the context of studying the evolution of the SARS-CoV-2 virus, and will also showcase its generalizability to protein design applications.
Speaker: Zhimei Ren (UPenn)
Title: Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees
Abstract: Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. Through applications to question answering and radiology report generation, we demonstrate that our method is able to accurately identify units with trustworthy outputs via lightweight training over a moderate amount of reference data. En route, we investigate the informativeness of various features in alignment prediction and combine them with standard models to construct the alignment predictor.
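A simplified sketch of the selection step is given below, in the spirit of conformal selection with a Benjamini-Hochberg-style threshold. Details such as the exact p-value construction and tie handling follow the paper and may differ; this is an illustration of how a data-dependent threshold can certify a prescribed fraction of selected units.

```python
# Simplified sketch of a conformal selection step in the spirit of
# Conformal Alignment (illustration only; details differ from the paper).
import numpy as np

def conformal_select(calib_scores, calib_aligned, test_scores, alpha=0.1):
    """Select test units whose outputs can be certified as aligned.

    calib_scores : predicted alignment scores on reference units
    calib_aligned: boolean array, ground-truth alignment of reference units
    test_scores  : predicted alignment scores on new units
    alpha        : target false discovery rate among selected units
    """
    calib_scores = np.asarray(calib_scores, dtype=float)
    calib_aligned = np.asarray(calib_aligned, dtype=bool)
    bad = calib_scores[~calib_aligned]  # misaligned reference units
    n0 = len(bad)
    # Conformal p-value: how plausible is it that this unit scores like a
    # misaligned reference unit?
    pvals = np.array([(1 + np.sum(bad >= s)) / (n0 + 1) for s in test_scores])
    # Benjamini-Hochberg step-up to control the FDR at level alpha.
    order = np.argsort(pvals)
    m = len(pvals)
    passed = pvals[order] <= alpha * (np.arange(1, m + 1) / m)
    k = int(np.max(np.where(passed)[0])) + 1 if passed.any() else 0
    selected = np.zeros(m, dtype=bool)
    selected[order[:k]] = True
    return selected
```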
Speaker: Filippo Simini (ANL)
Title: Privacy-preserving generative AI for dynamic graphs
Abstract: Dynamic graphs, also known as temporal graphs, are graph structures where nodes, edges, and their associated attributes can change over time. They are particularly valuable for modeling dynamic, non-structured data, which is prevalent in fields such as chemistry, biology, social networks, financial systems, and transportation networks. These graphs can capture the evolving relationships, interactions, and patterns within complex systems. We present a modeling framework for generating synthetic dynamic graphs that accurately reproduce the essential statistical properties of real-world graphs while protecting sensitive information, such that no synthetic entity (node or edge) can be uniquely identified with a real entity. We specifically focus on developing a comprehensive set of evaluation metrics to assess the quality of the synthetic graphs and to verify that identifiable and proprietary information in the real data is protected. We conclude by discussing challenges and potential avenues for future research in this field.
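As one example of the kind of evaluation metric referred to above, the sketch below compares per-snapshot degree distributions of a real and a synthetic temporal graph. The edge-list representation and the total-variation comparison are assumptions for illustration, not the speaker's evaluation suite.

```python
# Toy metric for synthetic dynamic graphs: compare per-snapshot degree
# distributions of real vs. synthetic data (illustrative sketch only).
from collections import Counter, defaultdict

def degree_histograms(temporal_edges):
    """temporal_edges: iterable of (t, u, v) tuples -> {t: Counter(degree -> count)}."""
    degrees = defaultdict(Counter)
    for t, u, v in temporal_edges:
        degrees[t][u] += 1
        degrees[t][v] += 1
    return {t: Counter(d.values()) for t, d in degrees.items()}

def snapshot_tv_distance(real_edges, synth_edges):
    """Mean total-variation distance between degree distributions per snapshot."""
    real, synth = degree_histograms(real_edges), degree_histograms(synth_edges)
    dists = []
    for t in real.keys() | synth.keys():
        r, s = real.get(t, Counter()), synth.get(t, Counter())
        nr, ns = sum(r.values()) or 1, sum(s.values()) or 1
        support = r.keys() | s.keys()
        dists.append(0.5 * sum(abs(r[k] / nr - s[k] / ns) for k in support))
    return sum(dists) / max(len(dists), 1)
```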
Organizers:
- Bingui Wang (IIT), Ren Wang (IIT), and Gyorgy Turan (UIC)