Logistics
Date: Friday, September 27, 2024
Location: Northwestern University, Mudd Library, 3rd floor (Room 3514), 2233 Tech Dr, Evanston, IL 60208.
Parking: Attendees driving to the workshop can park in the North Campus parking garage, 2311 N Campus Dr #2300, Evanston, IL 60208. Campus map: https://maps.northwestern.edu/
Parking passes for free parking in the designated NU parking garage will be provided at the workshop. Please remember to ask for a pass before leaving the workshop.
Registration: https://forms.gle/VJTootmmG5bUHxxw6
Zoom Link: Join Here
YouTube: https://youtu.be/pJATGq51-sQ
Description:
The aim of this workshop is to explore theoretical foundations of optimally combining human and statistical judgments. Complementarity, the superior performance of a human paired with a statistical model relative to either alone, is a goal when deploying predictive models to support decision-making in high-stakes domains like medicine or criminal justice. However, considerable empirical evidence suggests that complementarity is difficult to design for and achieve in practice, even when experts are assumed to have access to information that a model may not. This workshop considers how to rigorously define, design for, and evaluate human-AI complementarity.
Schedule:
8:30-9:00: Breakfast
9:00-9:05: Opening Remarks
9:05-9:45: Hussein Mozannar (Microsoft Research): Who Should Predict? Conditional Delegation in Human-AI Teams
9:45-9:50: Hussein Mozannar Q&A
9:50-10:30: Jann Spiess (Stanford University): Algorithmic Assistance with Recommendation-Dependent Preferences
10:30-10:35: Jann Spiess Q&A
10:35-11:05: Coffee Break
11:05-11:45: Ming Yin (Purdue University): Modeling Interaction Dynamics to Promote Human-AI Complementarity in Decision Making
11:45-11:50: Ming Yin Q&A
11:50-12:30: Keyon Vafa (Harvard University): Do Large Language Models Perform the Way People Expect?
12:30-12:35: Keyon Vafa Q&A
12:35-1:30: Lunch
2:00-4:00: Student meetings with speakers
Organizers:
- Jessica Hullman (Northwestern University)
- Jason Hartline (Northwestern University)
Abstracts:
Speaker: Hussein Mozannar (Microsoft Research)
Title: Who Should Predict? Conditional Delegation in Human-AI Teams
Abstract:
AI systems are augmenting humans’ capabilities in settings such as healthcare and programming, leading to the formation of human-AI teams. A fundamental aspect of an effective human-AI team is the ability to delegate tasks strategically. Delegation enables each member of the team to focus on the parts of the task they excel at, and thus enables complementarity. In this talk, we will first discuss how to design AI models that can delegate tasks to humans. We will then study the mirror setting and try to understand how humans decide to delegate tasks to their AI counterparts, with the aim of helping humans make better delegation decisions. We will study both settings for tasks that require a single action (classification) and for tasks that require multiple actions (programming, web browsing).
Bio: Hussein Mozannar is a Senior Researcher at Microsoft Research AI Frontiers. He obtained his PhD from MIT in Social & Engineering Systems in 2024. His research focuses on augmenting humans with AI to help them complete tasks more efficiently. Specifically, he focuses on building AI models that complement human expertise and designing interaction schemes to facilitate human-AI interaction. Applications of his research include programming and healthcare.
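For intuition, here is a minimal sketch of conditional delegation in the single-action (classification) setting. It assumes we already have a per-instance model confidence and a per-instance estimate of human accuracy; both quantities and all numbers are illustrative, and the talk's methods learn the delegation rule jointly with the predictor rather than thresholding confidences as done here.

```python
import numpy as np

def delegate(model_conf, human_acc_est):
    """Route each instance to whichever predictor is expected to be
    more accurate: the model (its confidence) or the human (an
    estimated per-instance accuracy)."""
    # True -> the model predicts; False -> defer to the human.
    return model_conf >= human_acc_est

# Toy example: five instances (all numbers illustrative).
model_conf = np.array([0.95, 0.60, 0.80, 0.55, 0.99])
human_acc = np.array([0.90, 0.85, 0.70, 0.88, 0.75])

use_model = delegate(model_conf, human_acc)
print(use_model)  # [ True False  True False  True]

# Expected accuracy of the team vs. either agent alone.
team = np.where(use_model, model_conf, human_acc).mean()
print(team, model_conf.mean(), human_acc.mean())  # 0.894 > 0.778, 0.816
```

Routing each instance to whichever side is expected to be more accurate is what allows the team to outperform both the model alone and the human alone.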
Speaker: Jann Spiess (Stanford)
Title: Algorithmic Assistance with Recommendation-Dependent Preferences
Abstract:
When an algorithm provides risk assessments, we typically think of them as helpful inputs to human decisions, such as when risk scores are presented to judges or doctors. However, a decision-maker may not only react to the information provided by the algorithm. The decision-maker may also view the algorithmic recommendation as a default action, making it costly for them to deviate, such as when a judge is reluctant to overrule a high-risk assessment for a defendant or a doctor fears the consequences of deviating from recommended procedures. To address such unintended consequences of algorithmic assistance, we propose a principal-agent model of joint human-machine decision-making. Within this model, we consider the effect and design of algorithmic recommendations when they affect choices not just by shifting beliefs, but also by altering preferences. We motivate this assumption from institutional factors, such as a desire to avoid audits, as well as from well-established models in behavioral science that predict loss aversion relative to a reference point, which here is set by the algorithm. We show that recommendation-dependent preferences create inefficiencies where the decision-maker is overly responsive to the recommendation. As a potential remedy, we discuss algorithms that strategically withhold recommendations, and show how they can improve the quality of final decisions.
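As a stylized illustration of the effect described above (a sketch, not the paper's model), consider a binary decision in which the decision-maker pays a fixed cost for deviating from the algorithm's recommendation; the noise levels, the deviation cost, and the payoff form below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def decide(belief, rec, dev_cost):
    """Choose action 1 iff believed risk exceeds 0.5, except that
    deviating from the recommendation (if shown) costs dev_cost."""
    payoff = {}
    for a in (0, 1):
        accuracy = belief if a == 1 else 1 - belief
        penalty = dev_cost if (rec is not None and a != rec) else 0.0
        payoff[a] = accuracy - penalty
    return max(payoff, key=payoff.get)

# True risk p; the human sees a fairly precise private signal, while
# the algorithm recommends 1[estimate > 0.5] from a noisier estimate.
n, dev_cost = 10_000, 0.3
p = rng.uniform(0, 1, n)
belief = np.clip(p + rng.normal(0, 0.15, n), 0, 1)
rec = (np.clip(p + rng.normal(0, 0.25, n), 0, 1) > 0.5).astype(int)
truth = (p > 0.5).astype(int)

with_rec = np.array([decide(b, r, dev_cost) for b, r in zip(belief, rec)])
no_rec = np.array([decide(b, None, dev_cost) for b in belief])
print("accuracy with recommendation:", (with_rec == truth).mean())
print("accuracy with it withheld:   ", (no_rec == truth).mean())
```

Because the human's private signal is more informative than the algorithm's in this toy setup, the deviation cost drags decisions toward the weaker recommendation, and withholding the recommendation removes the distortion, mirroring the remedy discussed in the talk.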
Speaker: Keyon Vafa (Harvard)
Title: Do Large Language Models Perform the Way People Expect?
Abstract:
What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. In this talk, we’ll consider a setting where deployment decisions depend on people’s beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of how humans make generalizations about LLM capabilities and show that the human generalization function has predictable structure. We then evaluate LLM alignment with the human generalization function. Our results show that, especially for cases where the cost of mistakes is high, more capable models (e.g., GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.
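As a rough sketch of what measuring alignment with a human generalization function might look like, the snippet below fits a stand-in predictor of human beliefs and then checks how often the LLM actually fails on instances where people would expect it to succeed; the features, coefficients, and data are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))  # question features (hypothetical)

# Synthetic stand-ins for human beliefs about LLM success and for
# the LLM's actual outcomes; deliberately misaligned coefficients.
believed_ok = (X @ [1.0, 0.5, 0.0] + rng.normal(0, 1, n)) > 0
actually_ok = (X @ [1.0, -0.5, 0.3] + rng.normal(0, 1, n)) > 0

# Stand-in for the human generalization function.
gen_fn = LogisticRegression().fit(X, believed_ok)

# Misalignment: among questions people would deploy the model on,
# how often does it actually fail?
deploy = gen_fn.predict(X).astype(bool)
print("failure rate where deployed:", (~actually_ok[deploy]).mean())
```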
Speaker: Ming Yin (Purdue)
Title: Modeling Interaction Dynamics to Promote Human-AI Complementarity in Decision Making
Abstract:
Artificial intelligence (AI) technologies have been increasingly integrated into human workflows. For example, the use of AI-based decision aids in human decision-making processes has resulted in a new paradigm of AI-assisted decision making: the AI-based decision aid provides a recommendation, while the human decision maker makes the final decision. The increasing prevalence of human-AI collaborative decision making highlights the need to quantitatively model the interaction dynamics between humans and AI in these collaborative processes, which can inform better designs of AI-based decision aids to promote human-AI complementarity. In this talk, I’ll discuss a few examples illustrating how we build computational models of humans’ decision capabilities, their reactions to the AI’s assistive information, and their adoption of AI recommendations. We then leverage these models to adjust whether, when, and how to provide AI recommendations, as well as what recommendations to provide, ultimately leading to significant improvements in human-AI joint decision-making performance.
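As a minimal sketch of how such models can drive the decision of whether to show a recommendation at all: assume a (hypothetical) behavioral model in which the human adopts the AI's recommendation with probability proportional to the AI's confidence; the aid then shows a recommendation only when the modeled joint accuracy beats the human acting alone. The adoption model, the trust parameter, and all numbers are assumptions for illustration.

```python
def p_adopt(ai_conf, trust=0.8):
    """Assumed behavioral model: probability the human adopts the AI's
    recommendation, increasing in the AI's confidence."""
    return trust * ai_conf

def joint_accuracy(human_acc, ai_conf):
    """Expected accuracy when the recommendation is shown, treating
    the AI's confidence as its probability of being correct."""
    a = p_adopt(ai_conf)
    return a * ai_conf + (1 - a) * human_acc

# Show the recommendation only when it is expected to help.
for human_acc, ai_conf in [(0.9, 0.6), (0.7, 0.95), (0.8, 0.8)]:
    show = joint_accuracy(human_acc, ai_conf) > human_acc
    print(human_acc, ai_conf, "-> show recommendation:", show)
```

Richer versions of such models can inform not just whether to show a recommendation, but also when and how to present it.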