Logistics
Date: Friday, September 27, 2024
Location: Northwestern University, Mudd Library, 3rd floor (Room 3514), 2233 Tech Dr, Evanston, IL 60208.
Parking: Attendees driving to the workshop can park in the North Campus parking garage, 2311 N Campus Dr #2300, Evanston, IL 60208. Campus map: https://maps.northwestern.edu/
Parking passes for free parking in the designated NU parking garage will be provided at the workshop. Please remember to ask for a pass before leaving the workshop.
Registration: https://forms.gle/VJTootmmG5bUHxxw6
Zoom Link: Join Here
YouTube: https://youtu.be/pJATGq51-sQ
Description:
The aim of this workshop is to explore theoretical foundations of optimally combining human and statistical judgments. Complementarity, the superior performance of a human paired with a statistical model relative to either alone, is a goal when deploying predictive models to support decision-making in high-stakes domains like medicine or criminal justice. However, considerable empirical evidence suggests that complementarity is difficult to design for and achieve in practice, even when experts are assumed to have access to information that a model may not. This workshop considers how to rigorously define, design for, and evaluate human-AI complementarity.
Schedule:
8:30-9:00: Breakfast
9:00-9:05: Opening Remarks
9:05-9:45: Hussein Mozannar (Microsoft Research): Who Should Predict? Conditional Delegation in Human-AI Teams
9:45-9:50: Hussein Mozannar Q&A
9:50-10:30: Jann Spiess (Stanford University): Algorithmic Assistance with Recommendation-Dependent Preferences
10:30-10:35: Jann Spiess Q&A
10:35-11:05: Coffee Break
11:05-11:45: Ming Yin (Purdue University): Modeling Interaction Dynamics to Promote Human-AI Complementarity in Decision Making
11:45-11:50: Ming Yin Q&A
11:50-12:30: Keyon Vafa (Harvard University): Do Large Language Models Perform the Way People Expect?
12:30-12:35: Keyon Vafa Q&A
12:35-1:30: Lunch
2:00-4:00: Student meetings with speakers
Organizers:
- Jessica Hullman (Northwestern University)
- Jason Hartline (Northwestern University)
Abstracts:
Speaker: Hussein Mozannar (Microsoft Research)
Title: Who Should Predict? Conditional Delegation in Human-AI Teams
Abstract:
AI systems are augmenting humans’ capabilities in settings such as healthcare and programming, leading to the formation of human-AI teams. A fundamental aspect of an effective human-AI team is the ability to delegate tasks strategically. Delegation enables each member of the team to focus on the parts of the task they excel at, and thus enables complementarity. In this talk, we will first discuss how to design AI models that can delegate tasks to humans. We will then study the mirror setting and try to understand how humans decide to delegate tasks to their AI counterparts, with the aim of helping humans make better delegation decisions. We will study both settings for tasks that require a single action (classification) and for tasks that require multiple actions (programming, web browsing).
Bio: Hussein Mozannar is a Senior Researcher at Microsoft Research AI Frontiers. He obtained his PhD from MIT in Social & Engineering Systems in 2024. His research focuses on augmenting humans with AI to help them complete tasks more efficiently. Specifically, he focuses on building AI models that complement human expertise and designing interaction schemes to facilitate human-AI interaction. Applications of his research include programming and healthcare.
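For intuition, here is a minimal sketch of conditional delegation in the single-action (classification) setting. It assumes we already have a per-instance model confidence and a per-instance estimate of human accuracy; both quantities and all numbers are illustrative, and the talk's methods learn the delegation rule jointly with the predictor rather than thresholding confidences as done here.

```python
import numpy as np

def delegate(model_conf, human_acc_est):
    """Route each instance to whichever predictor is expected to be
    more accurate: the model (its confidence) or the human (an
    estimated per-instance accuracy)."""
    # True -> the model predicts; False -> defer to the human.
    return model_conf >= human_acc_est

# Toy example: five instances (all numbers illustrative).
model_conf = np.array([0.95, 0.60, 0.80, 0.55, 0.99])
human_acc = np.array([0.90, 0.85, 0.70, 0.88, 0.75])

use_model = delegate(model_conf, human_acc)
print(use_model)  # [ True False  True False  True]

# Expected accuracy of the team vs. either agent alone.
team = np.where(use_model, model_conf, human_acc).mean()
print(team, model_conf.mean(), human_acc.mean())  # 0.894 > 0.778, 0.816
```

Routing each instance to whichever side is expected to be more accurate is what allows the team to outperform both the model alone and the human alone.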
Speaker: Jann Spiess (Stanford)
Title: Algorithmic Assistance with Recommendation-Dependent Preferences
Abstract:
When an algorithm provides risk assessments, we typically think of them as helpful inputs to human decisions, such as when risk scores are presented to judges or doctors. However, a decision-maker may not only react to the information provided by the algorithm. The decision-maker may also view the algorithmic recommendation as a default action, making it costly for them to deviate, such as when a judge is reluctant to overrule a high-risk assessment for a defendant or a doctor fears the consequences of deviating from recommended procedures. To address such unintended consequences of algorithmic assistance, we propose a principal-agent model of joint human-machine decision-making. Within this model, we consider the effect and design of algorithmic recommendations when they affect choices not just by shifting beliefs, but also by altering preferences. We motivate this assumption from institutional factors, such as a desire to avoid audits, as well as from well-established models in behavioral science that predict loss aversion relative to a reference point, which here is set by the algorithm. We show that recommendation-dependent preferences create inefficiencies where the decision-maker is overly responsive to the recommendation. As a potential remedy, we discuss algorithms that strategically withhold recommendations, and show how they can improve the quality of final decisions.
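As a stylized illustration of the effect described above (a sketch, not the paper's model), consider a binary decision in which the decision-maker pays a fixed cost for deviating from the algorithm's recommendation; the noise levels, the deviation cost, and the payoff form below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def decide(belief, rec, dev_cost):
    """Choose action 1 iff believed risk exceeds 0.5, except that
    deviating from the recommendation (if shown) costs dev_cost."""
    payoff = {}
    for a in (0, 1):
        accuracy = belief if a == 1 else 1 - belief
        penalty = dev_cost if (rec is not None and a != rec) else 0.0
        payoff[a] = accuracy - penalty
    return max(payoff, key=payoff.get)

# True risk p; the human sees a fairly precise private signal, while
# the algorithm recommends 1[estimate > 0.5] from a noisier estimate.
n, dev_cost = 10_000, 0.3
p = rng.uniform(0, 1, n)
belief = np.clip(p + rng.normal(0, 0.15, n), 0, 1)
rec = (np.clip(p + rng.normal(0, 0.25, n), 0, 1) > 0.5).astype(int)
truth = (p > 0.5).astype(int)

with_rec = np.array([decide(b, r, dev_cost) for b, r in zip(belief, rec)])
no_rec = np.array([decide(b, None, dev_cost) for b in belief])
print("accuracy with recommendation:", (with_rec == truth).mean())
print("accuracy with it withheld:   ", (no_rec == truth).mean())
```

Because the human's private signal is more informative than the algorithm's in this toy setup, the deviation cost drags decisions toward the weaker recommendation, and withholding the recommendation removes the distortion, mirroring the remedy discussed in the talk.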
Speaker: Keyon Vafa (Harvard)
Title: Do Large Language Models Perform the Way People Expect?
Abstract:
What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. In this talk, we’ll consider a setting where deployment decisions depend on people’s beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of how humans make generalizations about LLM capabilities and show that the human generalization function has predictable structure. We then evaluate LLM alignment with the human generalization function. Our results show that, especially for cases where the cost of mistakes is high, more capable models (e.g., GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.
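As a rough sketch of what measuring alignment with a human generalization function might look like, the snippet below fits a stand-in predictor of human beliefs and then checks how often the LLM actually fails on instances where people would expect it to succeed; the features, coefficients, and data are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))  # question features (hypothetical)

# Synthetic stand-ins for human beliefs about LLM success and for
# the LLM's actual outcomes; deliberately misaligned coefficients.
believed_ok = (X @ [1.0, 0.5, 0.0] + rng.normal(0, 1, n)) > 0
actually_ok = (X @ [1.0, -0.5, 0.3] + rng.normal(0, 1, n)) > 0

# Stand-in for the human generalization function.
gen_fn = LogisticRegression().fit(X, believed_ok)

# Misalignment: among questions people would deploy the model on,
# how often does it actually fail?
deploy = gen_fn.predict(X).astype(bool)
print("failure rate where deployed:", (~actually_ok[deploy]).mean())
```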
Speaker: Ming Yin (Purdue)
Title: Modeling Interaction Dynamics to Promote Human-AI Complementarity in Decision Making
Abstract:
Artificial intelligence (AI) technologies have been increasingly integrated into human workflows. For example, the use of AI-based decision aids in human decision-making processes has resulted in a new paradigm of AI-assisted decision making: the AI-based decision aid provides a recommendation, while the human decision maker makes the final decision. The increasing prevalence of human-AI collaborative decision making highlights the need to quantitatively model the interaction dynamics between humans and AI in these collaborative processes, which can inform better designs of AI-based decision aids to promote human-AI complementarity. In this talk, I’ll discuss a few examples illustrating how we build computational models of humans’ decision capabilities, their reactions to the AI’s assistive information, and their adoption of AI recommendations. We then leverage these models to adjust whether, when, and how to provide AI recommendations, as well as what recommendations to provide, ultimately leading to significant improvements in human-AI joint decision-making performance.
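As a minimal sketch of how such models can drive the decision of whether to show a recommendation at all: assume a (hypothetical) behavioral model in which the human adopts the AI's recommendation with probability proportional to the AI's confidence; the aid then shows a recommendation only when the modeled joint accuracy beats the human acting alone. The adoption model, the trust parameter, and all numbers are assumptions for illustration.

```python
def p_adopt(ai_conf, trust=0.8):
    """Assumed behavioral model: probability the human adopts the AI's
    recommendation, increasing in the AI's confidence."""
    return trust * ai_conf

def joint_accuracy(human_acc, ai_conf):
    """Expected accuracy when the recommendation is shown, treating
    the AI's confidence as its probability of being correct."""
    a = p_adopt(ai_conf)
    return a * ai_conf + (1 - a) * human_acc

# Show the recommendation only when it is expected to help.
for human_acc, ai_conf in [(0.9, 0.6), (0.7, 0.95), (0.8, 0.8)]:
    show = joint_accuracy(human_acc, ai_conf) > human_acc
    print(human_acc, ai_conf, "-> show recommendation:", show)
```

Richer versions of such models can inform not just whether to show a recommendation, but also when and how to present it.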