Black-Box Interpretability of Large Language Models: A Model-Agnostic Framework
Room 236
Presenter: Brennen Yu
Modality: Traditional Talk
Abstract
Large language models (LLMs) have achieved remarkable performance across diverse tasks, yet their opacity presents significant challenges for deployment in high-stakes domains such as medicine and law, where explainability is essential. Traditional interpretability methods that examine model internals, including attention mechanisms and gradient analyses, are unavailable for closed APIs and often fail to capture the complex, emergent behaviors characteristic of large-scale models. There is a clear need for black-box interpretability methods that can accurately predict when and why an LLM will exhibit specific behaviors, while remaining flexible across use cases and levels of user expertise. Although much work has been done to develop novel black-box interpretability methods, the literature is scattered, and the strengths and weaknesses of the latest methods have not been systematically compared. We conduct a systematic review of black-box interpretability for LLMs and benchmark black-box interpretability methods across different models and model sizes.
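To make the setting concrete, the sketch below illustrates the general idea of black-box (input/output-only) probing: the model is treated as a text-in/text-out function and its behavior is characterized by how outputs shift under controlled input perturbations. This is an illustrative assumption, not a method from the talk; `query_model`, `output_similarity`, and `perturbation_probe` are hypothetical names, and the stand-in model would be replaced by calls to the closed API under study.

```python
# Minimal sketch of a black-box behavioral probe, assuming only text-in/text-out
# access to the model (no weights, gradients, or attention). All names here are
# hypothetical illustrations, not part of the presented work.

from difflib import SequenceMatcher


def query_model(prompt: str) -> str:
    # Hypothetical stand-in; replace with a call to the closed LLM API under study.
    return "yes" if "not" not in prompt else "no"


def output_similarity(a: str, b: str) -> float:
    # Simple string-level agreement score between two completions.
    return SequenceMatcher(None, a, b).ratio()


def perturbation_probe(prompt: str, perturbations: list[str]) -> list[tuple[str, float]]:
    """Score how much each perturbed prompt shifts the model's output,
    using input/output access only."""
    baseline = query_model(prompt)
    return [(p, output_similarity(baseline, query_model(p))) for p in perturbations]


if __name__ == "__main__":
    base = "Is the treatment described in the note appropriate?"
    variants = [
        "Is the treatment described in the note appropriate for an elderly patient?",
        "Is the treatment described in the note not appropriate?",
    ]
    for variant, score in perturbation_probe(base, variants):
        print(f"{score:.2f}  {variant}")
```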