
URAP

Project Descriptions
Spring 2025


Interpretability in AI Systems: Developing Transparent and Explainable AI Models

Dawn Song, Professor
Electrical Engineering and Computer Sciences

Applications for Spring 2025 are closed for this project.

As AI models become more complex, understanding their internal decision-making processes becomes increasingly challenging. This project aims to advance the interpretability of AI systems, making their operations transparent and their decisions explainable to users and developers alike.

Areas of focus include:
- Explainable AI Techniques: Developing methods such as attention mechanisms, representation engineering, and surrogate models to elucidate how AI models make decisions.
- User-Centric Interpretability: Creating tools that present model explanations in an accessible manner for non-technical stakeholders.
- Benchmarking Interpretability: Establishing metrics and benchmarks to evaluate the effectiveness of different interpretability approaches.
- Interpretable Model Architectures: Designing AI architectures that are inherently more interpretable without significantly compromising performance.
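
One of the techniques named above, surrogate modeling, can be illustrated with a minimal sketch: a small, interpretable model is trained to mimic a complex "black-box" model's predictions, so its simple structure approximates an explanation of the black box's behavior. This assumes scikit-learn is available; the dataset and model choices here are purely illustrative, not part of the project.

```python
# Global surrogate model sketch: fit an interpretable decision tree to a
# black-box model's predictions (not the true labels), then measure how
# faithfully the tree reproduces those predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data standing in for a real task.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# "Black-box" model whose decisions we want to explain.
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Surrogate: a shallow tree trained on the black box's predictions.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: fraction of inputs where the surrogate agrees with the black box.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"surrogate fidelity: {fidelity:.2f}")
```

A high fidelity score means the shallow tree's human-readable decision rules are a reasonable proxy for the complex model's behavior; benchmarking such fidelity metrics is itself one of the focus areas listed above.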

Role: Students will contribute both to creating novel interpretability methods and to evaluating existing techniques across a range of AI applications.

Qualifications:
- Solid understanding of machine learning and deep learning.
- Experience with model evaluation and validation.
- Proficiency in Python and machine learning libraries such as PyTorch.
- Strong analytical and problem-solving skills.

Day-to-day supervisor for this project: Zhun Wang

Hours: 12 or more

Engineering, Design & Technologies


Office of Undergraduate Interdisciplinary Studies, Undergraduate Division
College of Letters & Science, University of California, Berkeley