Interpretability in AI Systems: Developing Transparent and Explainable AI Models
Dawn Song, Professor
Electrical Engineering and Computer Science
Applications for Spring 2025 are closed for this project.
As AI models become more complex, understanding their internal decision-making processes becomes increasingly challenging. This project aims to advance the interpretability of AI systems, making their operations transparent and their decisions explainable to users and developers alike.
Areas of focus include:
- Explainable AI Techniques: Developing methods such as attention mechanisms, representation engineering, and surrogate models to elucidate how AI models make decisions.
- User-Centric Interpretability: Creating tools that present model explanations in an accessible manner for non-technical stakeholders.
- Benchmarking Interpretability: Establishing metrics and benchmarks to evaluate the effectiveness of different interpretability approaches.
- Interpretable Model Architectures: Designing AI architectures that are inherently more interpretable without significantly compromising performance.
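To illustrate the surrogate-model approach mentioned above, here is a minimal sketch: fit a small, interpretable decision tree to mimic a black-box model's predictions, then measure fidelity (how often the surrogate agrees with the black box on held-out data). The specific model choices below (a random forest as the black box, a depth-3 tree as the surrogate, scikit-learn as the library) are illustrative assumptions, not the project's actual methods.

```python
# Surrogate-model sketch: train an interpretable tree to imitate a
# black-box classifier, then score fidelity on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (stand-in for a real task).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Black box": an ensemble whose decisions are hard to inspect directly.
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_train, y_train)

# Surrogate is trained on the black box's *predictions*, not the true
# labels: the goal is to explain the model, not to solve the task.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: fraction of test points where surrogate and black box agree.
fidelity = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
print(f"surrogate fidelity: {fidelity:.2f}")
```

A high-fidelity shallow tree can then be read directly (feature thresholds at each split) as an approximate explanation of the black box's decision rule.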
Role: The project will involve both the creation of novel interpretability methods and the evaluation of existing techniques across various AI applications.
Qualifications:
- Solid understanding of machine learning and deep learning.
- Experience with model evaluation and validation.
- Proficiency in Python and machine learning libraries such as PyTorch.
- Strong analytical and problem-solving skills.
Day-to-day supervisor for this project: Zhun Wang
Hours: 12 or more
Engineering, Design & Technologies