Fine-tuning vision-language models for clinical reasoning
Ahmed Alaa, Professor
Electrical Engineering and Computer Science
Vision-language models (VLMs) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit hallucinatory behavior, generating textual outputs that are not grounded in the contextual multimodal information. This challenge is particularly pronounced in the medical domain, where VLM outputs must not only be accurate in single interactions but also remain consistent with clinical reasoning and diagnostic pathways throughout multi-turn conversations. To this end, this project will develop a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
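As background for the kind of alignment objective involved, below is a minimal PyTorch sketch of a DPO-style preference loss, one common starting point for training a model to rank grounded responses above hallucinated ones. The function name, the choice of DPO, and the toy inputs are illustrative assumptions; the project's actual algorithm, which incorporates symbolic representations of clinical reasoning, is what the student will help develop.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """DPO-style preference loss.

        Each argument is a tensor of per-example sequence log-probabilities,
        i.e., log pi(y | x) summed over the response tokens, for the preferred
        (chosen) and dispreferred (rejected) responses under the trainable
        policy and a frozen reference model.
        """
        # Implicit rewards: log-probability margins relative to the reference model
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Push the policy to assign higher reward to the grounded response
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with random log-probabilities for a batch of 4 preference pairs
    if __name__ == "__main__":
        torch.manual_seed(0)
        logps = [torch.randn(4) for _ in range(4)]
        print(dpo_loss(*logps).item())

In practice, the log-probabilities would come from forward passes of the VLM on multimodal (image plus dialogue) inputs, and the preference pairs would contrast clinically consistent responses against ungrounded ones.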
Role: - Conduct experiments on public datasets and maintain a project codebase
- Regularly present progress at weekly lab meetings
- Assist in manuscript writing
- Workload expected to be > 12 hrs/week
Qualifications: - Strong experience with training deep learning models
- Experience with multi-GPU programming is desirable
- Strong interest in pursuing graduate studies
Day-to-day supervisor for this project: Alex Schubert, Graduate Student
Hours: 12 or more per week
Related website: https://proceedings.neurips.cc/paper_files/paper/2023/hash/2b1d1e5affe5fdb70372cd90dd8afd49-Abstract-Conference.html