Fine-tuning vision-language models for clinical reasoning
Ahmed Alaa, Professor
Electrical Engineering and Computer Science
Vision-language models (VLMs) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit hallucinatory behavior, generating textual outputs that are not grounded in the contextual multimodal information. This challenge is particularly pronounced in the medical domain, where VLM outputs must not only be accurate in single interactions but also remain consistent with clinical reasoning and diagnostic pathways throughout multi-turn conversations. To this end, this project will develop a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
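As background for the kind of alignment objective involved, below is a minimal PyTorch sketch of a DPO-style preference loss, one common starting point for training a model to rank grounded responses above hallucinated ones. The function name, the choice of DPO, and the toy inputs are illustrative assumptions; the project's actual algorithm, which incorporates symbolic representations of clinical reasoning, is what the student will help develop.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """DPO-style preference loss.

        Each argument is a tensor of per-example sequence log-probabilities,
        i.e., log pi(y | x) summed over the response tokens, for the preferred
        (chosen) and dispreferred (rejected) responses under the trainable
        policy and a frozen reference model.
        """
        # Implicit rewards: log-probability margins relative to the reference model
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Push the policy to assign higher reward to the grounded response
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with random log-probabilities for a batch of 4 preference pairs
    if __name__ == "__main__":
        torch.manual_seed(0)
        logps = [torch.randn(4) for _ in range(4)]
        print(dpo_loss(*logps).item())

In practice, the log-probabilities would come from forward passes of the VLM on multimodal (image plus dialogue) inputs, and the preference pairs would contrast clinically consistent responses against ungrounded ones.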
Role: - Conduct experiments on public datasets and maintain a project codebase
- Regularly present progress at weekly lab meetings
- Assist in manuscript writing
- Workload expected to be > 12 hrs/week
Qualifications: - Strong experience with training deep learning models
- Experience with multi-GPU programming is desirable
- Strong interest in pursuing graduate studies
Day-to-day supervisor for this project: Alex Schubert, Graduate Student
Hours: 12 or more per week
Related website: https://proceedings.neurips.cc/paper_files/paper/2023/hash/2b1d1e5affe5fdb70372cd90dd8afd49-Abstract-Conference.html