Clinical Translational and Data Science Research in Radiology (cardiothoracic imaging - lung cancer & interstitial lung disease; radiological communication with large language model assistance; Privacy-preserving medical image & text processing with homomorphic encryption)
Jae Ho Sohn, Professor
UC San Francisco
Applications for Spring 2026 are closed for this project.
Radiologists are doctors who specialize in extracting clinically useful information from medical images (such as CT, chest X-ray, and MRI) and communicating these findings to other doctors. We leverage abstract clinical reasoning and visual pattern recognition skills to extract hidden information from the images that may be important for treatment planning and clinical prognostication. Communicating with other physicians in precise, succinct, and clear language is an important part of a radiologist's skill set.
As a physician with an engineering background, I have run my lab since 2016 and have trained several dozen undergraduate, medical, and graduate students who have since advanced to leading positions in academia, industry, and medical centers. I have been a faculty member at UCSF since 2021 and am now part of the UCSF Center for Intelligent Imaging. In collaboration with engineering faculty members and researchers at UCSF, UC Berkeley, and various companies, I look forward to working with students and trainees to advance cardiothoracic imaging and radiological data science research.
My main research questions are:
1) Cardiothoracic Imaging relating to Lung Cancer and Interstitial Lung Disease: How do we leverage big data to extract useful hidden information from medical images (also known as imaging biomarker discovery), specifically in the domain of lung cancer imaging and image-guided interventions? We work with X-rays, CTs, and MRIs, with special emphasis on lung cancer screening CTs. What makes a lung nodule or a cancer high-risk? What predicts complications or other issues with image-guided lung interventions?
2) Radiological Communication with Large (Vision) Language Model Assistance: Radiology reports are quasi-annotations of the associated radiology images. How can we extract useful information from these reports? Can we create a domain-specific query algorithm to match similar images and reports? Can we optimize the language of radiology reports to improve our communication? Can we generate radiological images from report descriptions? Can we generate radiological reports from images? Can we leverage large language models to create a radiological assistant? We explore these questions in collaboration with engineers at Berkeley as well as industry partners.
3) Clinical Translation of Homomorphic Encryption for Privacy-Preserving Medical Image & Text Processing: Homomorphic encryption is a mathematical technique that allows mathematical operations (such as those used in machine learning) to be applied to data while the data remains encrypted. We seek to clinically translate and validate this technology on imaging and text data through a pilot study of data sharing and crowd-based data processing to enable a clinical trial registry.
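For readers new to the idea in item 3, the following is a minimal, illustrative sketch of an additively homomorphic scheme (textbook Paillier) in Python. It is not the method or library used in our projects; the function names, the tiny primes, and the example values are assumptions chosen only to keep the example readable. Real deployments rely on vetted libraries and much larger keys.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic).
# Assumption: tiny hard-coded primes for illustration only; NOT secure.

def keygen(p=1009, q=1013):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)

if __name__ == "__main__":
    n, priv = keygen()
    # Hypothetical example values, e.g. two measurements held by a data owner.
    c1, c2 = encrypt(n, 12), encrypt(n, 30)
    # A third party can combine the ciphertexts without seeing 12 or 30.
    total = add_encrypted(n, c1, c2)
    print(decrypt(n, priv, total))  # sum of the two plaintexts
```

The point of the sketch is that whoever calls `add_encrypted` or `scale_encrypted` never needs the private key: only the data owner can decrypt the result. Paillier supports addition and multiplication by a known constant only; more general computation on encrypted data requires other homomorphic schemes.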
Role: For machine learning-focused projects: project design, participation in collaborative meetings with engineers, coding, evaluation, and manuscript writing.
For clinical or translational projects: project design, participation in collaborative meetings with clinicians, patient recruitment (if relevant to the specific project), data analysis, and manuscript writing.
Qualifications: The most successful students on my team have been self-motivated, willing to ask questions, and proactive in seeking out solutions from me, colleagues, and websites.
For machine learning-focused projects:
-Familiarity with basic concepts in machine learning. Proficiency in using Stack Overflow and/or LLMs to assist in writing code.
-Ideally, concurrent enrollment in or completion of a machine learning course that covers traditional approaches (logistic regression, SVM, ensemble learning), the basics of deep learning, and/or large language models.
For clinical and translational research:
-Familiarity with, and willingness to learn, basic statistics for data analysis.
-Patience and drive to collect data, analyze it (using Excel and Python), create figures for the manuscript, and write the manuscript under senior researcher or faculty guidance.
-Strong communication skills and passion for patient interaction (if working on a patient-facing project).
Day-to-day supervisor for this project: Masha Bondarenko
Hours: to be negotiated
Off-Campus Research Site: 185 Berry St, Suite 350, San Francisco, CA 94107. During the initial phase of orientation, we invite everyone to spend some time in person at the UCSF China Basin campus. Afterwards, and during the course period, some projects (typically machine learning projects) may allow hybrid or remote work. Patient-facing projects will typically require in-person presence.
Related website: https://profiles.ucsf.edu/jae.sohn
Related website: https://sohnlab.ucsf.edu