URAP Project Descriptions

Radiological Data Science and Clinical Research (lung cancer imaging and intervention clinical/data science research, radiological large language model, clinical AI validation, novel imaging modality 0.55T lung MRI clinical translation)

Jae Ho Sohn, Professor
UC San Francisco

Closed. This professor is continuing with Fall 2024 apprentices on this project; no new apprentices needed for Spring 2025.

Radiologists are doctors who specialize in extracting clinically useful information from medical images (such as CT, chest X-ray, and MRI) and communicating these findings to other doctors. We use abstract clinical reasoning and visual pattern recognition to extract hidden information from the images that may be important for treatment planning and clinical prognostication. Communicating with other physicians in precise, succinct, and clear language is an important part of a radiologist's skill set.

As a physician with an engineering background, I have been running my lab since 2016 and have trained more than 30 undergraduate, medical, and graduate students who have since advanced to leading positions in academia, industry, and medical centers. I have been a faculty member at UCSF since 2021 and am now part of the UCSF Center for Intelligent Imaging. In collaboration with engineering faculty members and researchers at UCSF, UC Berkeley, and various companies, I look forward to working with students and trainees to advance cardiothoracic imaging and radiological data science research at large.

My main research questions are:
1) Lung cancer imaging and image-guided intervention: How do we leverage big data to extract useful hidden information from medical images (also known as imaging biomarker discovery), specifically in the domain of lung cancer imaging and image-guided interventions? We work with X-rays, CTs, and MRIs, with special emphasis on lung cancer screening CTs. What makes a lung nodule or a cancer high risk? What predicts complications or other issues with image-guided lung interventions?

2) Radiological natural language processing and Q&A machines: Radiology reports are quasi-annotations of the associated radiology images. How can we extract useful information from these reports? Can we create a domain-specific query algorithm to match similar images and reports? Can we optimize the language of radiology reports to improve our communication? Can we generate radiological images from report descriptions? Can we generate radiological reports from images? Can we leverage large language models to create a radiological assistant? We explore these questions in collaboration with engineers at Berkeley as well as industry partners.

3) Clinical translation and evaluation of novel imaging modalities such as 0.55T lung MRI: We are one of the few sites in the world to have installed this new imaging modality, which requires further safety testing, clinical validation, and evaluation. In collaboration with PhD MR physics faculty at the Surbeck advanced imaging center, Siemens industry scientists, and clinician colleagues, we have been recruiting patients, scanning them, evaluating the scans with various signal processing and machine learning algorithms, and then conducting multi-reader performance studies with radiologists.

Role: For machine learning focused projects: project design, participation in collaborative meetings with engineers, coding, evaluation, and manuscript writing.

For clinical or translational focused projects: project design, participation in collaborative meetings with clinicians, patient recruitment (if relevant for specific project), data analysis, and manuscript writing.

Qualifications: The most successful students on my team have been self-motivated, willing to ask questions, and proactive in seeking out solutions from me, colleagues, and websites.

For machine learning focused projects:
-Familiarity with Python, R, and/or Stata, ideally with at least some machine learning skills relevant to the project (e.g., for large language model projects, at least familiarity with the Hugging Face tutorials on transformers or a similar tutorial).
-Ideally, concurrent enrollment in or completion of a machine learning course that covers both traditional approaches (logistic regression, SVM, ensemble learning) and the basics of deep learning.

For clinical and translational research:
-Familiarity with, or willingness to learn, basic statistics for data analysis.
-Patience and drive to collect data, analyze it (using Excel and Python), create figures for manuscripts, and write manuscripts under the guidance of a senior researcher or faculty member.
-Strong communication skills and passion for patient interaction & care (if working on a patient-facing project).

Hours: 6-8 hrs

Off-Campus Research Site: Most machine learning focused projects can be fully or nearly fully remote. In-person or hybrid projects are based at 185 Berry St, Suite 350, San Francisco, CA 94158.

Related website: https://profiles.ucsf.edu/jae.sohn

Biological & Health Sciences, Digital Humanities and Data Science
