Jae Ho Sohn, Professor

Closed (1) Big Data in Radiology Research (imaging biomarker discovery, natural language processing, value/workflow efficiency in cardio-thoracic imaging)

Applications for fall 2021 are now closed for this project.

Radiologists are doctors who specialize in extracting useful information from medical images (such as CT, chest X-ray, and MRI) and communicating the findings to other doctors. We use abstract clinical reasoning and visual pattern recognition to extract hidden information from the images that may be important for treatment planning and clinical prognostication. Communicating with other physicians in precise, succinct, and clear language is an important skill for a radiologist.

As a physician with an engineering background, I have run my lab (previously known as Big Data in Radiology, BDRAD) since 2016 and have trained more than 30 undergraduate, medical, and graduate students who have since advanced to leading positions in academia, industry, and medical centers. I joined UCSF as a new faculty member in 2021 and am now part of the UCSF Center for Intelligent Imaging. I look forward to working with students and training them in biomedical data science and machine learning research.

My main research questions are:
1) Computer vision: How do we leverage big data to extract useful hidden information from medical images (also known as imaging biomarker discovery), specifically in cardiac and pulmonary imaging? We work with X-rays, CTs, and MRIs, with special emphasis on lung cancer screening CTs. We have worked with radiomics, segmentation architectures, ConvLSTM, and, most recently, vision transformers.

2) Natural language processing and content-based image retrieval (CBIR): Radiology reports are quasi-annotations of the associated radiology images. How can we extract useful information from these reports? Can we create a domain-specific query algorithm to match similar images and reports? Can we optimize the language of radiology reports to improve our communication? We have used both traditional and transformer-based (BERT) NLP approaches to tackle these questions.

3) Clinical big data research in cardiothoracic imaging: How can we improve the value of imaging? Could we improve the cost-benefit ratio of imaging by boosting diagnostic value or workflow efficiency? Could we identify the ideal candidates for specific types of imaging (especially cancer screening)? My lab focuses on lung cancer screening CTs and cardiac MRIs. We use data science techniques such as XGBoost and random forest to tackle these questions.

Research design, data processing, model development, manuscript/abstract writing, figure design, and national/local poster and paper presentations (RSNA, SIIM, etc.).

The lab has been remote since 2016, but we occasionally invite our students to the UCSF campus to present research and meet our faculty members. My office and CPU/GPU resources are physically located at the UCSF China Basin campus.

Qualifications: The most successful students on my team have been self-motivated, willing to ask questions, and proactive in seeking solutions from me, colleagues, and the web (such as Stack Overflow).
- Familiarity with Python, R, and/or Stata.
- Ideally, concurrent enrollment in or completion of a machine learning course that covers both traditional approaches (logistic regression, SVM, ensemble learning) and the basics of deep learning.

Weekly Hours: 6-8 hrs

Related website: https://profiles.ucsf.edu/jae.sohn