Foundation Models and Machine Learning for Cell Type Annotation
Peng He, Professor
UC San Francisco
Applications for Fall 2025 are closed for this project.
This project applies and benchmarks state-of-the-art machine learning models, including foundation models such as Geneformer and SCimilarity, for automated cell type annotation of single-cell and spatial genomics data. The goal is to remove the bottleneck of manual cell type labeling in large-scale datasets by developing scalable, transferable computational solutions.
Role:
Process single-cell RNA-seq and ATAC-seq data
Apply embedding models and train classifiers for cell type prediction
Benchmark model performance and interpret latent representations
Work on subprojects such as transfer learning (CellSage), doublet detection (DouCLing), CNV inference, surface marker discovery, and metadata inference (23andMe-like models)
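To make the "embed, classify, benchmark" workflow in the role description concrete, here is a minimal sketch using only the Python standard library. It is not the lab's pipeline: the embeddings are synthetic Gaussian clusters standing in for vectors that a foundation model might produce, the cell type labels are hypothetical, and a nearest-centroid classifier is swapped in for whatever classifier the project actually uses. All function names are illustrative assumptions.

```python
import random
from collections import defaultdict


def make_toy_embeddings(n_per_type=50, dim=8, seed=0):
    """Synthetic stand-in for model embeddings: one Gaussian cluster per cell type."""
    rng = random.Random(seed)
    cell_types = ["T cell", "B cell", "Monocyte"]  # hypothetical labels
    X, y = [], []
    for k, name in enumerate(cell_types):
        # Each type gets its own well-separated cluster center (assumption).
        center = [5.0 if d == k else 0.0 for d in range(dim)]
        for _ in range(n_per_type):
            X.append([c + rng.gauss(0, 1) for c in center])
            y.append(name)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid 'training': average the embeddings of each label."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, X):
    """Assign each embedding to the label of its closest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: sq_dist(centroids[lab], vec)) for vec in X]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1, so rare cell types count equally."""
    scores = []
    for lab in sorted(set(y_true)):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def run_benchmark(seed=0):
    """Shuffle, split 80/20, fit on the training cells, score on held-out cells."""
    X, y = make_toy_embeddings(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    y_pred = predict(centroids, [X[i] for i in test])
    return macro_f1([y[i] for i in test], y_pred)


if __name__ == "__main__":
    print(f"macro-F1 on held-out cells: {run_benchmark():.3f}")
```

Macro-F1 is used here rather than plain accuracy because cell type datasets are often imbalanced; with real data, the toy embeddings would be replaced by model outputs and the split would typically be made across donors or datasets to test transferability.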
Qualifications: Strong Python skills; background in machine learning, data science, or computational biology is preferred
Day-to-day supervisor for this project: Konstantinos Stasinos, Post-Doc
Hours: to be negotiated
Off-Campus Research Site: Hybrid/remote working is also allowed
Related website: https://peng-he-lab.github.io/
Related website: https://profiles.ucsf.edu/peng.he