Berkeley University of California

URAP

Project Descriptions
Fall 2025


Foundation Models and Machine Learning for Cell Type Annotation

Peng He, Professor  
UC San Francisco  

Applications for Fall 2025 are closed for this project.

This project applies and benchmarks state-of-the-art machine learning models, including foundation models such as Geneformer and SCimilarity, for automated cell type annotation of single-cell and spatial genomics data. The goal is to overcome the bottleneck of manual cell type labeling in large-scale datasets by developing scalable, transferable computational solutions.


Role:

Process single-cell RNA-seq and ATAC-seq data
Apply embedding models and train classifiers for cell type prediction
Benchmark model performance and interpret latent representations
Work on subprojects such as transfer learning (CellSage), doublet detection (DouCLing), CNV inference, surface marker discovery, and metadata inference (23andMe-like models)
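
As a rough illustration of the embed, classify, and benchmark workflow listed above, the following is a minimal sketch and not the lab's actual pipeline. It assumes synthetic Gaussian clusters as a stand-in for foundation-model embeddings (no Geneformer or SCimilarity call is made), a simple nearest-centroid classifier, and accuracy plus macro F1 as benchmark metrics. All function names here are hypothetical.

```python
import numpy as np


def make_synthetic_embeddings(n_per_type=100, dim=16, n_types=3, seed=0):
    """Hypothetical stand-in for model embeddings: one Gaussian cluster per cell type."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_types, dim))
    X = np.vstack([c + rng.normal(size=(n_per_type, dim)) for c in centers])
    y = np.repeat(np.arange(n_types), n_per_type)
    return X, y


def fit_centroids(X, y):
    """Compute the mean embedding of each labeled cell type."""
    labels = np.unique(y)
    centroids = np.stack([X[y == k].mean(axis=0) for k in labels])
    return labels, centroids


def predict(X, labels, centroids):
    """Assign each cell the label of its nearest centroid (squared Euclidean distance)."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for k in np.unique(y_true):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Random 80/20 split into training and held-out cells.
X, y = make_synthetic_embeddings()
idx = np.random.default_rng(1).permutation(len(y))
split = int(0.8 * len(y))
train, test = idx[:split], idx[split:]

labels, centroids = fit_centroids(X[train], y[train])
y_pred = predict(X[test], labels, centroids)

accuracy = float((y_pred == y[test]).mean())
f1 = macro_f1(y[test], y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={f1:.3f}")
```

In practice the synthetic matrix would be replaced by embeddings computed from real single-cell data, and the nearest-centroid step by whichever classifier is being benchmarked. Macro F1 is included because cell type labels are often imbalanced, so accuracy alone can hide poor performance on rare types.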

Qualifications: Strong Python skills; background in machine learning, data science, or computational biology is preferred

Day-to-day supervisor for this project: Konstantinos Stasinos, Post-Doc

Hours: to be negotiated

Off-Campus Research Site: Hybrid or remote work is also allowed

Related website: https://peng-he-lab.github.io/
Related website: https://profiles.ucsf.edu/peng.he

 Engineering, Design & Technologies   Biological & Health Sciences


Office of Undergraduate Interdisciplinary Studies, Undergraduate Division
College of Letters & Science, University of California, Berkeley