  • UC Berkeley
  • College of Letters & Science

URAP

Project Descriptions
Spring 2025

Machine Learning Approaches for Automated Cell Type Classification in Single-Cell Genomics Data

Peng He, Professor  
UC San Francisco  

Applications for Spring 2025 are closed for this project.

This research project develops and optimizes machine learning tools that automatically identify cell types and states from single-cell genomics data. Single-cell technologies have revolutionized our understanding of cellular diversity, but manual annotation of cell types remains a significant bottleneck in data analysis. The project addresses this bottleneck by building supervised learning models that classify cells based on their molecular profiles.
The research involves working with two distinct types of single-cell data:

Single-cell RNA sequencing (scRNA-seq), which measures gene expression levels
Single-cell ATAC sequencing (scATAC-seq), which measures chromatin accessibility

The project will use existing annotated datasets, especially the high-resolution cell atlases established in our lab, to train and optimize classification models. It will explore various machine learning approaches and parameter optimization strategies to achieve accurate and robust cell type prediction.
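As an illustration of what a parameter optimization strategy can look like, the following is a minimal sketch of a grid search with k-fold cross-validation, written in plain Python so that it runs without dependencies. It is not the lab's method: the k-nearest-neighbour classifier, the function names (knn_predict, cross_val_accuracy, grid_search), the candidate values, and the toy two-dimensional "cells" are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training cells closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, k, n_folds=3, seed=0):
    """Mean held-out accuracy of k-NN over n_folds shuffled (unstratified) folds."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in order if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out)
        fold_scores.append(hits / len(held_out))
    return sum(fold_scores) / n_folds

def grid_search(X, y, candidate_ks):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best_k, scores

# Toy data (invented): two well-separated groups of six cells each.
cells = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1],
         [5.0, 5.1], [5.2, 5.0], [5.1, 5.3], [5.3, 5.2], [5.0, 5.4], [5.4, 5.1]]
labels = ["type A"] * 6 + ["type B"] * 6

best_k, scores = grid_search(cells, labels, candidate_ks=[1, 3, 5])
print(best_k, scores)
```

The folds here are not stratified by cell type, so a rare type can end up under-represented in a training split. With real atlases, where cell type frequencies are often unbalanced, a stratified split would likely be the safer choice.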

Role: The undergraduate researcher will be actively involved in the following tasks:
For scRNA-seq Analysis:

Process and prepare transcript count matrices for machine learning applications
Implement and test various classification algorithms
Conduct systematic parameter optimization experiments
Evaluate model performance using standard metrics
Document results and maintain detailed experimental records
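The scRNA-seq tasks listed above can be sketched end to end on a toy example: normalize a count matrix, fit a classifier, and score it with standard metrics. This is only an illustration under stated assumptions, not the project's pipeline. The nearest-centroid classifier, the target of 10,000 counts per cell, the function names, and the gene counts and cell type labels are all invented for the example.

```python
import math
from collections import defaultdict

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(c * scale) for c in cell])
    return normalized

def fit_centroids(X, labels):
    """Average the profiles of all training cells that share a label."""
    sums, sizes = {}, defaultdict(int)
    for row, label in zip(X, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        sizes[label] += 1
    return {lab: [s / sizes[lab] for s in vec] for lab, vec in sums.items()}

def predict(centroids, X):
    """Assign each cell the label of its closest centroid (Euclidean distance)."""
    return [min(centroids, key=lambda lab: math.dist(row, centroids[lab])) for row in X]

def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    per_class = []
    for lab in sorted(set(y_true)):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        per_class.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(per_class) / len(per_class)

# Toy data (invented): rows are cells, columns are three hypothetical genes.
train_counts = [[90, 5, 5], [40, 2, 3], [4, 80, 6], [1, 30, 2]]
train_labels = ["T cell", "T cell", "B cell", "B cell"]
test_counts = [[60, 3, 4], [2, 50, 5]]
test_labels = ["T cell", "B cell"]

centroids = fit_centroids(normalize_log1p(train_counts), train_labels)
predicted = predict(centroids, normalize_log1p(test_counts))
print(predicted, accuracy(test_labels, predicted), macro_f1(test_labels, predicted))
```

Macro-averaged F1 is reported alongside accuracy because it weights every cell type equally, which matters when some types are much rarer than others.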

For scATAC-seq Analysis:

Compare different dimensionality reduction techniques for feature selection
Analyze both peak score matrices and genomic bin score matrices
Implement and evaluate various classification approaches
Optimize model parameters for improved accuracy
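To make the two scATAC-seq input representations concrete, the sketch below aggregates a peak score matrix into fixed-width genomic bins and then applies one common variant of TF-IDF weighting, a transform often used before SVD-based dimensionality reduction of accessibility data. Everything here is an assumption for illustration: the function names (peaks_to_bins, tfidf), the 5,000 bp bin width, the midpoint rule for assigning peaks to bins, and the toy peaks and scores.

```python
import math

def peaks_to_bins(peak_scores, peaks, bin_size=5_000):
    """Sum per-cell peak scores into fixed-width bins keyed by (chrom, bin index).

    Each peak goes to the bin containing its midpoint. This is a
    simplification: a peak spanning a bin boundary is not split.
    """
    bin_of_peak = [(chrom, ((start + end) // 2) // bin_size)
                   for chrom, start, end in peaks]
    bins = sorted(set(bin_of_peak))
    column = {b: j for j, b in enumerate(bins)}
    matrix = []
    for cell in peak_scores:
        row = [0.0] * len(bins)
        for score, b in zip(cell, bin_of_peak):
            row[column[b]] += score
        matrix.append(row)
    return bins, matrix

def tfidf(matrix):
    """Weight entries by per-cell term frequency times log(1 + n_cells / n_cells_with_feature)."""
    n_cells = len(matrix)
    n_features = len(matrix[0])
    cells_with_feature = [sum(1 for row in matrix if row[j] > 0)
                          for j in range(n_features)]
    idf = [math.log(1 + n_cells / n) if n else 0.0 for n in cells_with_feature]
    weighted = []
    for row in matrix:
        total = sum(row)
        weighted.append([(v / total) * idf[j] if total else 0.0
                         for j, v in enumerate(row)])
    return weighted

# Toy data (invented): four peaks as (chromosome, start, end) and three cells.
peaks = [("chr1", 100, 600), ("chr1", 1200, 1800),
         ("chr1", 5200, 5900), ("chr2", 300, 900)]
peak_scores = [[1, 1, 0, 0],
               [0, 0, 1, 1],
               [1, 0, 0, 1]]

bins, bin_matrix = peaks_to_bins(peak_scores, peaks)
weighted = tfidf(bin_matrix)
print(bins)
print(bin_matrix)
print(weighted)
```

The same tfidf function can be applied directly to peak_scores, so classifiers trained on the peak-level and bin-level representations can be compared under identical preprocessing.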

Learning Outcomes:

Gain practical experience in machine learning and bioinformatics
Develop proficiency in programming for biological data analysis
Learn essential concepts in single-cell genomics
Acquire skills in data visualization and scientific documentation
Understand the principles of model optimization and evaluation
Experience working with large-scale biological datasets

Qualifications:

Strong programming experience in Python or R
Basic understanding of statistics and probability
Familiarity with linear algebra concepts
Experience with data analysis and visualization

Day-to-day supervisor for this project: Konstantinos Stasinos, Post-Doc

Hours: to be negotiated

Off-Campus Research Site: Hybrid or remote work is also allowed

Related website: https://peng-he-lab.github.io/
Related website: https://profiles.ucsf.edu/peng.he

 Engineering, Design & Technologies   Biological & Health Sciences

Office of Undergraduate Interdisciplinary Studies, Undergraduate Division
College of Letters & Science, University of California, Berkeley