Exploring Gene Signatures in Large-Scale Single-Cell Datasets
Peng He, Professor
UC San Francisco
Applications for Spring 2026 are closed for this project.
This project focuses on the discovery, optimization, and interpretation of gene signatures in massive single-cell and spatial transcriptomics datasets. Students will contribute to the development of scalable computational methods to extract meaningful biological patterns across thousands of samples and millions of cells.
The research involves both algorithm development and biological interpretation, offering a strong training experience at the intersection of computation and genomics. Students are expected to contribute to publications through their data mining or coding work.
Role: Subprojects may include:
Developing and optimizing efficient hierarchical clustering algorithms for large-scale transcriptomic data
Evaluating curated gene sets (e.g., cell cycle, metabolic, hormone, and signaling pathways) for their enrichment across datasets
Creating automated pipelines for signature scoring and cross-sample comparison
Developing new methods for feature gene selection to enhance interpretability and classification
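As a rough illustration of what signature scoring can look like, the sketch below scores each cell by the mean z-score of the genes in a signature. This is one common, simple approach and not necessarily the method used in the lab. The function name score_signature, the toy expression matrix, and the example gene list are all assumptions made for illustration.

```python
import numpy as np

def score_signature(expr, gene_names, signature):
    """Score each cell as the mean z-score of the signature genes.

    expr: array of shape (n_cells, n_genes), e.g. normalized expression.
    gene_names: list of gene symbols, one per column of expr.
    signature: list of gene symbols; genes absent from gene_names are skipped.
    """
    index = {g: i for i, g in enumerate(gene_names)}
    cols = [index[g] for g in signature if g in index]
    if not cols:
        raise ValueError("no signature genes found in gene_names")
    sub = np.asarray(expr, dtype=float)[:, cols]
    mean = sub.mean(axis=0)
    std = sub.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    z = (sub - mean) / std  # z-score each gene across cells
    return z.mean(axis=1)   # average over signature genes, per cell

# Synthetic toy data (not real measurements): 6 cells x 4 genes.
rng = np.random.default_rng(0)
genes = ["MKI67", "TOP2A", "ACTB", "GAPDH"]
expr = rng.poisson(1.0, size=(6, 4)).astype(float)
expr[:3, :2] += 10.0  # make the first three cells look proliferative

scores = score_signature(expr, genes, ["MKI67", "TOP2A", "NOT_PRESENT"])
print(np.round(scores, 2))
```

In this toy example the first three cells receive higher scores than the rest. On real data with millions of cells, the same idea would typically be applied to sparse matrices and batched across samples.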
Students will:
Work with curated and public single-cell datasets
Apply and improve clustering and enrichment analysis algorithms
Explore how gene signatures define cell states and tissue environments
Contribute to generalizable software modules and reports
Qualifications: Strong interest in genomics, gene regulation, or computational biology
Proficiency in Python or R for data analysis
Prior exposure to clustering, enrichment analysis, or gene set analysis is a plus
Day-to-day supervisor for this project: Wenjiang Zhou
Hours: 9-11 hrs
Off-Campus Research Site: On-site, hybrid, and off-campus arrangements are all acceptable
Related website: https://peng-he-lab.github.io/
Related website: https://profiles.ucsf.edu/peng.he