Developing Statistical Methods for Single-Cell Patient Cohort Data
Elizabeth Purdom, Professor
Statistics
Closed. This professor is continuing with Fall 2024 apprentices on this project; no new apprentices needed for Spring 2025.
This project involves developing statistical methodologies to analyze data from single-cell sequencing of individual patients. Single-cell sequencing of mRNA measures the amount of mRNA of each gene found in individual cells. It measures the diversity of mRNA within cells and, when performed on many individuals, can allow us to link the cellular diversity of an individual with, for example, their observed health outcome. Methodologies to do this must account for variability both within a patient and between patients. This project is focused on developing these methodologies and evaluating them on existing patient cohort datasets.
Role: The research group has already done some initial benchmarking of our methods on a handful of datasets. The primary tasks of the student(s) selected for this project will be to 1) help us scale our existing benchmarking up to many more datasets so that we can evaluate how our methods perform on a wider variety of datasets and 2) assist in cross-dataset comparisons and critical evaluation of the performance of the methods. By the end of this project, the student(s) will have gained hands-on experience in computational biology data analysis and statistical methodology development.
For Task (1), the student(s) will be responsible for downloading appropriate public datasets, processing the data for analysis, and implementing methodologies developed by the research group on the datasets. The student(s) will have access to existing code developed by the group for many of these tasks, and will also be responsible for building on that code to streamline data processing across datasets. The student(s) will also be expected to perform exploratory data analysis for each dataset.
Task (2) is open-ended and can develop into a larger project depending on the skill and motivation of the student(s).
Qualifications: Students should be experienced in using R and have at least basic familiarity with Unix and GitHub. Students should have a background in statistics, machine learning, and/or data analysis techniques (e.g., upper-division coursework).
Knowledge of cellular biology is not required but is beneficial.
Day-to-day supervisor for this project: Maggie Kuang, Ph.D. candidate
Hours: 9-11 hrs
Mathematical and Physical Sciences; Biological & Health Sciences