Classifying archaic ancestry in human genomes using machine learning models
Priya Moorjani, Professor
Center for Computational Biology, Molecular and Cell Biology
Applications for Fall 2024 are closed for this project.
Our lab studies human evolutionary genetics using genomic data from present-day and ancient DNA samples. We aim to understand how different populations relate to each other and to identify genes involved in human adaptation and disease. To this end, we develop computational and statistical methods, perform population genetic simulations, and analyze large-scale genomic datasets. After migrating out of Africa, modern humans encountered and interbred with archaic hominins, including Neanderthals and Denisovans. As a result, all non-Africans today have ~4% archaic ancestry in their genomes. In this project, we will apply machine learning (ML) methods, including large language models, to identify and classify regions of the genome by archaic ancestry. Our results will shed light on the history and legacy of human interactions with our archaic ancestors and help us better understand human evolution.
Role: The project will involve analyzing large-scale human genomics datasets and running simulations to explore demographic models and parameters related to the history of modern humans. This is an exciting opportunity to work at the intersection of genetics and computer science and to learn about human evolutionary history. The student will also have the opportunity to work with other lab members to develop technical skills in computational and statistical genetics. The successful candidate will start by reading relevant papers and gaining familiarity with genetic data types, human evolution concepts, and relevant ML models. They will then train machine learning models on existing large-scale genomic datasets, test each method's accuracy, benchmark against existing methods, and interpret the results within an evolutionary context.
Qualifications: Proficiency in a programming language (e.g., Python, R, or equivalent) and coursework in introductory biology are required. Candidates should have an interest in human evolution, genetics, computational biology, or bioinformatics. Candidates with experience in machine learning and AI methods are encouraged to apply. A prospective undergraduate researcher should expect to commit a minimum of 12 hours per week to research during the semester. Students who are looking for research experience, ideally with the goal of doing an honors thesis, will be strongly favored.
Day-to-day supervisor for this project: Sarah Johnson, Ph.D. candidate
Hours: 12 or more per week
Related website: http://moorjanilab.org/
Biological & Health Sciences; Digital Humanities and Data Science; Engineering, Design & Technologies; Mathematical and Physical Sciences