Critical Assessment of Genome Interpretation (CAGI)
Steven Brenner, Professor
Plant and Microbial Biology
Closed. This professor is continuing with Spring 2024 apprentices on this project; no new apprentices needed for Fall 2024.
The field of genome interpretation is essential for our understanding of human biology and the advancement of personalized medicine. However, the rapid accumulation of genomic data far exceeds our capacity for reliable interpretation. Consequently, the majority of variation discovered by next-generation sequencing technologies is of unknown significance. These variants represent one of the greatest current challenges in clinical genetics. More than a hundred predictive methods for interpreting genomic variation now exist. Yet the reliability of these methods has not been ascertained, and the field lacks a clear consensus on how to evaluate methods for variant interpretation.
The Critical Assessment of Genome Interpretation (CAGI) is a community experiment that evaluates the prediction of phenotypes from genetic variation. Across its five editions, CAGI has developed more than 50 challenges and objectively assessed the performance of submitted predictions against experimental and clinical characterizations. These challenges have identified bottlenecks in genome interpretation, highlighted innovations, and helped inform guidelines for clinical practice.
Based on our collection of experimental datasets and expertise in method assessment, we are now seeking to develop a calibrated reference for all available predictive methods. This reference will introduce the first standard of its kind in the field of genome interpretation and serve as a go-to resource for both method developers and the broader biomedical and clinical community. We already have an agreement in place for incorporating this reference into widely used databases of functional predictions, such as dbNSFP.
Role: To inform development of the CAGI calibrated reference, you will generate comparisons between the experimentally determined effects of missense variants from CAGI and the corresponding predicted effects from the dbNSFP database.
You will work closely with Tina Bakolitsa (the CAGI Project Scientist) to collect data for all experimentally characterized missense variants in CAGI challenges, and match them to corresponding predictions from the dbNSFP database. Based on the CAGI data, predictions will be assessed using our in-house R package.
The project will entail a detailed literature review of currently available methods for variant interpretation, as well as use of our in-house R package.
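As a rough illustration of the matching and comparison step described above, the following Python sketch pairs experimental measurements with predictor scores by variant and computes a Pearson correlation. It is only a sketch under stated assumptions: it is not the lab's in-house R package, the variant identifiers and scores are made-up placeholders, and the dictionary layout is a hypothetical stand-in for the actual CAGI and dbNSFP formats. It also assumes the experimental value measures retained function while the predictor scores deleteriousness, so a good predictor would show a negative correlation.

```python
from math import sqrt

# Hypothetical experimental measurements (e.g. relative activity),
# keyed by (gene, protein change). All values are invented.
experimental = {
    ("GENE1", "p.A10V"): 0.90,
    ("GENE1", "p.R25C"): 0.20,
    ("GENE1", "p.G40D"): 0.05,
    ("GENE1", "p.L77P"): 0.60,  # no prediction available
}

# Hypothetical predictor scores (higher = more deleterious), same key scheme.
predicted = {
    ("GENE1", "p.A10V"): 0.10,
    ("GENE1", "p.R25C"): 0.70,
    ("GENE1", "p.G40D"): 0.95,
    ("GENE1", "p.T90M"): 0.30,  # no experimental measurement
}


def match_variants(experimental, predicted):
    """Return (variant, experimental value, predicted score) for shared variants."""
    shared = sorted(experimental.keys() & predicted.keys())
    return [(key, experimental[key], predicted[key]) for key in shared]


def pearson(matched):
    """Pearson correlation between experimental values and predicted scores."""
    xs = [exp for _, exp, _ in matched]
    ys = [pred for _, _, pred in matched]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two matched variants")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)


matched = match_variants(experimental, predicted)
r = pearson(matched)
print(f"{len(matched)} matched variants, Pearson r = {r:.3f}")
```

Variants present in only one of the two sources are dropped before scoring. A real assessment would likely also report how many variants were lost at this step, since coverage differs between predictors.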
Qualifications: The ideal candidate is willing to learn and knows how to write programs (in any language); knowledge of genomics is a plus. Applicants with a GPA under 3.6 will be considered only in exceptional circumstances.
Other requirements
Candidates must:
• Attend a 3-hour lab meeting every week
• Attend a personal genomics subgroup meeting every week
• Adhere to other lab policies (including weekly notebooks to track research and semester reports)
• Register for credits, regardless of their program-specific requirements
• Research full time in the lab during the summer, if invited to do so
Hours: 12 or more hours
Related website: https://genomeinterpretation.org
Related website: http://compbio.berkeley.edu/