Steven Brenner, Professor

Closed (1) A unified analysis strategy for RNA-seq based differential gene expression and alternative splicing analysis (computational biology).

Closed. This professor is continuing with Spring 2021 apprentices on this project; no new apprentices needed for Fall 2021.

RNA-seq, a transcriptome profiling technology that uses next-generation sequencing platforms, has been widely used in biological and medical research. RNA-seq has the potential to quantify both gene expression and alternative splicing. It is routinely used to reveal transcriptome differences, in either gene expression or splicing, between two or more biological conditions. The goal of such analyses is often to direct future research, that is, to identify potentially causal genes whose expression or splicing can explain or distinguish different biological conditions.

However, determining whether gene expression or splicing should be pursued for more in-depth study is often a matter of trial and error. A plausible and widely used solution is to carry out both differential gene expression and differential splicing analyses, then compare the number of detected differentially expressed genes with the number of differentially spliced events. However, the measured changes, TPM (Transcripts Per Million) for gene expression and PSI (Percent Spliced In) for splicing, are not on comparable scales, and the parameters of the analyses, such as the p-value or FDR cutoff and the expression fold-change or PSI-change threshold, are often arbitrary. Thus the numbers of detected genes or events are not comparable. Subsequent functional gene set analysis, such as KEGG or GO enrichment, may provide considerable insight in some cases, but still relies heavily on individual judgement.
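As a rough illustration of why the two measures are hard to compare, the sketch below computes TPM and PSI on toy numbers. It is a simplified sketch, not the lab's pipeline: the function names, read counts, and transcript lengths are made up for illustration, and the PSI calculation ignores the length normalization that splicing tools usually apply to inclusion and exclusion reads.

```python
def tpm(counts, lengths_kb):
    """TPM: divide counts by transcript length (kb), then scale so values sum to 1e6."""
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


def psi(inclusion_reads, exclusion_reads):
    """PSI as a fraction in [0, 1]: share of reads that support exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return float("nan")  # no coverage, PSI is undefined
    return inclusion_reads / total


# Hypothetical toy data: three transcripts and one splicing event.
counts = [100, 300, 600]
lengths_kb = [1.0, 3.0, 2.0]
tpm_values = tpm(counts, lengths_kb)
psi_value = psi(30, 10)

print(tpm_values)  # unbounded relative abundances that sum to one million
print(psi_value)   # a bounded proportion for a single event
```

TPM is an open-ended relative abundance while PSI is a bounded proportion, so a fold-change threshold on one and a difference threshold on the other do not select comparable sets of changes.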

The goal of this project is to develop statistical strategies to answer the question: is gene expression or splicing the major change of the transcriptome? More broadly, we aim to place changes in transcript expression in a framework consistent with transcript alternative splicing.

Our initial analyses suggest that principal component analysis (PCA) of gene expression or splicing event usage may partially answer this question. Moreover, isoform expression, which can be quickly calculated using k-mer-based strategies, contains both gene expression and splicing information, and thus can be used to answer this question. The student will develop a statistical strategy to reveal major transcriptome changes using isoform expression data and develop a web tool that is available to the public.
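One way to see how isoform expression carries both signals is to split it into a gene-level total and a per-isoform usage fraction. The following is only a sketch under stated assumptions: the isoform and gene names, the expression values, and the `decompose_isoforms` helper are hypothetical, and usage fraction is used here as a simple stand-in for splicing rather than as the project's actual statistic.

```python
from collections import defaultdict


def decompose_isoforms(isoform_expr, isoform_to_gene):
    """Split isoform expression into gene totals and isoform usage fractions."""
    gene_total = defaultdict(float)
    for isoform, value in isoform_expr.items():
        gene_total[isoform_to_gene[isoform]] += value

    usage = {}
    for isoform, value in isoform_expr.items():
        total = gene_total[isoform_to_gene[isoform]]
        usage[isoform] = value / total if total > 0 else float("nan")
    return dict(gene_total), usage


# Hypothetical toy data for two conditions.
isoform_to_gene = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
control = {"geneA.1": 30.0, "geneA.2": 10.0, "geneB.1": 5.0}
treated = {"geneA.1": 10.0, "geneA.2": 30.0, "geneB.1": 20.0}

gene_c, usage_c = decompose_isoforms(control, isoform_to_gene)
gene_t, usage_t = decompose_isoforms(treated, isoform_to_gene)

# geneA: same total expression, but the dominant isoform switches (splicing change).
# geneB: same isoform usage, but the total goes up (expression change).
print(gene_c, gene_t)
print(usage_c, usage_t)
```

In this toy case the gene totals capture the expression change in geneB, while the usage fractions capture the isoform switch in geneA, even though both are derived from the same isoform-level table.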

Day-to-day supervisor for this project: Zhiqiang Hu, Post-Doc

Qualifications: The student must work on the project for at least 12 hours per week in the lab. The student will meet with the mentor every week and attend group meetings. The student will adhere to other lab policies (including weekly notebooks to track research and semester reports) and register for academic credits. The student is required to continue the project during the spring semester and the summer of 2020, if invited to do so. Qualifications: (1) desire to learn and conduct basic research; (2) some understanding of molecular biology concepts and next-generation sequencing (NGS) technologies; (3) Linux and programming (R, Perl or Python) experience. Candidates who have completed core statistics classes are preferred. Experience with NGS data analysis is preferred. Applicants with GPA under 3.6 will be considered only in exceptional circumstances.

Weekly Hours: 12 or more hours

Related website: http://compbio.berkeley.edu

Closed (2) Coevolution of nonsense-mediated mRNA decay and alternative splicing

Closed. This professor is continuing with Spring 2021 apprentices on this project; no new apprentices needed for Fall 2021.

Nonsense-mediated mRNA decay (NMD) is a eukaryotic RNA surveillance pathway that degrades aberrant transcripts harboring premature termination codons. This pathway, in conjunction with alternative splicing, regulates gene expression post-transcriptionally. Hundreds to thousands of transcripts are degraded by NMD in diverse species, including many important regulators of developmental and stress response pathways. While the NMD pathway exists in all eukaryotes, it also exhibits considerable differentiation among species. For example, SMG1, a factor phosphorylating UPF1, is not present in Arabidopsis thaliana and multiple fungi.

Our previous study in humans suggested that (1) many of the NMD factors can be regulated by the NMD pathway; (2) splicing factors are highly enriched in NMD targets; (3) nearly all SR proteins and many hnRNPs produce isoforms that can be degraded by the NMD pathway. Based on these observations, we will explore whether the NMD pathway arose and co-evolved with alternative splicing. Specifically, we will test whether (1) the NMD pathway first targeted NMD factors and splicing factors, then spread to a broader set of targets; (2) NMD factors also evolved as splicing patterns diverged. We will use RNA-seq data from NMD-inhibited samples (both public data and data generated by our lab or our collaborators) in various species.


Day-to-day supervisor for this project: Tina Bakolitsa, Post-Doc

Qualifications: The student must work on the project for at least 12 hours per week in the lab. The student will meet with the mentor every week and attend group meetings. The student will adhere to other lab policies (including weekly notebooks to track research and semester reports) and register for academic credits. The student is required to continue the project during the spring semester and the summer of 2020, if invited to do so. Qualifications: (1) desire to learn and conduct basic research; (2) some understanding of molecular biology concepts and next-generation sequencing (NGS) technologies; (3) Linux and programming (R, Perl or Python) experience. Candidates who have completed core statistics classes are preferred. Experience with NGS data analysis is preferred. Applicants with GPA under 3.6 will be considered only in exceptional circumstances.

Weekly Hours: 12 or more hours

Related website: http://compbio.berkeley.edu

Closed (3) Automatic identification of protein domains

Closed. This professor is continuing with Spring 2021 apprentices on this project; no new apprentices needed for Fall 2021.

Proteins often fold into compact structural units, called domains. Protein domains are basic units of protein function and evolution. Delineating domain boundaries is a prerequisite for further analyses of protein structures. However, this is still largely a manual process, and the accuracy of existing computer programs for domain identification is not yet satisfactory. This project will include two parts: critical assessment of current protein domain identification programs, and development of approaches to improve the accuracy by combining existing computer programs. Students interested in AJAX web development are also invited to help improve the web interface for displaying current data on protein domain architectures.

Day-to-day supervisor for this project: John-Marc Chandonia, Staff Researcher

Qualifications: The ideal candidate is willing to learn and knows how to write programs (in any language); knowledge of protein structures is a plus. Applicants with GPA under 3.6 will be considered only in exceptional circumstances. Other requirements: Candidates must: • Attend 3-hour lab meeting every week • Attend personal genomics subgroup meeting every week • Adhere to all lab policies (including weekly notebooks to track research, semester reports) • Register for credits, regardless of program-specific requirements.

Weekly Hours: 9-11 hrs

Related website: http://compbio.berkeley.edu

Closed (4) Critical Assessment of Genome Interpretation (CAGI)

Closed. This professor is continuing with Spring 2021 apprentices on this project; no new apprentices needed for Fall 2021.

The field of genome interpretation is essential for our understanding of human biology and the advancement of personalized medicine. However, the rapid accumulation of genomic data far exceeds our capacity for reliable interpretation. Consequently, the majority of variation discovered by next-generation sequencing technologies is of unknown significance. These variants represent one of the greatest current challenges in clinical genetics. Currently, over a hundred predictive methods for interpreting genomic variation exist. Yet the reliability of these methods has not been ascertained, and the field lacks a clear consensus on how to evaluate methods addressing variant interpretation.

The Critical Assessment of Genome Interpretation (CAGI) is a community experiment that evaluates the prediction of phenotypes from genetic variation. Over its five editions, CAGI has developed over 50 challenges, and objectively assessed the performance of submitted predictions against experimental and clinical characterizations. These challenges have identified bottlenecks in genome interpretation, highlighted innovations, and helped inform guidelines for clinical practice.

Based on our collection of experimental datasets and expertise in method assessment, we are now seeking to develop a calibrated reference for all available predictive methods. This reference will introduce the first standard of its kind in the field of genome interpretation, and serve as a go-to resource for both method developers and the broader biomedical and clinical community. We already have an agreement in place for incorporating this reference in widely used databases of functional predictions, such as dbNSFP.

To inform development of the CAGI calibrated reference, you will generate comparisons between experimentally determined and predicted effects of missense variants from CAGI and the dbNSFP database set.

You will work closely with Tina Bakolitsa (the CAGI Project Scientist) to collect data for all experimentally characterized missense variants in CAGI challenges, and match them to corresponding predictions from the dbNSFP database. Based on the CAGI data, predictions will be assessed using our in-house R package.
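The matching and comparison step might look roughly like the sketch below. It is an illustration only and does not reflect the in-house R package or the actual dbNSFP file format: the variant identifiers, the scores, and the `match_variants` and `pearson` helpers are all hypothetical, and Pearson correlation is used simply as one familiar agreement measure.

```python
import math


def match_variants(experimental, predicted):
    """Keep only variants that have both an experimental value and a prediction."""
    shared = sorted(set(experimental) & set(predicted))
    return [(variant, experimental[variant], predicted[variant]) for variant in shared]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data keyed by a made-up variant identifier.
experimental = {"V1": 0.1, "V2": 0.5, "V3": 0.9, "V4": 0.3}
predicted = {"V1": 0.2, "V2": 0.6, "V3": 1.0, "V5": 0.4}

matched = match_variants(experimental, predicted)
r = pearson([m[1] for m in matched], [m[2] for m in matched])

print(matched)  # V4 and V5 drop out because they lack a counterpart
print(r)
```

Variants without a counterpart in the other source are dropped before scoring, so the number of matched variants is worth reporting alongside any agreement statistic.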

The project will entail a detailed literature review of currently available methods for variant interpretation, and use of our in-house R package.
, Staff Researcher

Qualifications: The ideal candidate is willing to learn and knows how to write programs (in any language); knowledge of genomics is a plus. Applicants with GPA under 3.6 will be considered only in exceptional circumstances. Other requirements: Candidates must: • Attend 3-hour lab meeting every week • Attend personal genomics subgroup meeting every week • Adhere to other lab policies (including weekly notebooks to track research, semester reports) • Register for credits, regardless of their program-specific requirements • Research full time in the lab during the summer, if invited to do so

Weekly Hours: 12 or more hours

Related website: https://genomeinterpretation.org
Related website: http://compbio.berkeley.edu/