Maryam Vareth, Researcher

Closed (1) Deep Learning in Medical Imaging

Applications for Spring 2020 are now closed for this project.

The UCSF Department of Radiology and Biomedical Imaging and the Berkeley Institute for Data Science (BIDS) are excited to offer a combined educational and research opportunity for motivated undergraduate students on the medical imaging research team.

“Towards Data Driven Medicine”: Advances in artificial intelligence have the potential to transform the field of medicine. Medical diagnosis and treatment are fundamentally data problems. Turning medical images, lab tests, genomics, and patient histories into accessible, clinically relevant insights requires new collaborations between the traditional domains of biomedical research and data science specialties like machine learning. Our laboratory at UCSF specializes in the acquisition, reconstruction, post-processing, and quantitative analysis of Magnetic Resonance (MR) brain images. The wealth of imaging data collected in our laboratory over the years has not been utilized to its full potential. Working in close collaboration with the Berkeley Institute for Data Science, we would like to develop deep learning methods, tools, and pipelines that make full use of our imaging data and help clinicians make better decisions about treatment strategies for patients with brain tumors.

Deep learning is a new and powerful machine learning method that uses a range of neural network architectures to perform imaging tasks, which up to now have included segmentation, object detection (e.g. of a lesion or region of interest), and classification. Deep learning methods differ from conventional machine learning methods (e.g. support vector machines (SVM) and random forests (RF)) in one major respect: the latter rely on feature extraction methods to train the algorithm, whereas deep learning methods learn directly from the image data without a separate feature extraction step.
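The contrast between the two routes can be made concrete with a small, purely illustrative sketch in standard-library Python. The function names, the toy image, and the kernel values are all assumptions made for this example; they are not taken from the project or from NiftyNet. The first function computes a few fixed summary features of the kind that might be passed to an SVM or random forest. The second applies a convolution kernel directly to the raw pixels, which is the basic operation whose weights a convolutional network would learn from data.

```python
from statistics import mean, pvariance


def handcrafted_features(image):
    """Conventional route: summarise an image with a few fixed, human-chosen numbers."""
    pixels = [p for row in image for p in row]
    # Horizontal edge strength: mean absolute difference between neighbouring columns.
    diffs = [abs(row[j + 1] - row[j]) for row in image for j in range(len(row) - 1)]
    return [mean(pixels), pvariance(pixels), mean(diffs)]


def conv2d_valid(image, kernel):
    """Deep learning route: slide a kernel over raw pixels (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x4 "image" with a vertical edge between columns 1 and 2 (illustrative data only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical kernel: in a CNN these weights are learned during training, not set by hand.
kernel = [[-1, 1], [-1, 1]]

features = handcrafted_features(image)     # three numbers describing the whole image
feature_map = conv2d_valid(image, kernel)  # a 3x3 map that responds where the edge is
```

In the hand-crafted route the image is reduced to three numbers before any learning happens, whereas the convolution keeps the spatial layout and responds only at the edge location.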

Multiple projects that capture and extract information from our imaging data are available. Our projects center on convolutional neural networks (CNNs), a very popular deep learning method in the medical imaging field. We will be using the NiftyNet platform to build new solutions to our various imaging problems. Possible projects include brain tumor segmentation, image reconstruction, and image synthesis. The details of each project will be discussed during the interview process. URAP students will be tasked with developing, implementing, refining, and testing algorithms and workflows to achieve the specific goal of their chosen project. They will work in teams and closely with graduate students, post-docs, and data scientist mentors.

The specific tasks available will depend on the chosen project, its progress along the project timeline, and the student's experience and preferences, but the common tasks are:

- Attend the group meeting at BIDS, most likely on Wednesdays (exact time will be announced later).
- Attend an individual meeting with the supervisor at the beginning, in the middle, and at the end of both the first and second semesters.
- Lead 1-3 weekly seminars/hackathons per semester, covering deep learning research papers and coding discussions.
- Give a formal presentation to the BIDS and UCSF community at the end of each semester.
- Upon successful progress, students are expected to submit to or present at a national research meeting. Students are encouraged to seek out and apply for undergraduate research grants.

Day-to-day supervisor for this project: Maryam Vareth, Ph.D., Staff Researcher

Qualifications: Students from various majors are encouraged to apply, including but not limited to EECS, BioE, CS, data science, math, and statistics. We are looking for 4-6 highly inquisitive students who meet the following qualifications.

Required:

- Proficiency in programming languages (Python and/or MATLAB and/or R preferred)
- Familiarity with the Linux/Unix environment
- Working knowledge of version control (such as GitHub)
- Great teamwork (organization, communication skills, punctuality, reliability, etc.)
- Interest in data science, medical imaging, machine learning, engineering, and healthcare research

Recommended:

- Working knowledge of basic machine learning and deep learning (cost function, cross-validation, overfitting, error analysis, etc.)
- Working knowledge of TensorFlow/Keras or PyTorch
- Working knowledge of signal processing and image processing

Weekly Hours: to be negotiated

Related website:

Closed (2) Cryptography of the unknown regions of genomes

Closed. This professor is continuing with Fall 2019 apprentices on this project; no new apprentices needed for Spring 2020.

As it becomes cheaper and quicker to sequence the genomes of organisms, and as more sophisticated informatics tools make analysis more meaningful, some of the more intractable questions about the intricacies of genetic regulation can be posed and answered. Currently, scientists are able to determine which regions of a genome are coding regions, or genes, by their easily recognizable syntax of coding for amino acids, but the patterning of non-coding regulatory regions, such as enhancer sequences, is still very much a mystery. Enhancers can either repress or activate gene expression, and mutations in enhancers are associated with developmental problems in many species and can cause human disease. Despite the dramatic influence enhancers have on development, we still do not know how to identify them in a genome for further analysis. Are there common rules that govern enhancer sequence architecture? This project aims to answer these questions.

We are approaching this problem from a computational perspective by mapping known characteristics of DNA and doing comparative analysis across ~25 different species of Drosophila (fruit fly). The first part of the project will be data wrangling, which consists of annotating and organizing the DNA data (long strings of letters). We will then create workflows to map features onto these sequences and combine existing datasets, with the end goal of feeding the data into machine learning algorithms to predict function in non-coding regions of DNA.
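As a rough illustration of what "mapping features onto sequences" can look like, the sketch below computes two simple sequence features in standard-library Python: GC content in fixed windows and k-mer counts. The choice of features, the function names, and the toy sequence are assumptions made for this example; the features actually used in the project are not specified here.

```python
from collections import Counter

BASES = set("ACGT")


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) in seq that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in BASES)
    if called == 0:
        return 0.0  # window contains only ambiguous bases such as N
    return (seq.count("G") + seq.count("C")) / called


def kmer_counts(seq, k):
    """Count every length-k substring, skipping any that contain ambiguous bases."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(kmer for kmer in kmers if set(kmer) <= BASES)


def window_features(seq, window, step):
    """Map a GC-content feature onto consecutive windows of the sequence."""
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        rows.append({"start": start, "end": start + window, "gc": gc_content(chunk)})
    return rows


# Toy sequence (illustrative only); N marks an ambiguous base call.
toy = "ATGCGCNNATAT"
rows = window_features(toy, window=4, step=4)
dimers = kmer_counts(toy, k=2)
```

Each row records where a window starts and ends, so features from different sources can later be joined on coordinates before being passed to a machine learning model.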

Day-to-day supervisor for this project: Ciera Martinez, Ph.D., Post-Doc

Qualifications: We are looking for 1-4 highly inquisitive students who enjoy working through problems. Strong organizational skills and attention to detail are required. Knowledge of Python and/or R is required. Interest in machine learning, SQL, statistics, git, biology, and visualization is preferred.

Weekly Hours: 9-11 hrs

Closed (3) Natural History and Data Science

Closed. This professor is continuing with Fall 2019 apprentices on this project; no new apprentices needed for Spring 2020.

Natural history data is broad in scope and can range from species occurrence data to environmental measurements to essentially any data that describes how organisms interact with each other and their environment. In the past 15 years there has been an increasing drive to digitize this data and make it available through public databases. This project explores these vast datasets from a data science perspective. We will be performing both team and self-directed exploratory analyses, with immediate products in the form of medium-length reports that will be published online.

The purpose of this work is to:

1. provide tutorial guides on how to access and analyze this data
2. survey the types of questions that are possible in these databases
3. provide critical analysis and make recommendations / build tools to increase usability of these databases
4. perform biological / ecological research that can be expanded in greater depth for peer reviewed publication.
, Post-Doc

Qualifications: We are looking for 4-6 highly inquisitive students who enjoy working through problems. Strong organizational skills and attention to detail are required. Knowledge of Python and/or R is required. Interest in SQL, statistics, git, biology, ecology, and visualization is preferred.

Weekly Hours: 9-11 hrs

Closed (4) Hydrologic forecasting for the East River, CO

Closed. This professor is continuing with Fall 2019 apprentices on this project; no new apprentices needed for Spring 2020.

River flow forecasting is essential for planning reservoir operations, defense strategies against flooding, and fluvial ecosystems management plans. However, flow forecasting is a highly uncertain science. One of the biggest uncertainties lies in resolving the timescales over which water is stored in the subsurface and time lags between perturbations in hydrometeorological variables and perturbations in streamflow.

To reduce this uncertainty, we are synthesizing data from highly instrumented watersheds throughout the world into a common database. With the organized and cleaned data, we will be applying statistical techniques from information theory to identify the critical timescales over which predictors of streamflow are relevant to streamflow forecasting. One of the watersheds that we will be treating as a case-study is the East River, CO, a site that has been intensively monitored by scientists at Lawrence Berkeley Lab. We are recruiting students to help in the process of database compilation and analysis of results.
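One way to picture the kind of analysis described above is a lagged mutual information scan: for each candidate time lag, estimate how much information a driver series (such as precipitation) carries about streamflow that many steps later. The sketch below is a minimal standard-library illustration that assumes a simple histogram (binning) estimator. The estimator, the function names, and the toy series are all assumptions for this example and are not necessarily what the project uses.

```python
from collections import Counter
from math import log2


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Histogram estimate of I(X; Y) in bits for two equal-length series."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (count / n) * log2(count * n / (px[a] * py[b]))
        for (a, b), count in joint.items()
    )


def lagged_mutual_information(driver, response, max_lag, n_bins=4):
    """Mutual information between driver[t] and response[t + lag] for each lag."""
    scores = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]
        y = response[lag:]
        scores[lag] = mutual_information(x, y, n_bins)
    return scores


# Toy data (illustrative only): "discharge" is "precipitation" delayed by two steps.
precip = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
discharge = [0, 0] + precip[:-2]

scores = lagged_mutual_information(precip, discharge, max_lag=4, n_bins=2)
best_lag = max(scores, key=scores.get)
```

On this toy series the score peaks at the lag that was built into the data. Real hydrometeorological records are noisy and much longer, and histogram estimators are biased for short series, so any lag identified this way would need to be checked for statistical significance.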

The recruited student will be working with a large and multidisciplinary team of researchers at Berkeley and Berkeley Lab. Specific tasks will include implementing a workflow to ensure that datasets downloaded from instrumented watersheds are comparable and operational. This may involve running code to fill gaps in the datasets or aggregate different datapoints in time. Once the complete dataset from a particular site has been compiled, the student will interpret the output of the code that derives time lags between precipitation, other hydroclimatic variables, and discharge, drawing conclusions and syntheses.
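The gap-filling and time-aggregation steps mentioned above might look something like the following sketch, written in standard-library Python. It assumes regularly spaced samples with missing values stored as None, and it uses linear interpolation as the gap-filling rule. These are assumptions made for illustration; the project's actual code and gap-filling method are not described here.

```python
from statistics import mean


def fill_gaps_linear(values):
    """Linearly interpolate interior gaps (None); leading/trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled


def aggregate(values, block, how=mean):
    """Aggregate consecutive blocks of `block` samples, ignoring None values.

    An incomplete trailing block is dropped; a block with no valid data gives None.
    """
    out = []
    for start in range(0, len(values) - block + 1, block):
        chunk = [v for v in values[start:start + block] if v is not None]
        out.append(how(chunk) if chunk else None)
    return out


# Toy hourly discharge record with missing readings (illustrative values only).
hourly = [1.0, None, 3.0, 4.0, None, None, 7.0, 8.0]
filled = fill_gaps_linear(hourly)
four_hourly = aggregate(filled, block=4)
```

The aggregation function is passed in as `how` because the right choice depends on the variable: a mean suits discharge or temperature, whereas precipitation totals would normally be summed (`how=sum`).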

Learning outcomes include greater experience with scientific computing and programming and exposure to a central challenge in hydrology. Students will also gain familiarity with cutting-edge tools emerging from the discipline of information theory and insight into different modeling approaches for forecasting river flow.

Day-to-day supervisor for this project: Zexuan Xu, Post-Doc

Qualifications: The student should have programming skills (Python is preferred, though strong programming skills in another language, such as MATLAB, will also be viewed favorably) and computing skills (GitHub experience is preferred). Ideally, the student has also had some exposure to statistics and is comfortable interpreting scientific graphs. This would be an ideal opportunity for a statistics, CS, or civil engineering student who would like to gain experience with environmental applications. It could also be a good fit for an earth science or geography student with strong programming skills.

Weekly Hours: to be negotiated