URAP Project Descriptions

Rapid Reviews\Infectious Diseases (RR\ID) Data Science Project

Stefano M. Bertozzi, Professor
Public Health

Applications for Fall 2025 are closed for this project.

Rapid Reviews\Infectious Diseases (RR\ID) [rrid.mitpress.mit.edu] is an initiative of the MIT Press and the University of California, Berkeley. It is an open access, rapid-review overlay journal for the accelerated curation and peer review of research on COVID-19 and emerging infectious diseases. RR\ID takes a transdisciplinary approach to discussing, curating, and communicating seminal research of public interest.

The RR Editorial Office is composed of senior editorial leadership and five core domains: (1) Biological & Chemical Sciences; (2) Physical Sciences & Engineering; (3) Medical Sciences; (4) Public Health; and (5) Social Sciences. We have editorial teams based at UC Berkeley as well as in India, Mexico, Vietnam, and Rwanda. We plan to expand to at least 8 new countries over the next two years. Each team is led by an associate editor and an assistant editor with domain-specific expertise and is supported by scholars from across the globe. Each week, the core teams meet individually to discuss the most impactful preprints in their domain, and every Friday, all teams meet together to pitch their top choices to the senior editorial leadership.

RR has collaborated with data scientists at Berkeley and at LBNL since our inception to develop AI/ML/NLP tools to support our work. We seek URAP students to work with our faculty, staff, and other data science students to increasingly automate our work and improve its quality and efficiency, and by extension to serve as a model for all innovative open science efforts.

Role: Creating interactive, user-friendly databases to support our internal functioning. We previously used Google Sheets and Slides to support our operations, but our scale has exceeded their capacity to support multiple simultaneous users, so we have begun developing higher-capacity applications using Airtable and Softr.
We are developing real-time dashboards for performance management that support all phases of our operation and synthesize data from our publishing platform (Janeway) and our internal operations data. This would include creative content visualization.

Conducting retrospective analyses of publishing metrics to better understand predictors of success (peer review acceptance and performance), considering the characteristics of both the manuscript being reviewed and the peer reviewers identified. Proposing operational changes and/or decision-support tools to improve workflow efficiency and quality.

Working with internal and external (Allen AI, Prophy) collaborators to improve AI tools that support scientific review. These include tools to:
(1) Disambiguate authors/reviewers (the need to distinguish the J Doe who is a coauthor of the paper we are interested in from other J Does).
(2) Improve the identification and prioritization of “nearest neighbor” papers to the manuscript under review, weighting methodological proximity more than topical proximity.
(3) Rank authors of nearest neighbor papers by their proximity to the manuscript under review, considering the totality of their published works (requires effective disambiguation) and their proximity to the authors (potential conflict of interest).
(4) Explore other methods of identifying peer reviewers (e.g. network analysis exploring who publishes with whom).
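
As a rough illustration of the reviewer-ranking idea described above (not RR\ID's actual tooling), a minimal Python sketch might look like the following. It assumes plain bag-of-words cosine similarity as a stand-in for proximity to the manuscript, and it treats any shared coauthor as a potential conflict of interest. The function names and the data layout are hypothetical, and the sketch assumes author names have already been disambiguated.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_reviewers(manuscript_text, manuscript_authors, candidates):
    """Rank candidate reviewers by text similarity and flag possible conflicts.

    candidates maps a (hypothetical, already disambiguated) name to a dict with:
      "works":     list of strings, e.g. titles or abstracts of published works
      "coauthors": set of names the candidate has published with
    Returns a list of (name, similarity, conflict_flag), most similar first.
    """
    target = Counter(manuscript_text.lower().split())
    authors = set(manuscript_authors)
    ranked = []
    for name, info in candidates.items():
        # Profile built from the totality of the candidate's listed works.
        profile = Counter(" ".join(info["works"]).lower().split())
        # Conflict if the candidate is an author or shares a coauthor link.
        conflict = name in authors or bool(info["coauthors"] & authors)
        ranked.append((name, cosine(target, profile), conflict))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked
```

In practice, learned text embeddings and a full co-authorship graph would likely replace the word counts and the one-step coauthor check used here.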

If you are interested, please reach out as soon as possible to Stefano Bertozzi, Editor-in-Chief (sbertozzi@berkeley.edu), Hildy Fong Baker, Managing Director (hildy.fong@berkeley.edu), and Boma Levy-Braide, Operations Manager (b.levy-braide@berkeley.edu).

Qualifications: We are interested in students with experience in a number of areas related to data science. We can use students with AI training, HTML skills, or database creation/management skills. Below we detail some specific needs for some of our projects, but if you have other related data science/programming skills and are interested in applying them to infectious diseases, please contact us.

Examples of specific skills:
--Solid foundation in key machine learning concepts including supervised and unsupervised learning, neural networks, regularization, and optimization techniques.
--Experienced in data preprocessing, feature extraction, and augmentation.
--Proficient in managing large datasets, with a good understanding of data storage, retrieval, and pipeline creation.
--Hands-on experience with machine learning frameworks like TensorFlow, PyTorch, or Keras, and ability to fine-tune pretrained models.
--Able to address model performance issues such as overfitting, underfitting, and bias, with strong analytical skills to interpret results and make data-driven improvements.
--Data cleaning, transformation, and visualization to work with data in Google Sheets and Airtable.
--Web development, particularly UX/UI design and HTML.
--Developing web pages with complex functionalities beyond what Softr can support.

Day-to-day supervisor for this project: Hildy Fong Baker, PhD, Staff Researcher

Hours: 6-8 hrs

Off-Campus Research Site: The work will be largely remote, but there will be some in-person workshops.

Related website: https://rrid.softr.app/

Biological & Health Sciences; Digital Humanities and Data Science
