URAP Project Descriptions

Large language models for guiding research in bioinformatics

Reza Abbasi-Asl, Professor
Neuroscience

Applications for Spring 2025 are closed for this project.

There is an immense amount of unstructured and uncollated data in neuroscience and bioinformatics that could be used to guide knowledge discovery. Large language models (LLMs) have shown potential in extracting and analyzing unstructured natural language corpora, and they are promising for semi-automated conversion of text into scientific knowledge, such as the annotation of novel cell types or molecular patterns. We are interested in applying LLMs and conversational agents to assist researchers with exploratory data analysis and with indexing large biological datasets and biomedical databases.

Role: The successful candidate will start by reading relevant papers, understanding the domain context, and using prompt engineering approaches to interact with language models and conversational interfaces. They will then use statistical principles to analyze and interpret the outcomes. The candidate will document the results and present them in the form of a research publication. Note that the successful candidate will be working remotely on this project.

Qualifications: Candidates should have experience with LLMs or related conversational interfaces, such as one or more of ChatGPT, Bard, Vicuna, or Galactica. Knowledge of statistics, Python programming, and existing issues with LLMs (such as fairness and trustworthiness) is required. Familiarity with machine learning, third-party APIs, natural language processing, and prompt engineering, along with general knowledge of problems in computational biology and bioinformatics, is desirable but not essential.

Hours: 12 or more hours

Off-Campus Research Site: remote

Related website: http://abbasilab.org

Digital Humanities and Data Science; Biological & Health Sciences
