Analyze Physics literature with modern language models
Closed. This professor is continuing with Spring 2024 apprentices on this project; no new apprentices needed for Fall 2024.
Having a virtual assistant that has read and internalized decades of scientific literature is a dream that may come true in the next decade. Such an assistant would significantly speed up scientific discovery and understanding by human scientists, and could eventually become an agent of new knowledge itself. Making this a reality requires close collaboration between researchers working on natural language processing, machine learning, and the various STEM fields. We will attempt a baby step here by applying modern language models to Physics literature.
Role: The student will collect publicly available Physics literature, such as abstracts from the Physical Review series or APS March Meetings. They will then process the texts and use them to train or fine-tune modern language models such as GPT, in order to analyze the vibes, trends, correlations, and other properties of these texts. A related but more ambitious direction is to integrate a language model with a computer-vision model so that the combined system can parse figures and plots in a journal paper.
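As a rough illustration of the simplest kind of trend analysis mentioned in the role, the sketch below computes the fraction of abstracts per year that mention a given term. It is not the project's method: the function names (tokenize, term_share_by_year) and the toy records are made up for illustration, and a real pipeline would work from collected abstracts and a language model rather than exact word matching.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_share_by_year(records, term):
    """Return {year: fraction of that year's abstracts containing `term`}.

    `records` is an iterable of (year, abstract_text) pairs.
    """
    totals = Counter()  # number of abstracts per year
    hits = Counter()    # number of abstracts per year that mention the term
    for year, abstract in records:
        totals[year] += 1
        if term in tokenize(abstract):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy data, invented for this example only.
records = [
    (2019, "We study superconductivity in twisted bilayer graphene."),
    (2019, "A tensor network approach to spin chains."),
    (2020, "Graphene superlattices host correlated insulators."),
    (2020, "Machine learning of graphene band structures."),
]

shares = term_share_by_year(records, "graphene")
for year, share in shares.items():
    print(year, share)
```

Exact word matching misses plurals, synonyms, and multi-word phrases, which is one reason to move on to embedding- or model-based analysis once the corpus is in place.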
Qualifications: You are a pro in natural language processing, including state-of-the-art transformer-based models, and are excited about using them to assist scientific understanding and discovery. STEM/Physics and computer vision knowledge is nice to have, but not required.
Hours: 9-11 hrs
Related website: https://arxiv.org/abs/2111.13786
Related website: https://arxiv.org/abs/2204.01467