Topic Modeling of Public Comments on California's K-12 Ethnic Studies Curriculum
Eos Trinidad, Professor
Education
Applications for Fall 2024 are closed for this project.
This project aims to analyze over 10,000 public comments submitted to the California Department of Education regarding the K-12 Ethnic Studies Curriculum. We will use advanced natural language processing and machine learning techniques, specifically topic modeling, to uncover patterns and themes in this large text dataset. The comments themselves are fascinating, covering topics like racism, power, narrative, the purpose of public education, and engagement in local politics. This project is a great opportunity to use machine learning and topic modeling in the service of social science and policy research. The apprentice will gain valuable experience in processing and analyzing large text datasets while contributing to our understanding of education policy.
Role: The successful candidate will start by familiarizing themselves with the project context and relevant literature on topic modeling and its applications in social science research. Initial work will revolve around managing and preprocessing a large PDF-based dataset. Candidate(s) will then assist in developing and implementing algorithms to process and analyze the public comments.
Specific tasks include:
Manage a large dataset of PDFs
Generate a random subset of the dataset
Convert image-based PDFs to text-based PDFs
Perform topic and sentiment analysis
Identify and extract the author/origin of each public comment
Generate per-PDF metadata capturing the critical extracted information
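As a rough illustration of two of the tasks above (drawing a random subset and recording per-PDF metadata), a minimal Python sketch might look like the following. It uses only the standard library. The function names, the metadata field names, and the example file names are hypothetical, chosen for illustration, and are not part of the project's actual codebase.

```python
import json
import random
from pathlib import Path


def sample_pdfs(paths, k, seed=0):
    """Return a reproducible random subset of up to k paths."""
    # Sort first so the sample does not depend on the input order.
    ordered = sorted(str(p) for p in paths)
    rng = random.Random(seed)  # fixed seed makes the subset repeatable
    return rng.sample(ordered, min(k, len(ordered)))


def build_metadata(path, author=None, is_image_based=None):
    """Build an illustrative per-PDF metadata record (field names are assumptions)."""
    return {
        "file": Path(path).name,
        "author": author,                  # filled in by a later extraction step
        "is_image_based": is_image_based,  # True if the PDF would need OCR
    }


if __name__ == "__main__":
    # Hypothetical file names standing in for the real comment PDFs.
    pdfs = [f"comment_{i:05d}.pdf" for i in range(1, 11)]
    subset = sample_pdfs(pdfs, k=3, seed=42)
    records = [build_metadata(p) for p in subset]
    print(json.dumps(records, indent=2))
```

Sorting before sampling and fixing the seed means the same subset can be regenerated later, which helps when several people work from the same sample.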
Qualifications: Candidates should have strong programming skills in Python and experience with natural language processing and machine learning. Ideal candidates will also have experience converting image-based PDFs into text-based PDFs; experience with NLTK or similar libraries; familiarity with version control systems (e.g., Git) and open source projects; and a background or interest in education policy, ethnic studies, or related social sciences.
Day-to-day supervisor for this project: Emily Reich, Graduate Student
Hours: to be negotiated
Off-Campus Research Site: This position can be fully remote, but space can be made available at Berkeley Way West for weekly meetings and work time.
Social Sciences Education, Cognition & Psychology