Building a dataset to analyze the representation of race in literature
David Bamman, Professor
Information, School of
Closed. This professor is continuing with Fall 2023 apprentices on this project; no new apprentices needed for Spring 2024.
Many computational studies of people in text focus on gender. We are interested in broadening the dimensions of identity labels analyzed in natural language processing and cultural analytics. In particular, we hope to carefully curate resources and datasets for investigating the representation of race in fiction, especially in literature used in American school curricula. The primary research will involve reading novels and analyzing the social identities of the people in them to create an annotated dataset of fictional characters. We will then explore the affordances of this data for measuring and characterizing the representation of people in books. The project will be accompanied by a reading group in which students gain a broader understanding of current considerations around race in natural language processing, dataset curation practices in AI, and cultural analytics. Up to three positions will be filled.
Role: Read research literature, annotate data, learn how to carry out academic research, craft research questions, and participate in weekly group meetings.
Qualifications: Background in ethnic studies, education, English literature, or sociology. No programming experience is required, but this URAP can act as a stepping stone into digital humanities and cultural analytics, and learning goals can be adjusted based on the student's prior experience.
Day-to-day supervisor for this project: Lucy Li, Graduate Student
Hours: 9-11 hrs
Off-Campus Research Site: Online
Engineering, Design & Technologies; Arts & Humanities; Digital Humanities and Data Science