URAP Project Descriptions

Reading and annotating novels (beyond English) for NLP

David Bamman, Professor
Information, School of

Applications for Fall 2025 are closed for this project.

LitBank is an annotated dataset of fiction to support tasks in natural language processing and the computational humanities. While it currently exists for English, we'll be branching out to create similar resources for other languages as well (including Spanish, Japanese, and German). The primary research will involve carrying out linguistic annotations (e.g., reading novels and marking the people and places contained within them), and exploring the affordances of this data for work in cultural analytics. This URAP is for students who are interested in the field of cultural analytics (using empirical methods to study culture) and come from a humanities/social science background.

Role: Research will involve annotating novels for the linguistic phenomena described above. We will contextualize this research by reading and discussing papers in cultural analytics. Participation in biweekly one-hour group meetings to discuss progress and questions is required.

Qualifications: A background in humanities or social sciences and an interest in the digital humanities/cultural analytics are preferred. Fluency in a language beyond English is required (in your application, please detail which languages you are fluent in). This position does not require programming experience.

Hours: 9-11 hrs

Engineering, Design & Technologies; Arts & Humanities; Digital Humanities and Data Science
