  • UC Berkeley
  • College of Letters & Science
Berkeley University of California

URAP

Project Descriptions
Spring 2025


Reading and annotating novels (beyond English) for NLP

David Bamman, Professor  
School of Information

Applications for Spring 2025 are closed for this project.

LitBank is an annotated dataset of fiction that supports tasks in natural language processing and the computational humanities. While it currently exists only for English, we will be branching out to create similar resources for other languages, including Spanish, Japanese, and German. The primary research will involve carrying out linguistic annotations (e.g., reading novels and marking the people and places mentioned in them) and exploring the affordances of this data for work in cultural analytics. This URAP is for students who are interested in the field of cultural analytics (using empirical methods to study culture) and who come from a humanities or social science background.

Role: Research will involve annotating novels for the linguistic phenomena described above. We will contextualize this research by reading and discussing papers in cultural analytics. Participation in one-hour biweekly group meetings to discuss progress and questions is required.

Qualifications: A background in the humanities or social sciences and an interest in the digital humanities/cultural analytics are preferred. Fluency in a language other than English is required (in your application, please list the languages in which you are fluent). This position does not require programming experience.

Hours: 9-11 hrs

Off-Campus Research Site: Online

Engineering, Design & Technologies · Arts & Humanities · Digital Humanities and Data Science


Office of Undergraduate Interdisciplinary Studies, Undergraduate Division
College of Letters & Science, University of California, Berkeley