David Bamman, Professor

Closed (1) Machine learning for the computational humanities

Applications for Spring 2019 are now closed for this project.

Several research opportunities are available for undergraduates in my group for fall/spring 2019 on projects in the "computational humanities." These roles are for students with either a technical/programming background (e.g., CS) or a background in the humanities (e.g., English, comp lit). Technical roles require strong programming skills and good performance in CS 189 (machine learning) or INFO 159 (natural language processing).

1. Automatically generating indexes for printed books. An index in a printed book acts as a map to the important people, places, and concepts in the book; the human act of creating an index is a form of organization placed on the book. You will implement several NLP models, evaluate their performance, and conduct an empirical analysis of how indexing styles have changed over time (using data from 100,000 books).

2. Natural language processing for literature. Many state-of-the-art methods in natural language processing are optimized for contemporary newswire; in this track, you will develop methods for improving NLP for literary texts and also pioneer new NLP tasks suited to this domain (such as distinguishing "narration" from "description" passages or identifying which scenes in a novel are coreferent with each other).

3. [Literature students]. Annotating and analyzing the structure of characters and scenes in books; this research opportunity is primarily for students in literature departments. The primary focus will be on creating new data for the computational analysis of literature, by annotating the characters, settings and scene boundaries in novels.

Past URAP research has appeared at EMNLP 2016 and EMNLP 2017; when applying, mention the specific research project(s) you're interested in. The strongest applications will have done some legwork on the research problem (such as a basic literature review).


All projects will involve reading research literature, creating annotated data for training and evaluation, and/or building models using techniques from machine learning and natural language processing. Participation in biweekly one-hour group meetings to discuss progress and questions is required.

Projected outcomes:

Learn about different areas of NLP, machine learning and the digital humanities; gain hands-on experience with creating a dataset (a fundamental step in data science).

Qualifications: Technical roles: strong programming skills and good performance in CS 189 (machine learning) and/or INFO 159 (natural language processing). Humanities roles: major in the specific research area (e.g., English, comp lit); upper division standing and strong interest in the application of empirical methods/data science.

Weekly Hours: 9-11 hrs