David Bamman, Professor

Closed (1) Reading and annotating novels (in English, Spanish, Russian, German or Japanese) for NLP

Check back for status

There are several open positions available for undergraduates to work on the LitBank project (https://github.com/dbamman/litbank). LitBank is an annotated dataset of fiction to support tasks in natural language processing and the computational humanities. We'll be expanding LitBank to include contemporary works by Black authors (from the Black Books Interactive Project) and global Anglophone fiction, along with works of fiction in Spanish, Russian, German and Japanese. The primary research will involve carrying out linguistic annotations (e.g., reading novels and marking the people and places contained within them), and exploring the affordances of this data for work in cultural analytics.


Tasks:

Research will involve reading works of fiction, annotating linguistic phenomena, and reading papers in cultural analytics. Participation in biweekly one-hour group meetings to discuss progress and questions is required.


Qualifications: Fluency in English, Spanish, Russian, German or Japanese.

Weekly Hours: 9-11 hrs

Off-Campus Research Site: Online

Closed (2) NLP and online communities

Check back for status

The Internet is a rich and diverse source of cultural and social phenomena, and much of this information exists in the form of text. We are seeking research assistants to work on projects using social media conversations and posts to improve our understanding of online communities. Possible project areas include measuring linguistic variation and change, quantifying bias and harm towards marginalized groups, tracking ideas in fringe communities, and operationalizing social dynamics using language. Up to one position will be filled.

Tasks: Reading research literature, collecting/scraping/annotating data, designing experiments, building models using machine learning and natural language processing methods, and communicating progress with collaborators.

Project outcomes: Learn how to do academic research and write a paper describing hypotheses, methods, and results.


Day-to-day supervisor for this project: Lucy Li, Graduate Student

Qualifications: Strong programming skills and good performance in CS 189 (machine learning) and/or INFO 159 (natural language processing). Upper division standing and interest in sociology, psychology, and/or linguistics.

Weekly Hours: 9-11 hrs

Off-Campus Research Site: Online

Closed (3) Developing BookNLP for English, Spanish, Russian, Japanese, and German

Check back for status

There are several research opportunities available for undergraduates to expand BookNLP (https://github.com/dbamman/book-nlp), a natural language processing pipeline for books and other long documents. We'll be focusing on developing BookNLP for Python and extending it to Spanish, Russian, Japanese, and German. This work will involve developing methods for named entity recognition, coreference resolution, and quotation attribution for those languages.


Tasks:

All projects will involve reading research literature, creating annotated data for training and evaluation, and/or building models using techniques from machine learning and natural language processing. Participation in biweekly one-hour group meetings to discuss progress and questions is required.

Qualifications: Strong programming skills and solid command of NLP (e.g., evidenced through strong performance in INFO 159). Fluency in English, Spanish, Russian, Japanese or German.

Weekly Hours: 9-11 hrs

Off-Campus Research Site: Online