Developing BookNLP for English, Spanish, Russian, Japanese, and German
David Bamman, Professor
Information, School of
Closed. This professor is continuing with Fall 2023 apprentices on this project; no new apprentices needed for Spring 2024.
Several research opportunities are available for undergraduates to help expand BookNLP (https://github.com/booknlp/booknlp), a natural language processing pipeline for books and other long documents. We'll be focusing on developing BookNLP for Python and extending it to Spanish, Russian, Japanese, and German. This work will involve developing methods for named entity recognition, coreference resolution, and quotation attribution for those languages.
Role: All projects will involve reading research literature, creating annotated data for training and evaluation, and/or building models using techniques from machine learning and natural language processing. Participation in biweekly one-hour group meetings to discuss progress and questions is required.
Qualifications: Strong programming skills and a solid command of NLP (e.g., as evidenced by strong performance in INFO 159). Fluency in English, Spanish, Russian, Japanese, or German.
Hours: 9-11 hrs
Off-Campus Research Site: Online
Engineering, Design & Technologies; Arts & Humanities; Digital Humanities and Data Science