The Evolution of Gender Roles in News Media
Heather Haveman, Professor
Sociology, Sociology
Applications for Spring 2025 are closed for this project.
I am analyzing a longitudinal database of articles published in the Washington Post newspaper from 1977 to 2024.
The primary goal is to understand how gender roles are portrayed in news media. To do this, we are using several natural-language-processing techniques. We are building a dictionary of gender-denoting terms that occur in this corpus. We will code how often gender-denoting terms are used, and in which sections of the paper. We will also develop decade-specific gender axes in semantic space, using both static word embeddings (word2vec) and transformer-based contextual embeddings (BERT). And we may use topic modeling to determine which activities and ideas journalists and their sources associate with gender.
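To make two of these steps concrete, here is a minimal sketch in Python of (a) counting dictionary terms by newspaper section and (b) building a gender axis from word vectors and projecting a word onto it. Everything specific in it is an assumption for illustration: the tiny term dictionary, the section names, the function names, and the two-dimensional toy vectors are invented, and the axis is built with one common approach (averaging normalized difference vectors over word pairs) that may differ from the one the project uses. In practice the vectors would come from a trained word2vec or BERT model rather than being typed in by hand.

```python
import re
from collections import Counter, defaultdict

import numpy as np

# Hypothetical mini-dictionary; the real one is built from the corpus.
GENDER_TERMS = {
    "feminine": ["she", "her", "woman", "women", "mother"],
    "masculine": ["he", "his", "man", "men", "father"],
}


def compile_patterns(term_dict):
    """Build one case-insensitive, whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
        )
        for category, terms in term_dict.items()
    }


def count_terms_by_section(articles, patterns):
    """articles: iterable of (section, text) pairs. Returns {section: Counter}."""
    counts = defaultdict(Counter)
    for section, text in articles:
        for category, pattern in patterns.items():
            counts[section][category] += len(pattern.findall(text))
    return counts


def gender_axis(vectors, pairs):
    """Average the unit-length (feminine - masculine) differences over word pairs."""
    diffs = []
    for fem, masc in pairs:
        diff = vectors[fem] - vectors[masc]
        diffs.append(diff / np.linalg.norm(diff))
    axis = np.mean(diffs, axis=0)
    return axis / np.linalg.norm(axis)


def project(vectors, word, axis):
    """Cosine similarity between a word vector and the (unit-length) axis."""
    vec = vectors[word]
    return float(np.dot(vec, axis) / np.linalg.norm(vec))


# Toy inputs, invented for illustration only (not findings from the corpus).
toy_articles = [
    ("Sports", "He said the men won. She cheered."),
    ("Style", "Her mother and the women spoke."),
]
toy_vectors = {
    "she": np.array([1.0, 0.0]),
    "he": np.array([-1.0, 0.0]),
    "woman": np.array([1.0, 0.2]),
    "man": np.array([-1.0, 0.2]),
    "word_a": np.array([0.5, 1.0]),
    "word_b": np.array([-0.5, 1.0]),
}

section_counts = count_terms_by_section(toy_articles, compile_patterns(GENDER_TERMS))
toy_axis = gender_axis(toy_vectors, [("she", "he"), ("woman", "man")])
for word in ("word_a", "word_b"):
    print(word, round(project(toy_vectors, word, toy_axis), 3))
```

Whole-word matching matters here: without the `\b` boundaries, "he" would also match inside "the" and "cheered", and "men" inside "women". For decade-specific axes, the same `gender_axis` call would be repeated on vectors trained separately on each decade's articles.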
Role: Students will write code in Python/Jupyter notebooks (and occasionally R) to analyze text data -- articles from the Washington Post. They may also be asked to join/merge our main database with external data. Finally, they will occasionally be asked to read a technical paper to learn about a new computational or statistical technique.
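A join of this kind might look like the following sketch, which uses pandas. The column names (`article_id`, `year`, `some_indicator`) and the helper `merge_with_check` are hypothetical, not taken from the project's database. The point is the habit of checking a merge rather than trusting it: `validate` raises an error if the external table has duplicate keys, and `indicator` makes it easy to list rows that found no match.

```python
import pandas as pd


def merge_with_check(articles, external, key):
    """Left-join external data onto articles and report unmatched key values."""
    merged = articles.merge(
        external,
        on=key,
        how="left",               # keep every article, matched or not
        validate="many_to_one",   # fail loudly if `external` repeats a key
        indicator=True,           # adds a `_merge` column describing each row
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", key].unique().tolist()
    return merged.drop(columns="_merge"), unmatched


# Toy tables with hypothetical columns, for illustration only.
articles = pd.DataFrame({"article_id": [1, 2, 3], "year": [1977, 1978, 1979]})
external = pd.DataFrame({"year": [1977, 1978], "some_indicator": [0.1, 0.2]})

merged, unmatched_years = merge_with_check(articles, external, key="year")
print(merged)
print("Years with no external data:", unmatched_years)
```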
Students will learn how a project unfolds, and how we discover the stories that we want to tell with data. They will also interact with Professor Haveman regularly (usually weekly) and gain experience documenting their work so that others can understand it.
Qualifications: Much of the data is text, so we need apprentices with experience in NLP: regular expressions, named-entity recognition, sentiment analysis, word embeddings, topic models, etc. We seek apprentices who can code in Python (and for some tasks, R) and who have experience with ML classifiers and data visualization, a deep understanding of statistical analysis, and an appreciation for the nuances of managing and analyzing complex datasets.
The work will entail writing clean, well-documented code in Python/Jupyter notebooks or RStudio scripts to clean data, produce descriptive/exploratory statistics, conduct multivariate analyses, and visualize results. Documentation is critical to ensure that others -- your future selves, Professor Haveman, and other team members -- can fully comprehend what you have done and why. The work will also require students to LOOK at data, to validate analytical methods and make sure they are capturing what we intend.
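One simple way to look at the data is to pull a random sample of pattern matches together with their surrounding text and read them. The sketch below does this with the Python standard library only; the function name, the default window width, and the example pattern are assumptions for illustration.

```python
import random
import re


def sample_matches_in_context(texts, pattern, n=5, width=40, seed=0):
    """Return up to n random (doc_index, matched_text, context) tuples."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for doc_index, text in enumerate(texts):
        for match in regex.finditer(text):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(text))
            hits.append((doc_index, match.group(0), text[start:end]))
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    rng.shuffle(hits)
    return hits[:n]


# Toy texts, invented for illustration only.
toy_texts = ["The chairman spoke. He left.", "Manchester won."]
for doc_index, matched, context in sample_matches_in_context(
    toy_texts, r"\bman\b|\bhe\b"
):
    print(doc_index, repr(matched), "...", context)
```

Reading such samples shows quickly whether a pattern catches what was intended. Here, for example, the whole-word pattern skips "chairman" and "Manchester", which may or may not be the desired behavior for a given term.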
Apprentices must stick to a schedule and follow through on deliverables. Being organized and detail-oriented is critical for developing a clear data-analysis pipeline. You need to inspect the data frequently to make sure that your code is doing what you want it to do. You also need to clearly and completely document your code so that other team members can understand it and so that, in the future, you can easily revise or reuse it.
Hours: 6-8 hrs
Social Sciences Digital Humanities and Data Science