Culture from Employee Discourse
Heather Haveman, Professor
Sociology, Sociology
Applications for Spring 2025 are closed for this project.
I am analyzing employee descriptions (reviews) of their firms from Glassdoor.com.
We are measuring the gender slant of employing organizations’ conceptions of what work means, who workers are, and who has power. Such gendered conceptions can erect barriers to equality by framing ideal workers, work activities, and company goals as male. This framing can cast doubt on how well women fit into organizations and on their competence in many jobs.
We observe employees’ conceptions of their firms by using natural-language processing (NLP) techniques (specifically, word embeddings) to analyze what workers say about their firms. We measure how much discourse about tech firms is slanted male vs. neutral or female using techniques derived from computer science and computational linguistics. We also investigate associations between the gender slant of employee discourse and both the things that firms value (e.g., innovation, performance, and sustainability) and gender stereotypes (e.g., men are assertive, women are friendly). In addition, we seek to predict which employees in which firms will talk about their firms in more gendered ways. Finally, we want to compare the gender slant of employee discourse between industries that are male-dominated (e.g., tech, finance) and those that are female-dominated (e.g., retail).
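To give applicants a feel for this kind of measurement, here is a minimal sketch of one common way to score gender slant with word embeddings: build a "gender direction" from male/female word pairs, then take the cosine similarity between each word of interest and that direction. This is not necessarily the project's actual method. The toy vectors, the word pairs, and the function names are all assumptions made up for illustration; real vectors would come from an embedding model trained on the review text.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented for illustration only.
# Real embeddings would be learned from the review corpus.
EMBEDDINGS = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "man": np.array([0.9, 0.1, 0.3]),
    "woman": np.array([-0.9, 0.1, 0.3]),
    "engineer": np.array([0.5, 0.8, 0.1]),
    "friendly": np.array([-0.4, 0.6, 0.2]),
    "office": np.array([0.0, 1.0, 0.5]),
}

# Hypothetical male/female anchor pairs used to define the gender axis.
PAIRS = [("he", "she"), ("man", "woman")]


def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)


def gender_direction(embeddings, pairs):
    """Average the (male - female) differences of normalized anchor vectors."""
    diffs = [unit(embeddings[m]) - unit(embeddings[f]) for m, f in pairs]
    return unit(np.mean(diffs, axis=0))


def gender_slant(word, embeddings, direction):
    """Cosine similarity with the gender direction.

    Positive values lean male, negative values lean female,
    and values near zero are roughly neutral.
    """
    return float(np.dot(unit(embeddings[word]), direction))


direction = gender_direction(EMBEDDINGS, PAIRS)
slants = {
    word: gender_slant(word, EMBEDDINGS, direction)
    for word in ("engineer", "friendly", "office")
}

for word, score in slants.items():
    print(f"{word}: {score:+.3f}")
```

With these made-up vectors, "engineer" scores positive (male-leaning), "friendly" scores negative (female-leaning), and "office" sits at zero. On real data, scores like these would need to be validated by reading the underlying reviews.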
Role: Students will write code in Python/Jupyter notebooks (and occasionally R) to analyze text data -- employee reviews from Glassdoor.com. They may also be asked to join/merge our main database with external data. Finally, they will occasionally be asked to read a technical paper to learn about a new computational or statistical technique.
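As a rough illustration of the join/merge task, the sketch below attaches firm-level attributes to reviews with pandas and flags reviews that found no match. The table contents and column names (`firm_id`, `review_text`, `industry`) are hypothetical and do not describe the project's actual database.

```python
import pandas as pd

# Hypothetical review table: several reviews may share one firm.
reviews = pd.DataFrame({
    "firm_id": [1, 1, 2, 3],
    "review_text": ["Great team", "Long hours", "Good pay", "Fast paced"],
})

# Hypothetical external table: one row per firm.
firm_info = pd.DataFrame({
    "firm_id": [1, 2],
    "industry": ["tech", "retail"],
})

# A left join keeps every review. validate= raises an error if firm_info
# unexpectedly contains duplicate firm_id values, and indicator= adds a
# "_merge" column recording whether each review found a matching firm.
merged = reviews.merge(
    firm_info,
    on="firm_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Reviews with no firm-level match deserve a manual look.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["firm_id", "review_text"]])
```

Checking the unmatched rows after every merge is one concrete way to inspect the data and confirm that the code is doing what was intended.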
Students will learn how a project unfolds and how we discover the stories that we want to tell with data. They will also interact with Professor Haveman regularly (usually weekly) and gain experience documenting their work so that others can understand it.
Qualifications: Much of the data is text, so we need apprentices with experience in NLP: regular expressions, named-entity recognition, sentiment analysis, word embeddings, topic models, etc. We seek apprentices who can code in Python (and for some tasks, R) and who have experience with ML classifiers, a deep understanding of statistical analysis and data visualization, and an appreciation for the nuances of managing and analyzing complex datasets.
The work will entail writing clean, well-documented Python/Jupyter notebooks or RStudio scripts to clean data, produce descriptive/exploratory statistics, conduct multivariate analyses, and visualize results. Documentation is critical to ensure that others -- your future selves, Professor Haveman, and other team members -- can fully comprehend what you have done and why. The work will also require students to LOOK at data, to validate analytical methods and make sure they are capturing what we intend.
Apprentices must stick to a schedule and follow through on deliverables. Being organized and detail-oriented is critical for developing a clear data-analysis pipeline. You need to inspect the data frequently to make sure that your code is doing what you want it to do. You also need to document your code clearly and completely so that other team members can understand it and so that you can easily revise or reuse it in the future.
Hours: 6-8 hrs
Related website: http://www.heatherhaveman.net/
Social Sciences, Digital Humanities and Data Science