What Do People Talk About When They Talk About Work?
Heather Haveman, Professor
Sociology, Sociology
Applications for Spring 2024 are closed for this project.
My research team is analyzing employee reviews from Glassdoor.com to understand what topics or themes are most salient to employees. In other words, we want to know what employees say when they describe their jobs and workplaces.
Although there has been a lot of research on a few topics that people might discuss – job satisfaction, work-life balance, and career aspirations – we don’t have a holistic picture of what matters to employees. So we are conducting an inductive study based on employees' voluntary, anonymous statements on the Glassdoor portal.
To do this, we are using an algorithm (BERTopic) that models themes in documents based on word embeddings from BERT (rather than on word counts, as in LDA-based topic models). We already have preliminary results on a random sample of the data. In the fall, we will resume work to refine our analyses and start to write them up.
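To illustrate why the choice of representation matters, here is a minimal sketch using only the Python standard library. It is not the project's pipeline and does not use BERT or BERTopic; it only shows the limitation of word-count representations that embeddings are meant to address. The three reviews are invented for illustration and are not drawn from the Glassdoor data, and the helper names `bag_of_words` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Lowercase the text and count each word token."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented examples, not actual Glassdoor reviews.
review_a = "Great pay and generous benefits"
review_b = "Excellent salary with good perks"  # same theme, different words
review_c = "Great pay but long hours"          # shares two words with review_a

sim_ab = cosine_similarity(bag_of_words(review_a), bag_of_words(review_b))
sim_ac = cosine_similarity(bag_of_words(review_a), bag_of_words(review_c))

print(f"a vs b (no shared words): {sim_ab:.2f}")
print(f"a vs c (shared words):    {sim_ac:.2f}")
```

Under word counts, the first two reviews look unrelated even though both are about compensation, because they share no vocabulary. An embedding-based representation is intended to place such texts closer together.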
Role: We are looking for apprentices to work closely with Professor Haveman in weekly meetings, either in person or on Zoom. The work will entail writing clean, well-documented scripts in Python/Jupyter notebooks to clean data, produce descriptive/exploratory statistics, conduct multivariate analyses, and visualize results. Documentation is critical to ensure that others -- your future selves, Professor Haveman, and other team members -- can fully comprehend what you have done and why. The work will also require students to LOOK at data, to validate analytical methods and make sure they are capturing what we intend.
Students will learn how a project unfolds and how we discover the stories that we want to tell with data. They will also interact with Professor Haveman regularly (usually weekly) and gain experience documenting their work so that others can understand it.
Qualifications: Much of the data is text, so we need apprentices with experience in NLP: regular expressions, named-entity recognition, sentiment analysis, word embeddings, topic models, etc. We also need apprentices with proficiency in Python, experience with ML classifiers, a deep understanding of statistical analysis, and an appreciation for the nuances of managing and analyzing complex datasets.
I value apprentices who pay close attention to detail, are enthusiastic, and can stick to a schedule and follow through on deliverables. Apprentices must be willing to attend carefully to the details of their coding assignments. This requires them to inspect the raw data frequently to make sure that their code is doing what it is supposed to do. It also requires them to clearly and completely document their code so that other team members can understand it and so that, in the future, they can easily revise or reuse it.
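As a rough sketch of what this habit can look like in practice, the example below pairs a documented cleaning function with a helper that pulls a few raw texts and shows them next to their cleaned versions for manual review. Everything here is an assumption made for illustration: the function names `clean_review` and `spot_check` are hypothetical, the cleaning rules are not the project's actual rules, and the sample texts are invented rather than taken from the Glassdoor data.

```python
import random
import re


def clean_review(text):
    """Strip HTML tags and collapse runs of whitespace.

    Assumption: raw reviews may contain stray markup and irregular spacing.
    """
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def spot_check(raw_texts, cleaner, n=3, seed=0):
    """Return n randomly chosen (raw, cleaned) pairs for manual inspection.

    A fixed seed keeps the sample reproducible, so teammates see the same rows.
    """
    rng = random.Random(seed)
    sample = rng.sample(raw_texts, min(n, len(raw_texts)))
    return [(raw, cleaner(raw)) for raw in sample]


# Invented examples, not actual Glassdoor reviews.
raw_reviews = [
    "Good   team,<br>long hours.",
    "  Management <b>never</b> listens ",
    "Flexible schedule\n\nand remote work",
]

for raw, cleaned in spot_check(raw_reviews, clean_review):
    print(repr(raw), "->", repr(cleaned))
```

Printing with `repr` makes hidden whitespace and line breaks visible, which is often where cleaning code silently goes wrong.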
Hours: 6-8 hrs
Social Sciences Digital Humanities and Data Science