Heather Haveman, Professor

Closed (1) Computational Analysis of Social Science Research

Applications for Fall 2018 are now closed for this project.

This project traces the trajectory of research ideas across more than 400,000 articles published in the social sciences (economics, sociology, etc.) & professional fields (management, public policy, etc.) from the 1970s onward. It uses data from JSTOR Data for Research. JSTOR is one of the largest online repositories of academic journal articles, and its Data for Research arm provides summary data on articles archived on JSTOR: 1-grams, 2-grams, 3-grams, & metadata.

Students will write programs in Python to:
1) clean data from this database by eliminating book reviews, errata, and front matter;
2) measure the extent to which the articles published in a given year engage with each theory, based on concept dictionaries;
3) validate the concept dictionaries using word embeddings (word2vec);
4) analyze the dynamics of connections/relationships between theories using cosine similarity; and
5) generate data visualizations of engagement with each theory.
Other tasks (e.g., topic modeling) will depend on the results.
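
To make tasks 2 and 4 concrete, here is a minimal sketch in plain Python, not the project's actual code. It assumes each article arrives as a table of 1-gram counts; the dictionary terms, the toy articles, and the function names (engagement_score, cosine_similarity) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical concept dictionaries: theory name -> indicator terms.
# The project's real dictionaries are not shown in this listing.
CONCEPT_DICTIONARIES = {
    "institutional": {"legitimacy", "isomorphism", "institution"},
    "ecological": {"density", "niche", "population"},
}


def engagement_score(ngram_counts, terms):
    """Share of an article's 1-gram tokens that appear in a concept dictionary."""
    total = sum(ngram_counts.values())
    if total == 0:
        return 0.0
    hits = sum(count for term, count in ngram_counts.items() if term in terms)
    return hits / total


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 1-gram counts standing in for two articles (made-up data).
articles = [
    Counter({"legitimacy": 2, "density": 1, "the": 7}),
    Counter({"niche": 3, "population": 1, "the": 6}),
]

# One engagement profile per theory: a score for every article.
profiles = {
    theory: [engagement_score(article, terms) for article in articles]
    for theory, terms in CONCEPT_DICTIONARIES.items()
}

# Theories whose profiles point the same way are engaged by the same articles.
similarity = cosine_similarity(profiles["institutional"], profiles["ecological"])
print(profiles)
print(round(similarity, 4))
```

Grouping the per-article scores by publication year would give the yearly engagement measure described in task 2; the similarity could equally be computed on word2vec vectors instead of engagement profiles.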

Qualifications: 1) Experience programming in Python. 2) Familiarity with natural-language processing techniques: dictionary methods, cosine similarity, word embeddings (word2vec, doc2vec), topic modeling, etc. 3) Willingness to work as part of an integrated team of faculty, doctoral students, and undergraduate researchers.

Weekly Hours: to be negotiated

Related website: http://www.heatherhaveman.net/

Closed (2) I Regret To Inform You That Your Private Information Has Been Compromised

Applications for Fall 2018 are now closed for this project.

Privacy is one of the central issues of our time. All things being equal, we assume that most people prefer privacy; it is a foundational right enshrined in the “penumbras” of the 1st, 3rd, 4th, 5th, 9th, and 14th Amendments to the U.S. Constitution as well as several state constitutions (including CA, MA, and WA) and the Universal Declaration of Human Rights. Despite our appreciation of privacy, police officers wear body cameras, customer loyalty programs track purchases, and the Transportation Security Administration performs full-body scans. This paradox illuminates the deep ambivalence in modern American society about privacy and points to a largely untapped area of research in sociology. This research seeks to understand the deeper cultural logics behind shifting views on privacy in the modern world, as well as the historical evolution of its meaning in the U.S. context.

Day-to-day tasks: (1) data entry, (2) transcribing audio files of interviews, (3) database searches, (4) reviewing articles to determine relevant materials, (5) managing and organizing files, (6) extracting general information about articles and interviews, (7) attending privacy-focused events and taking notes, and (8) assisting the supervisor with a bit of “public sociology,” including collecting and disseminating information on privacy and planning and executing privacy-focused event(s). Opportunities for different kinds of work will expand as the project progresses, and aspects of the project that the undergraduate is interested in will be prioritized when possible. Apprentices will meet as a group weekly on Friday and will do group work occasionally.

Learning outcomes: This is a valuable opportunity to get hands-on experience with the data collection process and insight into how data collection relates to the larger research agenda/goals. (Still trying to figure out if graduate school is for you?) Students can also expect to learn skills that are relevant and marketable outside of the research space. Students will meet weekly with the research supervisor and larger team in order to ask questions, workshop solutions, and discuss ongoing work and findings.

Day-to-day supervisor for this project: Naniette H. Coleman, Ph.D. candidate

Qualifications: Students should be detail-oriented, organized, and excited by/interested in learning more about research. This project is open to undergraduates of all years and all majors, and there are no course prerequisites. You will be trained in anything you are required to do. A passion for the topic of study is preferred, but not required.

Weekly Hours: to be negotiated

Off-Campus Research Site: Unless meeting with the supervisor, faculty mentor, or research group (weekly on Friday), students may work anywhere they like.



Closed (4) Charter schools and the business age: Analyzing and visualizing web text and school data

Applications for Fall 2018 are now closed for this project.

If you’re interested in how today’s business age structures organizations and their messages to stakeholders, want to contribute to a team data collection & analysis effort focused on innovation in education, and are willing to challenge yourself through hands-on learning, then this is the project for you!

Here’s our focus: How does the push to run schools like businesses--complete with performance targets, incentives, and centralization in culture and governance--shape the growing charter school sector? Which charters survive and thrive in this political climate: those that stress standards-based rigor and college-readiness (traditional model), or those that prioritize independent thinking and socio-emotional development (progressive model)? And how does this differentiation affect charter school segregation--that is, do progressive schools serve white students in affluent, liberal communities while traditional schools serve students of color in poor or conservative communities?

To answer these questions, our team has extracted web text from the websites of every U.S. charter school open today. We have begun parsing and analyzing these textual and quantitative data, and there is much more to do!

I am looking for outside-the-box, independent thinkers/tinkerers with significant computer science (CS), statistics, and/or coding skills to collaborate on the following coding challenges:

1.) Data management: Merge, clean, and update a unique, complex data structure composed of multiple large-scale educational data sets, including school academic performance measures and webs of charter management organization membership.
Key packages/fluencies: Pandas, SQL, spreadsheets

2.) Web-crawling/scraping: Expand and modify the Scrapy Cluster framework to collect URLs and web contents from charter schools over time using the Internet Archive; parse text from Word documents and images.
Key packages/fluencies: Docker, Scrapy, BeautifulSoup, HTML, SQL, web frameworks generally

3.) Text parsing & analysis: Parse and filter web-crawling output, look for qualitative and quantitative patterns in texts (document length, distinctive words, etc.), cluster school learning approaches deductively (custom dictionaries) and inductively (topic models, k-means clustering, word embeddings).
Key packages/fluencies: Natural Language Processing, BeautifulSoup, computational text analysis generally

4.) Geospatial analysis: Examine geographic patterns in charter school proliferation, size, performance, and especially ideology within race- and class-structured school districts and Census tracts.
Key packages/fluencies: Folium, matplotlib, GeoPandas, Tableau (less important), GIS generally

5.) Statistical analysis: Investigate interrelated effects of community characteristics (race, poverty, education, etc.) on school ideology and academic performance.
Key packages/fluencies: Mixed-effects/hierarchical models (primarily), XGBoost, statistics & machine learning generally
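
As one illustration of the deductive (custom dictionary) clustering named in challenge 3 above, here is a minimal sketch in plain Python. It is an assumption-laden toy, not the team's pipeline: the dictionary terms, the sample sentence, and the names tokenize and classify_approach are hypothetical.

```python
import re
from collections import Counter

# Hypothetical custom dictionaries for the two school models described above.
APPROACH_DICTIONARIES = {
    "traditional": {"standards", "rigor", "college", "discipline"},
    "progressive": {"inquiry", "creativity", "exploration", "emotional"},
}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_approach(text):
    """Score a page against each dictionary and label it by the top score."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    scores = {
        label: sum(counts[term] for term in terms)
        for label, terms in APPROACH_DICTIONARIES.items()
    }
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    # Leave ties and pages with no dictionary hits unclassified.
    label = winners[0] if best > 0 and len(winners) == 1 else "unclassified"
    return {"length": len(tokens), "scores": scores, "label": label}


# Made-up sentence standing in for scraped school website text.
sample = "Our school stresses standards-based rigor and college readiness."
result = classify_approach(sample)
print(result)
```

Raw counts favor longer pages, so a real analysis would likely normalize scores by document length (returned here as "length") before comparing schools.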

Day-to-day supervisor for this project: Jaren Haber, Ph.D. candidate

Qualifications: Significant to advanced experience with Python is a must. I am specifically looking for experience with data management; web-crawling/scraping and parsing; and textual, geospatial, and statistical analysis, as outlined above. Other important qualities: Independent initiative, collaborative spirit, and timeliness in completing tasks.

Weekly Hours: to be negotiated
Related website: https://osf.io/zgh5u/