Heather Haveman, Professor

Closed (1) Oski Lab - Web-Scraping | Text Parsing | Machine Learning | Databases with Start-Ups and Entrepreneurs

Applications for Spring 2018 are now closed for this project.

For this project, URAP apprentices will develop code to collect and clean data for a variety of research projects. Apprentices will apply their programming skills to scrape product data from publicly available websites and to turn messy, unstructured data sets into shiny, clean data sets suitable for reproducible research.
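As a rough illustration of the kind of cleaning described above, here is a minimal sketch that normalizes scraped product records. The field names ("name", "price") and the helper functions are hypothetical and are not taken from the lab's code; it assumes prices arrive as free-form strings.

```python
import re

def clean_price(raw):
    """Parse a scraped price string such as ' $1,299.00 ' into a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw or "")
    if match is None:
        return None
    return float(match.group().replace(",", ""))

def clean_record(record):
    """Collapse whitespace in the name and convert the price to a number."""
    name = " ".join((record.get("name") or "").split())
    return {"name": name, "price": clean_price(record.get("price"))}

def clean_records(records):
    """Clean every record and drop duplicates (same name ignoring case, same price)."""
    seen = set()
    cleaned = []
    for record in records:
        row = clean_record(record)
        key = (row["name"].lower(), row["price"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

# Hypothetical scraped rows: inconsistent spacing, casing, and price formats.
raw_rows = [
    {"name": "  Blue   Widget ", "price": " $1,299.00 "},
    {"name": "blue widget", "price": "1299"},
    {"name": "Red Widget", "price": "N/A"},
]
print(clean_records(raw_rows))
```

Keeping each cleaning step in its own small function makes it easier to rerun the whole pipeline from the raw scrape, which is what makes the resulting data set reproducible.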

Participants will scrape data on a number of markets and phenomena. They will also develop code to analyze the discourse surrounding these markets as found in electronic forums visited by market participants and articles published in the mainstream and specialized press. Students will work with machine learning packages for text analysis to help analyze the millions of observations collected via our web scrapers.
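A first pass at analyzing this kind of discourse often starts with simple term counts before any machine learning package is involved. The sketch below is an assumption about what such a first pass might look like, not the lab's method; the stopword list and the example posts are invented for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def tokenize(text):
    """Lowercase the text and return word tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def document_frequencies(documents):
    """Count, for each term, how many documents mention it at least once."""
    counts = Counter()
    for doc in documents:
        counts.update(set(tokenize(doc)))
    return counts

# Hypothetical forum posts.
posts = [
    "The price of the product is rising",
    "Product recalls hurt the market",
    "Is the market rising?",
]
print(document_frequencies(posts).most_common(3))
```

Document frequencies like these are the usual input to later steps such as TF-IDF weighting or topic models.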

You can find more about our research projects here: https://oskilab.github.io.

Cyrus Dioun will be the day-to-day contact to field questions, troubleshoot problems, and address everyday issues.

Undergraduates who are proficient in programming languages and statistics will help us collect, clean, and analyze large data sets.

Day-to-day supervisor for this project: Cyrus Dioun

Qualifications: Advanced coding skills. Proficiency with machine learning, SQL, and/or distributed computing.

Weekly Hours: to be negotiated

Related website: https://oskilab.github.io
Related website: http://bids.berkeley.edu/people/cyrus-dioun

Closed (2) Social Stigma and Legal Threat in Cannabis Markets: Analyzing Media Coverage and Legal Statutes

Applications for Spring 2018 are now closed for this project.

How do firms in controversial industries respond to social stigma and legal threat?

Students will help answer this question by collecting data on (1) media coverage and (2) legal statutes affecting the cannabis industry.

Students will use Lexis-Nexis to collect newspaper and media coverage of cannabis companies in a number of different state markets. Each student will be assigned a set of cannabis producers and asked to scour local, state, and national media coverage of each one to see whether there is local opposition to or support for these cannabis markets. Students will also be asked to collect ordinances, laws, and statutes regulating or prohibiting cannabis providers in these states.


Apprentices will meet bi-weekly, collect and analyze newspaper and media articles, and write weekly memos describing their progress.

Day-to-day supervisor for this project: Cyrus Dioun, Ph.D. candidate

Qualifications: Self-starter with attention to detail. Well organized and proficient in searching databases and using Excel.

Weekly Hours: to be negotiated

Closed (3) Chefs and Cooks: Exploring Race and Gender in Fine Dining and Food Writing

Closed. This professor is continuing with Fall 2017 apprentices on this project; no new apprentices needed for Spring 2018.

Who is a chef? What does a chef look like? How are the chef and the chef identity related to the food chefs prepare and/or the value of that food? How is the image or narrative of the chef shaped by or influenced by gender and race? Much of the sociological research about food focuses on either the processes of agro-ecology and the production of ingredients (farming, agriculture, the flawed food production system) or the process of eating (who eats what, what does this say about individuals’ or groups’ position in the social hierarchy, etc.). This project examines the neglected realm of professional cooking with an eye towards the gender and race dynamics of the professional fine dining kitchen. By collecting and analyzing data from the top food industry magazines and publications, we will examine some of the complexities and major trends surrounding the cultural meaning of food and fine food cooking in contemporary New York City and San Francisco.

Students' primary responsibility is data collection. Apprentices will scan, read, and code magazine articles from the leading fine food magazines in the United States, learning about the data collection and analysis process through social scientific content analysis and text analysis.

Day-to-day supervisor for this project: Gillian Gualtieri, Ph.D. candidate

Qualifications: Attention to detail; timeliness; interest in the topic; restaurant industry experience (not required, but helpful). Apprentices must be able to travel regularly and independently to the San Francisco Public Library in downtown San Francisco (via BART or other means).

Weekly Hours: 6-9 hrs

Off-Campus Research Site: 100 Larkin St San Francisco, CA 94102

Closed (4) Charter Schools and the Business Age: Web-Scraping and Text Analysis

Applications for Spring 2018 are now closed for this project.

If you’re interested in how today’s business age structures organizations and their messages to stakeholders, want to contribute to a team data collection & analysis effort focused on innovation in education, and are willing to challenge yourself through hands-on learning, then this is the project for you!

Here’s our focus: How does the push to run schools like businesses--complete with performance targets, incentives, and centralization in culture and governance--shape the growing charter school sector? Which charters survive and thrive in this political climate: those that stress standards-based rigor and college-readiness (traditional model), or those that prioritize independent thinking and socio-emotional development (progressive model)? And how does this differentiation affect charter school segregation--that is, do progressive schools serve white students in affluent, liberal communities while traditional schools serve students of color in poor or conservative communities?

To answer these questions, our team will extract and analyze mission statements (MSs) from the websites of every U.S. charter school open today.
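To make the extraction step concrete, here is a minimal sketch of pulling candidate mission-statement sentences out of a page's HTML using only the Python standard library. The keyword heuristic (keep sentences containing the word "mission") and the sample page are assumptions for illustration; they are not the project's actual pipeline.

```python
import re
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def mission_sentences(html):
    """Return visible sentences that mention the word 'mission'."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    sentences = []
    for chunk in parser.chunks:
        # Split each text node on sentence-ending punctuation.
        sentences.extend(re.split(r"(?<=[.!?])\s+", chunk))
    return [s for s in sentences if re.search(r"\bmission\b", s, re.IGNORECASE)]

# Hypothetical page content.
sample_page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>About Us</h1>"
    "<p>Our mission is to prepare every student for college. "
    "Lunch is served at noon.</p>"
    "<script>var mission = 1;</script>"
    "</body></html>"
)
print(mission_sentences(sample_page))
```

One limitation of this sketch is that inline tags such as bold or links split a sentence into separate text nodes, so a real crawler would need to merge text within block-level elements first.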

I am looking to add a small number of students to each of two teams:

1.) Manual coders: Visit the websites of charter schools and charter management organizations (CMOs) and record the information they find in a spreadsheet. The specific focus this semester will be checking the accuracy of URLs and assembling a comprehensive list of the member schools of CMOs.

2.) CS coders: Outside-the-box, independent thinkers/tinkerers with significant computer science (CS) and/or coding skills who will collaborate on web-crawling/scraping charters’ sites, performing text analysis and machine learning (natural language processing, dictionary methods, and topic models), and running statistical regressions.
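For readers unfamiliar with dictionary methods, the sketch below scores a mission statement against two word lists and labels it by the higher score. The word lists are hypothetical and were chosen only to echo the traditional/progressive distinction described above; they are not the project's dictionaries.

```python
import re

# Hypothetical term lists, for illustration only.
DICTIONARIES = {
    "traditional": {"standards", "rigor", "rigorous", "college", "achievement"},
    "progressive": {"inquiry", "creativity", "independent", "emotional", "curiosity"},
}

def dictionary_scores(text):
    """Return, per dictionary, the share of tokens in the text that match it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {name: 0.0 for name in DICTIONARIES}
    return {
        name: sum(token in terms for token in tokens) / len(tokens)
        for name, terms in DICTIONARIES.items()
    }

def classify(text):
    """Label a text by its highest-scoring dictionary, or 'unclear' on a tie."""
    scores = dictionary_scores(text)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if ranked[0][1] == ranked[1][1]:
        return "unclear"
    return ranked[0][0]

# Hypothetical mission statements.
statement_a = "We provide a rigorous, standards-based education that prepares students for college."
statement_b = "We nurture curiosity, creativity, and independent thinking."
print(classify(statement_a), classify(statement_b))
```

Normalizing by the number of tokens keeps long and short statements comparable; in practice the word lists themselves would need to be validated against hand-coded examples.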

Day-to-day supervisor for this project: Jaren Haber, Ph.D. candidate

Qualifications: 1) Manual coders: Detail-oriented, responsible, diligent folks who work well with others. No coding or research experience required for this data collection work. 2) CS coders: Significant to advanced experience with Python is a must. I am specifically looking for experience implementing GIS (geographic information systems) code in GeoPandas (or similar); running machine learning algorithms with TensorFlow (or similar); and/or web-crawling/scraping from the Internet Archive. Also desirable are experience with text analysis in Python (especially word embeddings and advanced custom dictionary analysis), working knowledge of Selenium and/or the Scrapy framework, and a background in statistical analysis. Other important qualities: Independent initiative, collaborative spirit, and timeliness in completing tasks.

Weekly Hours: to be negotiated

Related website: https://github.com/URAP-charter
Related website: https://osf.io/zgh5u/