Heather Haveman, Professor

Closed (1) Oski Lab - Web-Scraping | Text Parsing | Machine Learning | Databases with Start-Ups and Entrepreneurs

Check back for status

For this project, URAP apprentices will develop code to collect and clean data for a variety of research projects. Apprentices will apply their programming skills to scrape product data from publicly available websites and to turn messy, unstructured data sets into clean data sets suitable for reproducible research.

Participants will scrape data on a number of markets and phenomena. They will also develop code to analyze the discourse surrounding these markets as found in electronic forums visited by market participants and articles published in the mainstream and specialized press. Students will work with machine learning packages for text analysis to help analyze the millions of observations collected via our web scrapers.

You can find more about our research projects here: https://oskilab.github.io.

Cyrus Dioun will be the day-to-day contact to field questions, troubleshoot problems, and address everyday issues.

Undergraduates who are proficient in programming languages and statistics will help us collect, clean, and analyze large data sets.

Day-to-day supervisor for this project: Cyrus Dioun

Qualifications: Advanced coding skills. Proficiency with machine learning, SQL, and/or distributed computing.

Weekly Hours: to be negotiated

Related website: https://github.com/URAP-charter
Related website: http://bids.berkeley.edu/people/cyrus-dioun

Closed (2) Social Stigma and Legal Threat in Cannabis Markets: Analyzing Media Coverage and Legal Statutes

Check back for status

How do firms in controversial industries respond to social stigma and legal threat?

Students will help answer this question by collecting data on (1) media coverage and (2) legal statutes affecting the cannabis industry.

Students will use Lexis-Nexis to collect newspaper and media coverage of cannabis companies in a number of different state markets. Each student will be assigned a set of cannabis producers and asked to scour local, state, and national media coverage of each one to see whether there is local opposition to or support for these cannabis markets. Students will also be asked to collect ordinances, laws, and statutes regulating or prohibiting cannabis providers in these states.


Responsibilities include meeting bi-weekly, collecting and analyzing newspaper and media articles, and writing weekly memos describing progress.

Day-to-day supervisor for this project: Cyrus Dioun, Ph.D. candidate

Qualifications: Self-starter with attention to detail. Well organized and proficient in searching databases and using Excel.

Weekly Hours: to be negotiated

Closed (3) Chefs and Cooks: Exploring Race and Gender in Fine Dining and Food Writing

Closed. This professor is continuing with Spring 2018 apprentices on this project; no new apprentices needed for Fall 2018.

Who is a chef? What does a chef look like? How are the chef and the chef identity related to the food chefs prepare and/or the value of that food? How is the image or narrative of the chef shaped by or influenced by gender and race? Much of the sociological research about food focuses on either the processes of agro-ecology and the production of ingredients (farming, agriculture, the flawed food production system) or the process of eating (who eats what, what does this say about individuals’ or groups’ position in the social hierarchy, etc.). This project examines the neglected realm of professional cooking with an eye towards the gender and race dynamics of the professional fine dining kitchen. By collecting and analyzing data from the top food industry magazines and publications, we will examine some of the complexities and major trends surrounding the cultural meaning of food and fine food cooking in contemporary New York City and San Francisco.

Students' primary responsibility is data collection. Apprentices will scan, read, and code magazine articles from the leading fine food magazines in the United States, learning about the data collection and analysis process through social scientific content analysis and text analysis.

Day-to-day supervisor for this project: Gillian Gualtieri, Ph.D. candidate

Qualifications: Attention to Detail; Timeliness; Interest in Topic; Restaurant Industry Experience (not required, but helpful); Apprentices must be able to regularly travel to downtown San Francisco to the SF Public library independently (via BART or other means)

Weekly Hours: 6-8 hrs

Off-Campus Research Site: 100 Larkin St San Francisco, CA 94102

Open (4) Charter schools and the business age: Analyzing and visualizing web text and school data

Open. Apprentices needed for the fall semester. Enter your application on the web beginning August 15th. The deadline to apply is Monday, August 27th at 9 AM.

If you’re interested in how today’s business age structures organizations and their messages to stakeholders, want to contribute to a team data collection & analysis effort focused on innovation in education, and are willing to challenge yourself through hands-on learning, then this is the project for you!

Here’s our focus: How does the push to run schools like businesses--complete with performance targets, incentives, and centralization in culture and governance--shape the growing charter school sector? Which charters survive and thrive in this political climate: those that stress standards-based rigor and college-readiness (traditional model), or those that prioritize independent thinking and socio-emotional development (progressive model)? And how does this differentiation affect charter school segregation--that is, do progressive schools serve white students in affluent, liberal communities while traditional schools serve students of color in poor or conservative communities?

To answer these questions, our team has extracted web text from the websites of every U.S. charter school open today. We have begun parsing and analyzing these textual and quantitative data, and there is much more to do!

I am looking for outside-the-box, independent thinkers/tinkerers with significant computer science (CS), statistics, and/or coding skills to collaborate on the following coding challenges:

1.) Data management: Merge, clean, and update a unique, complex data structure composed of multiple large-scale educational data sets, including school academic performance measures and webs of charter management organization membership.
Key packages/fluencies: Pandas, SQL, spreadsheets

2.) Web-crawling/scraping: Expand and modify the Scrapy Cluster framework to collect URLs and web contents from charter schools over time using the Internet Archive; parse text from Word documents and images.
Key packages/fluencies: Docker, Scrapy, BeautifulSoup, HTML, SQL, web frameworks generally

3.) Text parsing & analysis: Parse and filter web-crawling output, look for qualitative and quantitative patterns in texts (document length, distinctive words, etc.), cluster school learning approaches deductively (custom dictionaries) and inductively (topic models, k-means clustering, word embeddings).
Key packages/fluencies: Natural Language Processing, BeautifulSoup, computational text analysis generally

4.) Geospatial analysis: Examine geographic patterns in charter school proliferation, size, performance, and especially ideology within race- and class-structured school districts and Census tracts.
Key packages/fluencies: Folium, matplotlib, GeoPandas, Tableau (less important), GIS generally

5.) Statistical analysis: Investigate interrelated effects of community characteristics (race, poverty, education, etc.) on school ideology and academic performance.
Key packages/fluencies: Mixed-effects/hierarchical models (primarily), XGBoost, statistics & machine learning generally
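
To give a flavor of the deductive (custom dictionary) approach mentioned in challenge 3, here is a minimal Python sketch. It is only an illustration under stated assumptions: the word lists, function names, and sample sentence are hypothetical and are not the project's actual dictionaries or code. It uses only the standard library.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; NOT the project's dictionaries.
DICTIONARIES = {
    "traditional": {"standards", "rigor", "rigorous", "college", "discipline"},
    "progressive": {"inquiry", "creativity", "exploration", "independent", "emotional"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_text(text, dictionaries=DICTIONARIES):
    """Return, for each category, the share of tokens found in its term list."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in dictionaries}
    counts = Counter(tokens)
    return {
        name: sum(counts[term] for term in terms) / len(tokens)
        for name, terms in dictionaries.items()
    }


def classify(text, dictionaries=DICTIONARIES):
    """Label text with the top-scoring category, or 'unclear' if the top two tie."""
    scores = score_text(text, dictionaries)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "unclear"
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical sentence standing in for scraped school web text.
    sample = "Our rigorous, standards-based curriculum prepares every student for college."
    print(score_text(sample))
    print(classify(sample))
```

A simple term-share score like this is easy to audit by hand, which is one reason dictionary methods are often paired with inductive methods such as topic models rather than replaced by them.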

Day-to-day supervisor for this project: Jaren Haber, Ph.D. candidate

Qualifications: Significant to advanced experience with Python is a must. I am specifically looking for experience with data management; web-crawling/scraping and parsing; and textual, geospatial, and statistical analysis, as outlined above. Other important qualities: Independent initiative, collaborative spirit, and timeliness in completing tasks.

Weekly Hours: to be negotiated
Related website: https://osf.io/zgh5u/