Heather Haveman, Professor

Open (1) Oski Lab - Web-Scraping | Text Parsing | Machine Learning | Databases with Start-Ups and Entrepreneurs

Open. Apprentices needed for the spring semester. Please do NOT contact faculty before February 5th (the start of the 4th week of classes)! Enter your application on the web beginning January 9th. The deadline to apply is Tuesday, January 23rd at 8 AM.

For this project, URAP apprentices will develop code to collect and clean data for a variety of research projects. Apprentices will apply their programming skills to scrape product data from publicly available websites and to turn messy, unstructured data sets into clean data sets suitable for reproducible research.
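As a rough illustration of the kind of cleaning work involved, here is a minimal sketch in Python. It assumes scraped product records arrive as dictionaries with raw "name" and "price" strings; those field names and the helper functions are hypothetical and not taken from the lab's actual code.

```python
import re


def clean_record(raw):
    """Normalize one scraped product record (a dict of raw strings)."""
    # Collapse runs of whitespace left over from HTML layout.
    name = re.sub(r"\s+", " ", raw.get("name", "")).strip()
    # Drop thousands separators, then pull out the first number, if any.
    match = re.search(r"\d+(?:\.\d+)?", raw.get("price", "").replace(",", ""))
    price = float(match.group()) if match else None
    return {"name": name, "price": price}


def clean_records(raw_records):
    """Clean every record and drop nameless rows and duplicates, keeping order."""
    seen = set()
    cleaned = []
    for raw in raw_records:
        record = clean_record(raw)
        # Treat records as duplicates when name (case-insensitive) and price match.
        key = (record["name"].lower(), record["price"])
        if record["name"] and key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned
```

Calling clean_records on a list of raw dictionaries returns a de-duplicated list with tidy names and numeric prices (or None where no price could be parsed).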

Participants will scrape data on a number of markets and phenomena. They will also develop code to analyze the discourse surrounding these markets as found in electronic forums visited by market participants and articles published in the mainstream and specialized press. Students will work with machine learning packages for text analysis to help analyze the millions of observations collected via our web scrapers.

We are collaborating with a Berkeley business incubator that will have 10 start-ups residing just off campus this fall. The most dedicated and productive apprentices will have the opportunity to work with start-up founders on real-world problems.

You can find more about our research projects here: http://www.oskilab.com.

Cyrus Dioun will be the day-to-day contact to field questions, troubleshoot problems, and address everyday issues.

Undergraduates who are proficient in programming languages and statistics will help us collect, clean, and analyze large data sets.

Day-to-day supervisor for this project: Cyrus Dioun

Qualifications: Advanced coding skills. Proficiency with machine learning and distributed computing.

Weekly Hours: to be negotiated

Related website: http://www.oskilab.com
Related website: http://bids.berkeley.edu/people/cyrus-dioun

Open (2) Computer Vision: Classifying Photographs Using Deep Learning

Open. Apprentices needed for the spring semester. Please do NOT contact faculty before February 5th (the start of the 4th week of classes)! Enter your application on the web beginning January 9th. The deadline to apply is Tuesday, January 23rd at 8 AM.

We are looking for a few talented, irreverent advanced computer scientists who enjoy challenges, puzzles, and problem solving. We have collected hundreds of thousands of photographs with associated labels and want to use packages such as Caffe to train an algorithm to accurately classify them.

Last semester our team worked on normalizing the color and clarity of the photos and quantifying the color composition. This semester we will work on recognizing shapes and finding a way to parallelize and speed up this data-intensive task.
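For applicants unfamiliar with the idea, one simple way to quantify color composition is to sort pixels into coarse RGB buckets and report each bucket's share. The sketch below is an assumption for illustration only, not the team's actual method; the function name and the default of 4 levels per channel are hypothetical, and it works on a plain list of (r, g, b) tuples rather than on image files.

```python
from collections import Counter


def color_composition(pixels, levels=4):
    """Return the share of pixels in each coarse RGB bucket.

    pixels: iterable of (r, g, b) tuples with channel values in 0-255.
    levels: number of bins per channel (hypothetical default).
    """
    step = 256 // levels

    def bucket(value):
        # Clamp so that 255 still lands in the last bin when 256 % levels != 0.
        return min(value // step, levels - 1)

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}
```

Because each photograph is processed independently, a function like this could in principle be mapped over many images in parallel.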

This project is funded by an Amazon Web Services Grant and will be run on EC2 instances.


Apprentices will meet bi-weekly, write code, and implement packages to analyze and classify photographs.

Day-to-day supervisor for this project: Cyrus Dioun, Ph.D. candidate

Qualifications: Advanced coding abilities in Python and Matlab. Courage. Persistence in the face of seemingly insurmountable odds.

Weekly Hours: to be negotiated

Closed (3) Chefs and Cooks: Exploring Race and Gender in Fine Dining and Food Writing

Closed. This professor is continuing with Fall 2017 apprentices on this project; no new apprentices needed for Spring 2018.

Who is a chef? What does a chef look like? How are the chef and the chef identity related to the food chefs prepare and/or the value of that food? How is the image or narrative of the chef shaped by or influenced by gender and race? Much of the sociological research about food focuses on either the processes of agro-ecology and the production of ingredients (farming, agriculture, the flawed food production system) or the process of eating (who eats what, what does this say about individuals’ or groups’ position in the social hierarchy, etc.). This project examines the neglected realm of professional cooking with an eye towards the gender and race dynamics of the professional fine dining kitchen. By collecting and analyzing data from the top food industry magazines and publications, we will examine some of the complexities and major trends surrounding the cultural meaning of food and fine food cooking in contemporary New York City and San Francisco.

Students' primary responsibility is data collection. Apprentices will scan, read, and code magazine articles from the leading fine food magazines in the United States, learning about data collection and analysis through social scientific content analysis and text analysis.

Day-to-day supervisor for this project: Gillian Gualtieri, Ph.D. candidate

Qualifications: Attention to Detail; Timeliness; Interest in Topic; Restaurant Industry Experience (not required, but helpful); Apprentices must be able to travel regularly and independently (via BART or other means) to the San Francisco Public Library in downtown San Francisco

Weekly Hours: 6-9 hrs

Off-Campus Research Site: 100 Larkin St San Francisco, CA 94102

Open (4) Charter Schools and the Business Age: Web-Scraping and Text Analysis

Open. Apprentices needed for the spring semester. Please do NOT contact faculty before February 5th (the start of the 4th week of classes)! Enter your application on the web beginning January 9th. The deadline to apply is Tuesday, January 23rd at 8 AM.

If you’re interested in how today’s business age structures organizations and their messages to stakeholders, want to contribute to a team data collection & analysis effort focused on innovation in education, and are willing to challenge yourself through hands-on learning, then this is the project for you!

Here’s our focus: How does the push to run schools like businesses--complete with performance targets, incentives, and centralization in culture and governance--shape the growing charter school sector? Which charters survive and thrive in this political climate: those that stress standards-based rigor and college-readiness (traditional model), or those that prioritize independent thinking and socio-emotional development (progressive model)? And how does this differentiation affect charter school segregation--that is, do progressive schools serve white students in affluent, liberal communities while traditional schools serve students of color in poor or conservative communities?

To answer these questions, our team will extract and analyze mission statements (MSs) from the websites of every U.S. charter school open today.
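To give a sense of what extraction might involve, here is a minimal sketch that pulls paragraphs mentioning a keyword out of a page's HTML using only Python's standard library. It is an illustrative assumption, not the project's pipeline: the class and function names are hypothetical, and real school websites would need far more robust handling.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the visible text of each <p> element on a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the chunks and collapse whitespace from the page layout.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def find_mission_paragraphs(html, keyword="mission"):
    """Return paragraphs whose text contains the keyword (case-insensitive)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if keyword in p.lower()]
```

Matching on a single keyword is deliberately crude; it only shows the shape of the task.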

I am looking to add a small number of students to each of two teams:

1.) Manual coders visit the websites of charter schools and charter management organizations (CMOs) and record the information they retrieve in a spreadsheet. The specific focus this semester will be checking the accuracy of URLs and assembling a comprehensive list of the member schools of CMOs.

2.) CS coders: Outside-the-box, independent thinkers/tinkerers with significant computer science (CS) and/or coding skills will collaborate on web-crawling/scraping charters’ sites, text analysis and machine learning (natural language processing, dictionary methods, and topic models), and statistical regression.
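For prospective CS coders, a dictionary method can be sketched as counting how often terms from hand-built word lists appear in a text. The example below is a minimal illustration under stated assumptions: the two term lists are hypothetical placeholders, not the project's dictionaries, and the scoring (share of tokens matching each list) is one simple choice among many.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only.
DICTIONARIES = {
    "traditional": {"standards", "rigor", "rigorous", "college", "achievement"},
    "progressive": {"independent", "creative", "emotional", "inquiry", "whole"},
}


def score_statement(text, dictionaries=DICTIONARIES):
    """Return, per dictionary, the share of tokens in text that match it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for label, terms in dictionaries.items():
        hits = sum(counts[term] for term in terms)
        scores[label] = hits / total if total else 0.0
    return scores
```

Scores for a mission statement could then be compared across schools or fed into a regression alongside school-level variables.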

Day-to-day supervisor for this project: Jaren Haber, Ph.D. candidate

Qualifications: 1) Manual coders: Detail-oriented, responsible, diligent folks who work well with others. No coding or research experience required for this data collection work. 2) CS coders: Significant to advanced experience with Python is a must. I am specifically looking for experience with implementing GIS (geographic information systems) coding in GeoPandas (or similar); running machine learning algorithms with TensorFlow (or similar); and/or web-crawling/scraping from the Internet Archive. Also desirable are experience with text analysis in Python (especially word embeddings and advanced custom dictionary analysis), working knowledge of Selenium and/or the Scrapy framework, and a background in statistical analysis. Other important qualities: Independent initiative, collaborative spirit, and timeliness in completing tasks.

Weekly Hours: to be negotiated

Related website: https://github.com/URAP-charter
Related website: https://osf.io/zgh5u/