Big Data Preparation
Anastassia Fedyk, Professor
Business, Haas School
Applications for Fall 2024 are closed for this project.
This project focuses on data: hand-collecting new data, improving existing data, and structuring large, messy data. Very large data sets are now at the heart of many social science research questions. However, those data sets can be plagued by big data problems: missing data, erroneous data, and duplicate data.
This project is a great starter for younger students (freshmen and sophomores) interested in working hands-on with data and can be a great segue to the more technical project 'Big Data Global Employment Dynamics.' Students will learn and practice all aspects of data analysis and preparation required before statistical methods such as machine learning can be applied to social science questions.
Role: Key tasks will include:
- Analyzing/annotating data sets in preparation for analysis.
- Identifying additional/alternative data sets that could help to answer key research questions.
- Linking multiple data sets and running summary statistics.
- Visualizing data in useful ways to communicate ideas.
Students involved in this project will perfect their skills in:
- Understanding big data in the context of real world questions.
- Data validation/annotation/visualization.
- Effective communication, both written and verbal.
Qualifications: Required:
- Highly organized;
- Detail-oriented;
- Interested in big data.
Must be willing to put in 10 hours/week every week, without exception.
Hours: 9-11 hrs
Related website: https://sites.google.com/berkeley.edu/fedyk
Digital Humanities and Data Science; Social Sciences