Anastassia Fedyk, Professor

Closed (1) Big Data for Global Employment Dynamics

Applications for fall 2021 are now closed for this project.

This is an opportunity to work with a large dataset of over 400 million employment profiles (resumes) in order to understand global employment dynamics and firm performance. In this project, we will leverage techniques from big data and machine learning to structure and analyze large textual data, which can help address questions ranging from individual career outcomes to firm performance to automation and the future of work. What are the effects of employment shocks such as the collapse of Lehman Brothers? Which firms are investing in new technologies such as AI, and what are the consequences of these investments? Which skillsets are booming and which are becoming obsolete in the modern economy? In this project, students will practice the techniques needed to work with large datasets (on the order of hundreds of gigabytes).

Key tasks will include:
- Working with textual employment data (similar to LinkedIn profiles) to extract and understand job titles, skills, and demographics.
- Using a range of statistical tools to analyze the relationship between employment metrics and firm outcomes.

Students who work on this project will increase their knowledge of:
- Working efficiently with real-world large datasets in the hundreds of gigabytes
- Using econometric and machine learning techniques
- Understanding and conducting scientific research processes

Qualifications:
Required:
- Comprehensive experience with at least one of Python, C/C++, R, or Matlab (i.e., WITHOUT excessive reliance on specific packages such as Pandas)
- Foundational computer science courses (Data Structures & Algorithms)
- Experience writing object-oriented, efficient, modular code, NOT just Jupyter notebooks
Preferred:
- Foundational Statistics and/or Machine Learning courses
- Practical experience with Machine Learning and/or Econometric analysis
Must be willing to put in 10 hours/week every week, no exceptions.

Weekly Hours: 9-11 hrs

Related website: https://sites.google.com/berkeley.edu/fedyk

Closed (2) Global Information Flows and Financial Markets

Closed. This professor is continuing with Spring 2021 apprentices on this project; no new apprentices needed for Fall 2021.

This project introduces students to the increasingly complex landscape of financial news. With millions of articles published each day, investors' task of understanding and trading on relevant information becomes ever more challenging. Which news stories are relevant and market moving? How can traders tell new information from old news? In this project, students will work with hundreds of gigabytes of textual data directly from premier news providers such as Dow Jones in order to understand how investors' cognitive limitations affect the link between news and stock price dynamics.

Key tasks will include:
- Applying natural language processing techniques to large textual datasets
- Linking measures based on financial news to stock price dynamics, including high-frequency trading data

Students involved in this project will increase their knowledge of:
- The structure of financial news
- Working efficiently with real-world large datasets in the hundreds of gigabytes
- The scientific research process

Qualifications:
Required:
- Experience working with textual data
- Experience with at least one of Python, C/C++, R, or Matlab
- Foundational computer science courses (Data Structures & Algorithms)
- Experience writing object-oriented, efficient, modular code, beyond IPython/Jupyter
Preferred:
- Practical experience with Machine Learning and/or Econometric analysis
Must be willing to put in 10 hours/week every week, no exceptions.

Weekly Hours: 9-11 hrs

Related website: https://sites.google.com/berkeley.edu/fedyk

Closed (3) Research Grant Writing: Global Employment Dynamics and Financial Information Flows

Applications for fall 2021 are now closed for this project.

This project addresses a uniquely important part of research: writing grant applications. The project is closely linked to two of Professor Fedyk's other projects: Global Employment Dynamics and Information Flows in Financial Markets. These two projects rely heavily on large datasets and computing power. The goal of the grant writing project is to obtain the resources necessary for the other projects to proceed. This is a unique opportunity to get deep into the intuition and the results of the big data projects without needing coding expertise or background. In this project, students will have an opportunity to work in a creative, collaborative environment, learning about interesting data and observing the entire research process from start to finish.

Key tasks will include:
- Researching available grants from organizations such as the National Science Foundation.
- Reading materials about Professor Fedyk's research projects and talking through the projects with others.
- Writing grant applications based on the project descriptions.

Students involved in this project will perfect their skills in:
- Effective communication, both written and verbal.
- Understanding of the scientific research process from start to finish.

Great for those interested in a career in research or technical program/product management.

Qualifications:
Required:
- Excellent communication skills
- Highly organized
- Detail-oriented
- Interest in big data
Must be willing to put in 10 hours per week.

Weekly Hours: 9-11 hrs

Related website: https://sites.google.com/berkeley.edu/fedyk

Closed (4) Big Data Preparation: Global Employment Dynamics

Applications for fall 2021 are now closed for this project.

This project aims to measure and improve the quality of large data sets used for economic analysis. Today, very large data sets are often at the heart of many social science research questions. However, those data sets can be plagued by big data problems: missing data, bad data, duplicate data.

Specifically, students will learn and practice all aspects of data analysis and preparation required before statistical methods (e.g. Machine Learning/Artificial Intelligence) can be applied to social science questions. This will include understanding data, validating data, running summary stats, annotating data, linking multiple data sets, and visualizing data in useful ways to communicate ideas.

Key tasks will include:
- Sampling records from billions of global employment and company records to estimate the extent of data problems/veracity.
- Identifying additional/alternative data sets that could help to answer key research questions.
- Analyzing/annotating data sets in preparation for the application of Machine Learning models.
- Visualizing key data statistics to convey the completeness and deficiencies of the data sets, i.e., whether they are "fit for purpose".
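
As a rough illustration of the first task, the sketch below draws a uniform random sample from a stream of records and reports how often required fields are empty and how often records repeat within the sample. It is only a minimal sketch under stated assumptions: the function names (reservoir_sample, quality_report) and the field names (name, title, company) are hypothetical and do not describe the project's actual data or code. Note that the duplicate rate inside a small sample will generally understate the duplicate rate of the full data set, so it is at best a lower-bound signal.

```python
import random
from collections import Counter


def reservoir_sample(records, k, seed=0):
    """Draw up to k records uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(rec)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = rec
    return sample


def quality_report(sample, required_fields, key_fields):
    """Summarize missing-field and within-sample duplicate rates.

    required_fields and key_fields are hypothetical field names chosen
    by the caller; records are assumed to be plain dicts.
    """
    n = len(sample)
    if n == 0:
        return {"n": 0, "missing_rate": {}, "duplicate_rate": 0.0}

    # Count records where a required field is absent, None, or empty.
    missing = Counter()
    for rec in sample:
        for field in required_fields:
            if not rec.get(field):
                missing[field] += 1

    # Records sharing the same key tuple count as duplicates of each other.
    keys = Counter(tuple(rec.get(f) for f in key_fields) for rec in sample)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)

    return {
        "n": n,
        "missing_rate": {f: missing[f] / n for f in required_fields},
        "duplicate_rate": duplicates / n,
    }


if __name__ == "__main__":
    # Tiny made-up records, for illustration only.
    records = [
        {"name": "A", "title": "Engineer", "company": "X"},
        {"name": "A", "title": "Engineer", "company": "X"},
        {"name": "B", "title": "", "company": "Y"},
        {"name": "C", "title": None, "company": "Z"},
    ]
    sample = reservoir_sample(iter(records), k=10)
    print(quality_report(sample, ["name", "title"], ["name", "company"]))
```

Reservoir sampling is used here because it needs only one pass and memory proportional to the sample size, which matters when the input is far too large to load at once.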

Students involved in this project will perfect their skills in:
- Understanding big data in the context of real world questions.
- Data validation/annotation/visualization.
- Effective communication, both written and verbal.

Qualifications:
Required:
- Highly organized
- Detail-oriented
- Interested in big data
- Able to put in a minimum of 10 hours/week every week

Weekly Hours: 9-11 hrs

Related website: https://sites.google.com/berkeley.edu/fedyk