Joshua Blumenstock, Professor

Closed (1) Using fair machine learning to measure the welfare impacts of mobile banking in Africa

Closed. This professor is continuing with Fall 2020 apprentices on this project; no new apprentices needed for Spring 2021.

Access to credit can be an important lifeline, but it can also lock people into debt traps. Mobile banking technologies have completely transformed the ways in which people, especially in the developing world, access credit, raising new questions about fairness and welfare in lending.

The goal of this project is to learn about the mechanisms, contexts and policies through which access to credit can improve welfare. Using data on millions of loans and loan applications made on mobile devices in Kenya (and potentially other populations), we will try to identify the patterns and factors that lead to improvements in well-being.

This project will therefore seek to answer a number of different but related questions:
1. What is well-being in this context? How can it be quantified?
2. What is harm and how can it be quantified?
3. What is fair in this context and can fairness be quantified?
4. From a descriptive standpoint, what factors predict increases in well-being?
5. What policies will properly balance profit and welfare?

To answer these questions, we will have to draw on modern research in fair algorithms, machine learning, causal inference and optimization. A large portion of this research will likely entail deriving new inferential methods.


The undergraduate apprentice will have the opportunity to play an important role in real research from the very start of a new project. The beginning stage of this research will require deep exploration of the data; the apprentice should expect to spend time not simply cleaning data and outputting summary statistics, but deriving insights from the patterns they uncover.

Depending on the student’s interest, they may also participate in a thorough literature review of the field of algorithmic fairness, and help develop a framework for measuring fair lending.

In this project, the apprentice should expect to complete well-defined research tasks assigned to them in a timely manner. That being said, the hope is also that the student will propose insights and lines of inquiry of their own.


Day-to-day supervisor for this project: Jacqueline Mauro, Post-Doc

Qualifications:
- Proficiency in R or Python (required)
- Experience with large datasets (required)
- Ability to extract relevant information from data and present it clearly, especially through visualization (required)
- Knowledge of East African politics and society (preferred)
- Good understanding of the basics of banking, finance and credit (preferred)

Weekly Hours: to be negotiated
Related website: https://sites.google.com/view/jacquelinemauro/home

Closed (2) Using large-scale data to measure the social and economic effects of COVID-19 in Afghanistan

Closed. This professor is continuing with Fall 2020 apprentices on this project; no new apprentices needed for Spring 2021.

The goal of this project is to understand the effects of COVID-19 in Afghanistan. The impact of COVID-19 has been felt all over the world. In Afghanistan’s case, this crisis comes on top of decades of violence and conflict. Many regions lack the means to respond effectively to this crisis, such as good governance, adequate healthcare facilities, or the ability to extend economic assistance to residents. In many areas, fighting has continued.

In this project, we would like to measure and understand the social and economic effects of COVID-19 in Afghanistan. These could include changes in mobility, displacement and migration, social ties, and violence, as well as economic impacts. We have access to terabytes of data from Afghanistan's largest mobile phone operator, in the form of anonymized call records. This unique data set enables us to infer the locations of individuals, as well as a wealth of other information, such as social networks. In addition to mobile phone metadata, we will explore data from other sources, such as satellites, open street maps, financial institutions, censuses, and household surveys.

With these data, we are uniquely positioned to be able to answer important and pressing questions, such as:
1. How did mobility respond to the lockdown measures that were instituted?
2. Was there any kind of “mass exodus” from major cities, or unusual migrant flows?
3. Did patterns of cellphone usage change as a response to COVID-19? Are there signs of communities fraying or strengthening?
4. Do any of these responses differ in areas under Taliban control?
5. How have patterns of violence, or people’s responses to this violence, changed?


The undergraduate apprentice would have the opportunity to shape the research from the start of this project. This includes brainstorming ideas on what questions can and cannot be answered, exploring the data to extract insights, and identifying promising directions.

Depending on the student’s interest, there may be the opportunity to explore methods related to machine learning, causal inference, etc., as well as to develop new estimation frameworks. The undergraduate would be expected to be able to complete well-defined research tasks, but will also have the freedom to propose and pursue promising directions of their own interest.

Day-to-day supervisor for this project: Xiao Hui Tai, Post-Doc

Qualifications: All URAP apprentices, irrespective of the specific role or assignment, are expected to be extremely self-motivated, attentive to detail, and meticulous in their approach to data analysis. Apprentices must be able to work independently and be excited to take responsibility and initiative for ensuring their work meets exacting standards of quality. Specific qualifications for this position include:
- Proficiency in R or Python (required)
- Experience manipulating large data sets (required)
- Ability to derive insights from data and communicate these clearly (required)
- Knowledge of statistical/econometric methods (preferred)
- Proficiency with visualization tools (preferred)
- Knowledge of Afghan society (preferred)

Weekly Hours: to be negotiated

Off-Campus Research Site: Zoom (during COVID-19)

Related website: http://www.jblumenstock.com
Related website: http://didl.berkeley.edu

Closed (3) Using Data Science to Improve COVID-19 Response in Developing Countries: Data Visualization

Closed. This professor is continuing with Fall 2020 apprentices on this project; no new apprentices needed for Spring 2021.

The Data-Intensive Development Lab at UC Berkeley (didl.berkeley.edu) is providing data science support to the governments of several Low and Middle Income Countries, as well as humanitarian organizations like GiveDirectly, who are doing their best to effectively respond to the evolving humanitarian crisis caused by the COVID-19 pandemic. So far, we have provided technical support to governments in Afghanistan, Bangladesh, Ghana, Mali, Mexico, Nigeria, Senegal, Sudan, Togo, and Uganda.

The main focus of our current work is on using data science methods (machine learning, computer science, statistics, econometrics, interactive data visualization) to help governments get emergency cash aid to those who need it most. You can read a short article by Prof. Blumenstock about these efforts here: https://www.nature.com/articles/d41586-020-01393-7

Practically, this means we spend a lot of time crunching through large-scale datasets from satellites, mobile phone networks, open street maps, financial institutions, censuses, household surveys, and whatever else is available. Some examples of academic papers related to this work are linked below:
- http://jblumenstock.com/files/papers/jblumenstock_2015_science.pdf
- http://jblumenstock.com/files/papers/jblumenstock_ultra-poor.pdf
- https://web.stanford.edu/~mburke/papers/yeh_et_al_2020.pdf



Interactive and static data visualization / Data Journalism

Apprentices will design and build interactive and static data visualizations that are eye-catching and beautiful, and that effectively communicate complex data analysis to a broad audience. We are looking for students with a passion for and experience in data visualization. Over the course of the semester, you would be expected to produce outputs along the lines of the following:
- https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html
- https://devseed.com/covid-india-story/
- http://snip.ly/pJsZ#http://www.puffpuffproject.com/languages.html
- https://www.bloomberg.com/graphics/2015-whats-warming-the-world/

Day-to-day supervisor for this project: Emily Aiken, Graduate Student

Qualifications: All URAP apprentices, irrespective of the specific role or assignment, are expected to be extremely self-motivated, attentive to detail, and meticulous in their approach to data analysis. Apprentices must be able to work independently and be excited to take responsibility and initiative for ensuring their work meets exacting standards of quality. In your cover letter, please clearly describe your relevant experience, and how you meet the specific qualifications described below. Please include links to your portfolio or examples of work/code that you are proud of and (ideally) that are relevant to the position.

QUALIFICATIONS
- Proficiency in one or more programming languages (required)
- Demonstrated ability to build compelling data visualizations (required - send us examples!)
- Prior experience building interactive data visualizations (preferred)

Weekly Hours: to be negotiated

Off-Campus Research Site: Zoom (during COVID-19)

Related website: http://didl.berkeley.edu
Related website: http://jblumenstock.com

Closed (4) Using Data Science to Improve COVID-19 Response in Developing Countries: Data Engineering and Analytics

Closed. This professor is continuing with Fall 2020 apprentices on this project; no new apprentices needed for Spring 2021.

The Data-Intensive Development Lab at UC Berkeley (didl.berkeley.edu) is providing data science support to the governments of several Low and Middle Income Countries, as well as humanitarian organizations like GiveDirectly, who are doing their best to effectively respond to the evolving humanitarian crisis caused by the COVID-19 pandemic. So far, we have provided technical support to governments in Bangladesh, Ghana, Mali, Mexico, Nigeria, Senegal, Sudan, Togo, and Uganda.

The main focus of our current work is on using data science methods (machine learning, computer science, statistics, econometrics, interactive data visualization) to help governments get emergency cash aid to those who need it most. You can read a short article by Prof. Blumenstock about these efforts here: https://www.nature.com/articles/d41586-020-01393-7

Practically, this means we spend a lot of time crunching through large-scale datasets from satellites, mobile phone networks, open street maps, financial institutions, censuses, household surveys, and whatever else is available. Some examples of academic papers related to this work are linked below:
- http://jblumenstock.com/files/papers/jblumenstock_2015_science.pdf
- http://jblumenstock.com/files/papers/jblumenstock_ultra-poor.pdf
- https://web.stanford.edu/~mburke/papers/yeh_et_al_2020.pdf


Data Engineering and Analytics

Apprentices will build out scalable infrastructure for storing, processing, and analyzing large datasets, including datasets processed from satellite imagery, mobile phone networks, and other digital traces. This work will contribute directly to research and policy response undertaken by the lab. Day-to-day tasks may include writing efficient algorithms for data processing, cleaning, and analysis, as well as designing scalable and parallelizable workflows for engineering meaningful features from unstructured and semi-structured data.

Day-to-day supervisor for this project: Shikhar Mehra, Staff Researcher

Qualifications: All URAP apprentices, irrespective of the specific role or assignment, are expected to be extremely self-motivated, attentive to detail, and meticulous in their approach to data analysis. Apprentices must be able to work independently and be excited to take responsibility and initiative for ensuring their work meets exacting standards of quality. In your cover letter, please clearly describe your relevant experience, and how you meet the specific qualifications described below. Please include links to your portfolio or examples of work/code that you are proud of and (ideally) that are relevant to the position.

QUALIFICATIONS
- Proficiency in Python, including Pandas and PySpark (required)
- Prior experience with wrangling large datasets (required)
- Comfort with Linux (required)
- Experience with other data engineering and machine learning frameworks (e.g. Dask, Scikit-learn, H2O, Hadoop) (preferred)

Weekly Hours: to be negotiated

Off-Campus Research Site: Zoom

Related website: http://www.jblumenstock.com
Related website: http://didl.berkeley.edu