Edward Miguel, Professor

Closed (1) Data Sharing and Citations

Applications for Fall 2017 are now closed for this project.

Science makes our world better, so let's make science better. In this project we investigate the role that transparency of data and code plays in academic publications.

In academia, the pressure to publish is high ("publish or perish"), and it has led to negative side effects such as publication bias, p-hacking, and even fabrication of data. In 2010, as a step towards better scientific practice, the editor of one of the top journals, The American Journal of Political Science, began requiring that all published articles provide the data and code necessary to reproduce their claimed results. Other leading journals, such as the American Political Science Review, had no such requirement.

We want to investigate how the prominence of a publication, as measured by citations, is affected by whether it provides its underlying data and code. There is already strong evidence that data sharing is correlated positively with citations, but we are interested in finding a causal effect. The change in policies for some journals and the lack thereof in others will hopefully allow us to identify this causal effect. We have already gathered the data for two political science journals, but we need to gather the data from two economics journals (The American Economic Review and The Quarterly Journal of Economics) and do a considerable amount of data cleaning and analysis.
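One common way to exploit a policy change that affects some journals but not others is a difference-in-differences comparison. The project description does not name the estimation method, so the sketch below is only an illustration under that assumption. The function name `diff_in_diff`, the row layout, and the citation counts are all hypothetical and are not project data.

```python
from statistics import mean


def diff_in_diff(rows):
    """Estimate a difference-in-differences effect on citations.

    rows: iterable of (policy_journal, post_policy, citations), where
    policy_journal is True for papers in a journal that adopted a data
    sharing requirement, and post_policy is True for papers published
    after the adoption date. This layout is an assumption made for
    illustration only.
    """
    # Group citation counts into the four journal-by-period cells.
    cells = {}
    for policy_journal, post_policy, citations in rows:
        cells.setdefault((policy_journal, post_policy), []).append(citations)
    avg = {key: mean(values) for key, values in cells.items()}

    # Change over time in the journal that adopted the policy...
    policy_change = avg[(True, True)] - avg[(True, False)]
    # ...minus the change over the same period in the comparison journal.
    comparison_change = avg[(False, True)] - avg[(False, False)]
    return policy_change - comparison_change


# Made-up citation counts, purely illustrative.
toy_rows = [
    (True, False, 10), (True, False, 14),
    (True, True, 20), (True, True, 24),
    (False, False, 11), (False, False, 13),
    (False, True, 15), (False, True, 17),
]
estimate = diff_in_diff(toy_rows)
print(estimate)
```

The estimate can be read as causal only if citations in both groups of journals would have followed similar trends without the policy change, which is an assumption that has to be checked against the actual data.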

This project is being run by Edward Miguel (Economics Department) and Garret Christensen of the Berkeley Institute for Data Science (BIDS) and the Berkeley Initiative for Transparency in the Social Sciences (BITSS).

For this project we are looking for students who are enthusiastic about research transparency and reproducibility, and about making a difference in the academic publishing system. A main part of the task will be adapting existing programming scripts to scrape data from additional journals, then collecting the data and categorizing academic papers according to how accessible their data and code are. Students will learn data collection, automation, and validation. Students will be exposed to a variety of modern research papers in political science and economics and gain first-hand experience of their subject matter, level of reproducibility, and some of the statistical methods involved.

Research will be supervised by Garret Christensen (BIDS Data Science Fellow) and Edward Miguel (Economics). Students should expect to spend approximately half of their project time working in person; the other half can be done remotely. Regular team meetings will be held with supervising faculty.

Day-to-day supervisor for this project: Garret Christensen, Staff Researcher

Qualifications: We are looking to add someone with Python skills to the project to help complete our automated data collection. Desired but not essential: some experience with Stata or R, and successful completion of a basic statistics or econometrics class such as Econ 140/141/144.

Weekly Hours: 9-12 hrs

Related website: http://www.bitss.org/
Related website: https://bids.berkeley.edu/