Edward Miguel, Professor

Closed (1) Data Sharing and Citations

Applications for Spring 2018 are now closed for this project.

In this project we investigate the role that transparency of data and code plays in academic publishing, and examine what incentives exist for researchers to do better science.

In academia, the stakes of publishing successfully are high ("publish or perish"), and have led to various negative side effects such as publication bias, p-hacking, and even fabrication of data. In 2010, as a step towards better scientific practice, the editor of one of the top journals, the American Journal of Political Science, began to require that all published articles provide the data and code necessary to reproduce their claimed results. Similar policies were adopted in top economics journals. Other leading journals (e.g., the American Political Science Review) had no such requirement.

We want to investigate how the prominence of a publication, as measured by citations, is affected by whether the authors publicly share the underlying data and code. There is already strong evidence that data sharing is positively correlated with citations, but we are interested in finding a causal effect. Because some journals changed their policies and others did not, we hope to be able to identify a causal effect rather than just a correlation. We have already gathered the data for two political science journals, but we need to finish gathering the data from two economics journals (The American Economic Review and The Quarterly Journal of Economics) and do a considerable amount of data cleaning and analysis.
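As a rough illustration of how a policy change in some journals but not others can help separate a causal effect from a correlation, one common approach is a difference-in-differences comparison. The sketch below is only an assumption about what such an analysis might look like; the posting does not say which estimator the project uses. The function name `did_estimate`, the row layout, and all citation counts are made up for illustration.

```python
from statistics import mean


def did_estimate(rows):
    """Difference-in-differences on group means.

    rows: iterable of (treated, post, citations), where
      treated   -> True if the article's journal adopted a sharing policy
      post      -> True if the article appeared after the policy date
      citations -> citation count for the article
    (This layout is hypothetical, not the project's actual data format.)
    """
    cells = {}
    for treated, post, citations in rows:
        cells.setdefault((treated, post), []).append(citations)

    needed = [(True, True), (True, False), (False, True), (False, False)]
    missing = [key for key in needed if key not in cells]
    if missing:
        raise ValueError(f"no observations for cells: {missing}")

    m = {key: mean(values) for key, values in cells.items()}
    change_in_policy_journals = m[(True, True)] - m[(True, False)]
    change_in_other_journals = m[(False, True)] - m[(False, False)]
    return change_in_policy_journals - change_in_other_journals


# Made-up toy numbers, purely to show the arithmetic.
toy_rows = [
    (True, False, 10.0), (True, False, 12.0),   # policy journal, before
    (True, True, 18.0), (True, True, 20.0),     # policy journal, after
    (False, False, 9.0), (False, False, 11.0),  # other journal, before
    (False, True, 12.0), (False, True, 14.0),   # other journal, after
]
estimate = did_estimate(toy_rows)
print(estimate)
```

In this toy data, citations rise by 8 in the policy journals and by 3 in the others, so the estimate is 5. A real analysis would more likely use a regression with controls and standard errors, and the comparison is only credible if citation trends in the two groups of journals would otherwise have moved in parallel.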

This project is being run by Edward Miguel (Economics Department) and Garret Christensen (Berkeley Institute for Data Science, BIDS, and Berkeley Initiative for Transparency in the Social Sciences, BITSS).

For this project we are looking for students who are enthusiastic about research transparency and reproducibility and about making a difference in the academic publishing system. Students will learn data collection, automation, and validation. Students will be exposed to a variety of modern research papers in political science and economics and gain first-hand experience of their subject matter, level of reproducibility, and some of the statistical methods involved. The team will adapt existing programming scripts to clean data from additional journals, manually categorize article characteristics, automate cross-checking and validation of entered data, then visualize the data and analyze it using regression.
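To give a sense of what automated cross-checking of manually entered data could involve, here is a minimal sketch that compares two independent codings of the same articles and flags every disagreement for review. It assumes a double-entry workflow, which the posting does not confirm; the function `find_disagreements`, the field names `shares_data` and `shares_code`, and the article IDs are all hypothetical.

```python
def find_disagreements(coder_a, coder_b):
    """Compare two codings keyed by article ID and list every mismatch.

    Each argument maps article_id -> {field_name: value}.
    Returns tuples (article_id, field, value_a, value_b). For an article
    present in only one coding, the field is "<missing record>" and the two
    values are booleans saying whether each coder has a record.
    """
    problems = []
    for article_id in sorted(set(coder_a) | set(coder_b)):
        rec_a = coder_a.get(article_id)
        rec_b = coder_b.get(article_id)
        if rec_a is None or rec_b is None:
            problems.append(
                (article_id, "<missing record>", rec_a is not None, rec_b is not None)
            )
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            if rec_a.get(field) != rec_b.get(field):
                problems.append((article_id, field, rec_a.get(field), rec_b.get(field)))
    return problems


# Hypothetical entries from two coders; not real project data.
coder_a = {
    "art-001": {"shares_data": True, "shares_code": True},
    "art-002": {"shares_data": False, "shares_code": False},
}
coder_b = {
    "art-001": {"shares_data": True, "shares_code": False},
    "art-003": {"shares_data": True, "shares_code": True},
}

flags = find_disagreements(coder_a, coder_b)
for flag in flags:
    print(flag)
```

Flagged rows would then go back to a person for resolution, so the script narrows down where to look without deciding which coder is right.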

Research will be supervised by Garret Christensen (BIDS Data Science Fellow) and Edward Miguel (Economics). Students should expect to spend approximately half of their project time working in person; the other half can be done remotely. Regular team meetings will be held with supervising faculty.

Day-to-day supervisor for this project: Garret Christensen, Staff Researcher

Qualifications: Some experience with Stata or R and successful completion of a basic statistics or econometrics class such as Econ 140/141/144 are desired but not essential.

Weekly Hours: 9-12 hrs

Related website: http://www.bitss.org/
Related website: https://bids.berkeley.edu/