Compiling and data-wrangling the first comprehensive dataset of all security forces in Latin America and the Caribbean
Dorothy Kronick, Professor
Center for Effective Global Action
Applications for Spring 2025 are closed for this project.
The majority of people in extreme poverty live in conflict-affected or fragile countries, most of which experience primarily internal conflicts rather than wars with another country. Latin America has experienced a number of recent civil conflicts, as well as drug trafficking and crime waves that cause further casualties, displacement, and destabilization in the region. Understanding how communities respond to these threats, what kinds of policing emerge from the state and from alternative security providers (particularly after conflict), and how citizens relate to those different forces is essential to fostering stability and peace in Latin America.
This project will compile the first comprehensive dataset of all security forces in Latin America and the Caribbean, including both government (police, military) and alternative (militia, rebel) forces. Specifically, researchers will put together yearly data (at the country and regional levels) on all security forces in all 33 Latin American and Caribbean countries from 1970 to 2020. Despite the importance of these data for policy decisions (especially in post-conflict countries), no comprehensive dataset exists to answer basic descriptive questions about security in Latin America. For example: how many and what types of police officers operate in each country? How are they selected and trained? To whom are they accountable? Only a few of these variables have been collected in any form, and most are not captured over time. Using this new dataset, the researchers will test a variety of hypotheses about how different security forces develop and how they interact with communities and their citizens.
Role: We are looking for motivated, highly organized students with data science skills who can help our team manage and collate georeferenced administrative data on police forces from all 33 Latin American and Caribbean countries since 1970. To be successful, URAP students should have strong data management skills; be able to navigate uncertainty, consulting multiple data sources to collate data, fill gaps, and generate a thorough analysis; and have a strong grasp of set theory and logical reasoning. Prior experience working with geospatial vector data and Spanish or Portuguese language skills are also a big plus.
The administrative data URAP students will primarily work with describe policing forces in the region: the number of police officers employed by different forces, expenditures on policing, police officer demographics, and how these vary over time and territory. You will work closely with Dorothy Kronick (UCB, Goldman School of Public Policy) and David Dow (Naval Postgraduate School, Political Science), who are leading the cross-national data analysis effort of this project. Students who join the team will work on the following tasks:
- Cleaning and combining administrative data sets - Data collection will be finalized by the time URAP students join. However, these data are aggregated at different administrative levels. Students will help wrangle and clean data, run data quality checks, and help compute and visualize descriptive statistics at national, state, and municipal levels.
- Identifying and filling in gaps in data - Through data cleaning and wrangling, students will identify gaps resulting from missing or low-quality data. Students will work with PIs to develop tailored data acquisition strategies to fill these gaps.
- Web scraping to construct crime datasets - Upon satisfactory completion of the above tasks, motivated students could help build an automated process to scrape publicly available crime data sets from government websites. There is scope for this to be the focus of student efforts in following semesters if there is a good fit between candidates and the project team.
This work will help provide the first systematic description of policing forces in Latin America and of how they vary over time and space. It will be used in conjunction with other data sources to better understand how these state institutions and bureaucrats function.
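To give applicants a feel for the cleaning and gap-identification tasks described above, here is a minimal sketch in Python with pandas. It is an illustration only and not the project's actual pipeline: the column names (`country`, `municipality`, `year`, `officers`), the function name, and the sample values are all hypothetical. It rolls municipal officer counts up to country-year totals and lists the country-years where no usable value exists.

```python
import pandas as pd

def aggregate_and_flag_gaps(records, first_year, last_year):
    """Roll municipal counts up to country-year totals and list missing country-years.

    All column names here are hypothetical placeholders.
    """
    df = pd.DataFrame(records, columns=["country", "municipality", "year", "officers"])

    # min_count=1 leaves a total as NaN when every municipal value in the group
    # is missing, instead of silently reporting 0.
    totals = df.groupby(["country", "year"])["officers"].sum(min_count=1)

    # Build the full country x year grid so that absent rows also show up as gaps.
    full_index = pd.MultiIndex.from_product(
        [sorted(df["country"].unique()), range(first_year, last_year + 1)],
        names=["country", "year"],
    )
    panel = totals.reindex(full_index)

    # A gap is any country-year with no usable total.
    gaps = panel[panel.isna()].index.tolist()
    return panel, gaps

# Made-up sample rows, purely for illustration.
sample = [
    ("A", "m1", 1970, 10.0),
    ("A", "m2", 1970, 5.0),
    ("A", "m1", 1971, float("nan")),  # value recorded as missing
    ("B", "m1", 1970, 7.0),           # country B has no 1971 row at all
]
panel, gaps = aggregate_and_flag_gaps(sample, 1970, 1971)
```

The sketch treats a country-year as a gap both when the row is absent and when every recorded value is missing. Real data would likely need finer rules, for example flagging a year in which only some municipalities report.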
Qualifications:
Essential:
- Ability to navigate uncertainty: consulting appropriate data sources, wrangling data, filling gaps, and producing analysis
- Ability to locate and merge new data as needed (ad hoc)
- Logical thinking skills and familiarity with set theory
- Strong coding skills: R and/or Python, Tidyverse and/or Pandas, git
Desirable but not essential:
- Geospatial analysis using vector data
- Familiarity with Spanish or Portuguese
Day-to-day supervisor for this project: Jonathan Chang, Staff Researcher
Hours: 6-8 hrs
Related website: https://cega.berkeley.edu/research/post-conflict-security-structures-and-citizen-buy-in-in-latin-america/
Social Sciences, Digital Humanities and Data Science