Improving & Integrating Diversity Estimates
Co-PI, Scott Althaus, Merriam Professor of Political Science, University of Illinois
Existing estimates of ethnic, religious, and linguistic diversity appear to be correlated with a number of socio-political and economic outcomes, including development, conflict, and social capital. But these existing diversity indices are based on group-size statistics from secondary sources, which are imprecise in systematic ways, reflecting variation in who does (and does not) enumerate identity in a national census. This systematic measurement error can hide statistical associations or create phantom ones, threatening our ability to draw valid inferences about the true causes and consequences of diversity.
As a solution, I update estimates of group-size and re-calculate diversity using self-identification in cross-country, multi-wave surveys, which I argue are less prone to systematic sampling constraints and less affected by response bias and enumerator error than government statistics. At the same time, I take seriously the possibility of systematic measurement error in the survey data and use a novel database of survey-design characteristics (a "Survey of Surveys"), a large set of high-quality census results (a "Census of Censuses"), and machine learning algorithms to identify which survey features are prone to the most error. To triangulate between surveys and censuses, and to compare these with existing diversity metrics, I create a system for linking their different ontologies (a "cross-walk"), identifying synonyms and nesting structures among tens of thousands of unique categories.
Based on the self-identification of over 13.7 million respondents across 180 countries between 1973 and 2020, my updated estimates of diversity differ significantly from existing measures in an overwhelming majority of cases. Most importantly, they upend well-established correlations between diversity and politics found in the existing literature, indicating that these are likely the result of systematic measurement error. In total, the project challenges us to reconsider the ways that diversity truly impacts politics and economics.
Find the full project description here. More information about the NSF award is available here.
Read a working version of the paper on ethnic diversity here. The highlights are listed below.
Existing estimates of diversity -- especially ethnic diversity -- appear to be significantly correlated with key political and economic outcomes. But the existing indices are based on statistics from secondary sources, and these are often incredibly imprecise.
Worse yet, the imprecision appears to be systematic, reflecting whether or not a country enumerates identity in a national census. And ethnic enumeration is itself correlated with the political and economic outcomes thought to be impacted by diversity.
Such systematic measurement error -- correlated with the dependent variable -- is capable of hiding true statistical associations or creating phantom ones. In short, it threatens our ability to make causal inferences.
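This danger can be made concrete with a minimal simulation (the numbers here are invented purely for illustration): when the error in a measured variable is correlated with the outcome, a "significant" association can appear even though the true relationship is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# In this simulation, true diversity is unrelated to the outcome.
true_diversity = rng.uniform(0, 1, n)
outcome = rng.normal(0, 1, n)

# Measurement error correlated with the outcome -- e.g., worse enumeration
# (and thus worse data) in countries with worse outcomes.
error = 0.5 * outcome + rng.normal(0, 0.1, n)
measured_diversity = true_diversity + error

r_true = np.corrcoef(true_diversity, outcome)[0, 1]
r_measured = np.corrcoef(measured_diversity, outcome)[0, 1]
print(f"true r = {r_true:.2f}, measured r = {r_measured:.2f}")
```

The measured correlation is large even though the true one is essentially zero; flipping the sign of the error term would instead hide a real association.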
1. Evidence of Imprecision
Diversity is typically defined as an inverse Herfindahl index -- i.e., one minus the sum of the squared population shares of the groups living in a given society -- so calculating diversity requires statistics on the relative size of groups. The earliest indices of ethnic diversity relied on estimates of group-size from the Atlas Narodov Mira, produced by a team of Soviet ethnographers in 1964. Significant updates were made in 2003 with indices published by Alesina et al. and Fearon, both of which relied on group-size estimates from a similar set of secondary sources -- mainly Encyclopedia Britannica’s “Book of the Year” and the CIA’s “World Factbook.”
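The index itself is simple to compute. A minimal sketch in Python, taking a list of group shares that sum to one:

```python
def fractionalization(shares):
    """Inverse Herfindahl index: one minus the sum of squared group shares.

    `shares` are the population shares of each group and should sum to 1.
    """
    return 1.0 - sum(s ** 2 for s in shares)

# A perfectly homogeneous society scores 0; the index rises toward 1
# as groups multiply and equalize in size.
print(fractionalization([1.0]))            # 0.0
print(fractionalization([0.5, 0.3, 0.2]))  # ≈ 0.62
```

Equivalently, the index is the probability that two randomly drawn individuals belong to different groups, which is why precise group-size statistics matter so much.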
These secondary sources often list imprecise statistics. Close to half (43.9%) are rounded to the nearest whole number, and -- among the whole numbers reported -- there is considerable "heaping" on numbers ending in 0 and 5. This level of imprecision is not surprising once we recognize how few of these statistics reflect an official, national head-count. Using a new, extensive set of census results, and applying the most generous matching criteria, I am able to link barely a third of the statistics to a national census, and some of these censuses were conducted decades before the construction of the diversity indices.
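Rounding and heaping of this kind are easy to diagnose. A small sketch of the two checks, using invented percentages for illustration (not the actual source data):

```python
# Hypothetical reported group-size percentages, for illustration only.
reported = [40, 35.2, 25, 60, 30, 9.8, 55, 12, 45, 50]

# Share of statistics rounded to a whole number.
whole = [x for x in reported if float(x).is_integer()]
share_whole = len(whole) / len(reported)

# "Heaping": among the whole numbers, how many end in 0 or 5?
heaped = [x for x in whole if int(x) % 5 == 0]
share_heaped = len(heaped) / len(whole)

print(share_whole, share_heaped)
```

With genuinely precise head-counts, terminal digits should be roughly uniform; an excess of 0s and 5s signals estimation rather than enumeration.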
2. Systematic Measurement Error
If not based on an official head-count, group-size statistics are likely based on an unofficial metric: either an estimate from an (unnamed) country expert or, at worst, perhaps an outright guess. And the secondary sources rely heavily on these unofficial metrics because enumeration of identity in national censuses is relatively rare.
If the availability of census data across countries is non-random, then there is more measurement error in the existing estimates of diversity in countries that opt not to enumerate. If the practice of enumeration is correlated with the latent variable -- i.e., diversity -- other variables in models of diversity (e.g., economic growth, social trust, conflict), or errors in measures of these variables, then this measurement error is systematic and threatens the validity of our statistical inferences.
In the case of ethnic enumeration, I uncover some concerning patterns. Countries that have enumerated ethnicity at least once (1900-2020) appear significantly more diverse; they also experience less economic growth, host fewer survey respondents who trust “most people,” and have witnessed the onset of fewer civil wars. It would appear, therefore, that the quality of existing data on ethnicity is indeed correlated with key variables of interest.
To identify and correct for measurement error in estimates of group-size, it is essential that data sources overlap, generating estimates for multiple cases. Better still if the same source generates multiple estimates for the same case at a similar point in time.
Toward this end, I propose to use self-identification in cross-national, multi-wave surveys to re-estimate the size of groups, linking these to gold-standard censuses and a new database of survey characteristics to identify how design features bias how we measure ethnic, religious, and linguistic identity.
This process entails a number of distinct steps.
1. Self-identification in Cross-National, Multi-Wave Surveys
Since the 1980s, the number of cross-national survey projects has grown, as has their coverage and quality. When administered in multiple waves, these surveys allow me to triangulate across sources to identify and correct for systematic measurement error.
As of writing, among all cross-national, multi-wave survey projects, thirty ask respondents to self-identify in terms of ethnicity (E), religion (R), and/or language (L). They include:
Arab Barometer (ERL)
Asian Barometer, including both the East Asian and South Asian Barometers (ERL)
Caucasus Barometer (ERL)
Central Asia Barometer (ERL)
European Election Studies' Voter Study (R)
European Social Survey (ERL)
New Baltic Barometer (ERL)
World Values Survey (ERL)
In total, I have information about the self-identification of over 13.7 million individuals across 180 countries between 1962 and 2019. This includes information about the ethnicity of over 5.6 million respondents across 161 countries (1973-2020). In this case, the mean (median) country in the dataset is surveyed 11.7 (8) times, by 3.1 (3) different survey-projects over 17.1 (17) years. Together, this translates into data on the ethnic self-identification of just under 35,000 (just over 21,750) respondents per country. Only 23 countries (14.3%) are surveyed just once, and 43 (26.7%) are surveyed by just one survey project.
Survey coverage is extensive (see below), and -- critically -- the availability of data on ethnicity is uncorrelated with diversity or its purported covariates. As such, the survey data are less likely to generate systematic measurement error.
2. Variation in Survey Design and Quality
Self-identification in each survey can be used to estimate the relative size of each included ethnic, religious, and linguistic group. These, in turn, can be used to calculate a sample fractionalization index, estimating the level of diversity in a given country-year. In the case of ethnicity, surveys tend to disagree somewhat about the level of diversity in a given country; in a small number of countries, the survey-based estimates differ more dramatically.
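Computing a sample fractionalization index from raw survey responses is a one-liner once responses are tallied. A sketch, using a hypothetical country-year sample:

```python
from collections import Counter

def sample_fractionalization(responses):
    """Fractionalization computed from individual self-identifications.

    `responses` is a list of group labels, one per respondent.
    """
    counts = Counter(responses)
    n = len(responses)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical sample: 60% group A, 30% group B, 10% group C.
answers = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
print(sample_fractionalization(answers))  # ≈ 0.54
```

In practice survey weights would enter the group shares, but the logic is the same.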
I suggest that any systematic within-survey measurement error reflects how each survey is conducted. Surveys differ on whether (and how) they ask respondents to self-identify in terms of ethnicity, race, or tribe. But the surveys also have different ways of creating their samples and use different modes of interviewing respondents.
To identify and correct for sources of systematic error, I sift through methodological reports and questionnaires for each survey, coding information about the design of each sample, the methods used in the survey as a whole, and how questions about identity are asked and answered. The result is a "Survey of Surveys" which uncovers significant variation in how surveys approach the enumeration of identity.
3. Comparing Survey-Based Estimates and Census-Based Statistics
To assess how survey design impacts estimates of group-size, I need a set of unbiased reference points. Because the existing diversity indices suffer from systematic measurement error, neither of them is appropriate. I also avoid survey-averages, recognizing that some survey designs could produce biased estimates; if these flawed designs are sufficiently popular, they would also bias the average.
In their place, I use diversity as calculated from high-quality, "gold standard" censuses, where these exist. To identify these, I construct a "Census of Censuses," identifying whether and how censuses were conducted in more than 250 countries between 1900 and 2021. If a census took place, I attempt to find a copy of the questionnaire and -- if questions about identity were asked -- to note how these were asked and answered. Then, I collect the most disaggregated results, as well as information about the quality of the census, noting cases of under- and over-enumeration, those known to exclude a particular sub-population, or those noted for significant errors. In this work, I have been supported by the incredible Inter-Library Loan team at the University of Illinois Library.
See below for an illustration of census-survey matches. There is also considerable variation in how (ethnic) identity is enumerated, also illustrated below.
4. Linking Ontologies
In order to triangulate between surveys and high-quality censuses, it is essential to unify their ontologies, i.e., the list of groups they include. Preliminary work on ethnicity finds tremendous variation in the number of groups listed in each source. Because of the way that fractionalization is calculated, these differences are likely to impact diversity estimates.
Unifying ontologies sometimes calls for one-to-one matches between different spellings of, or different names for, the same group (e.g., "White, non-Hispanic" and "Caucasian"). In other cases, sub-groups are nested within broader categories (e.g., "Japanese" and "Filipino" within the group "Asian"). In the case of ethnicity, there are over 10,000 distinct categories in the triangulated surveys and censuses, and machine-assisted merging struggles with this task since matching and nesting structures tend to be specific to each country-case.
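The two kinds of links can be sketched as a pair of lookup tables: one mapping synonyms to a canonical label, another mapping sub-groups to their parents. The labels and mappings below are a toy illustration, not the project's actual cross-walk:

```python
# Toy cross-walk: synonyms map raw labels to a canonical group name.
SYNONYMS = {
    "Caucasian": "White, non-Hispanic",
    "White, non-Hispanic": "White, non-Hispanic",
}

# Nesting: sub-groups roll up to a broader parent category.
PARENTS = {"Japanese": "Asian", "Filipino": "Asian"}

def harmonize(label, level="fine"):
    """Map a raw source label to its canonical group; optionally roll
    sub-groups up to their broader parent category."""
    canonical = SYNONYMS.get(label, label)
    if level == "coarse":
        canonical = PARENTS.get(canonical, canonical)
    return canonical

print(harmonize("Caucasian"))                 # White, non-Hispanic
print(harmonize("Japanese", level="coarse"))  # Asian
```

Because the same label can mean different things in different countries, a real cross-walk would key these tables by country-case, which is exactly why human coders are needed.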
Over the course of the project, a team of human coders has been diligently working to research the structure of identity around the world. Many do this work as part of a Political Science undergraduate research internship, in collaboration with the Cline Center for Advanced Social Research. When making their merging decisions, they cite the source(s) they use; and each decision is reviewed by at least three unique coders.
The result is a "cross-walk" of ontologies. In addition to linking categories from surveys and censuses, the team is also working to connect these to a range of existing datasets on ethnicity, religion, and language, with the goal of making it easier for scholars to make use of all of these different datasets. These include Ethnic Power Relations (EPR), Minorities at Risk (MAR), the Composition of Religious and Ethnic Groups (CREG), the World Religion Project, and Ethnologue. By casting a wide net and integrating all of these ontologies, the cross-walk also serves as a dictionary of ethnic, religious, and linguistic groups worldwide.
5. Systematic Measurement Error in Surveys
I model systematic measurement error in the surveys' diversity estimates as a function of their design characteristics, using gold-standard census data as a baseline of comparison. The results are used to calculate weights for each survey, based on its design features. Critically, they can also inform a set of best practices when it comes to measuring ethnic, religious, and linguistic identity in future surveys.
The models take into account how survey-samples were designed, how the surveys were administered, and how questions about identity were asked and answered. I also consider whether survey quality varies across survey-projects or, within projects, across waves. Although I am careful to identify (and exclude) any problematic census results, I also include covariates of census quality.
This approach rests on three critical assumptions:
Systematic measurement error in the surveys is a function of how they were designed and administered;
Gold-standard censuses provide unbiased reference points; and
Comparison of surveys and censuses in places where censuses exist can be applied to other countries, where no census data are collected.
In case the factors that predict census enumeration also impact how survey designs affect self-identification, I also include predictors of enumeration into the model. In the case of ethnic enumeration, economic growth, civil conflict, and social trust are all predictive, and there are important differences across regions. I interact each of these with every design feature, assessing whether the feature has a differential impact on data quality depending on a country's propensity to enumerate (ethnic) identity.
Because the number of covariates is sizable, I opt against a Bayesian approach. Instead, I estimate random forest regressions that more efficiently identify which survey design feature(s) contribute to systematic measurement error.
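A minimal sketch of this approach using scikit-learn's RandomForestRegressor; the features and outcome below are simulated for illustration, not the project's actual data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 400

# Hypothetical design features per survey-country (question wording,
# sampling method, interview mode), coded numerically for this sketch.
X = rng.integers(0, 3, size=(n, 3)).astype(float)

# Outcome: the gap between a survey's diversity estimate and the
# gold-standard census value; here only the first feature matters.
y = 0.1 * X[:, 0] + rng.normal(0, 0.02, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Feature importances flag the design features driving the error.
print(model.feature_importances_.round(2))
```

Unlike a linear model with every interaction spelled out, the forest discovers interactions between design features and country characteristics automatically, which is the efficiency gain referenced above.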
6. Supporting Transparency and Replication
Designed and hosted by the Cline Center for Advanced Social Research, a web portal (under construction) provides access to datasets and code, as well as supporting customized data manipulation and visualization. Designed with scholars, policy-makers, and members of the public in mind, this user-driven portal should help facilitate a better understanding of diversity worldwide.
Beginning with ethnic diversity, pairwise comparisons between the survey-based estimates and existing indices find statistically significant differences in an overwhelming majority of cases (>85%). Most of these differences are not driven by ontological differences -- the surveys produce updated, less-biased estimates of ethnic group-size.
Moreover, these updated estimates of diversity upend the well-established association between diversity and economic growth. New estimates of Easterly and Levine's (1997) models indicate that the original result was likely driven by the association between GDP and errors in measures of diversity.
1. Comparison with Existing Estimates of Diversity
Until the compensatory weights are calculated and applied to each survey, I run some preliminary analysis using the simple average of surveys for each country, recognizing that these may still reflect some systematic measurement error.
The first set of analyses compares my survey-based diversity estimates with the existing indices calculated by Alesina et al. and Fearon. The survey-based indices are less strongly correlated with the existing indices (r = 0.752 and 0.754, respectively) than the existing indices are with one another (0.930). The correlations are weaker when the existing indices are based on rounded numbers and stronger when the indices use group-sizes that are based on a census. Bivariate scatterplots with fitted LOWESS lines indicate more survey-based diversity in countries previously identified as homogenous and less diversity in those previously identified as heterogeneous.
Because the survey-based indices are sample statistics, I calculate confidence intervals around my estimates. Using these, I find that my survey-based indices are statistically different from the existing ones in over 85% of cases. When I force the survey data into Alesina et al.'s and Fearon's ontologies, I continue to find significant differences in over 65% of cases. This indicates that the survey-based diversity metrics differ from the existing ones largely because the surveys produce different estimates of ethnic group-size, not just because they include different ethnic groups.
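One standard way to obtain such confidence intervals is a nonparametric bootstrap over respondents. A sketch with invented data (the paper's exact procedure may differ):

```python
import numpy as np

def fractionalization(labels):
    """One minus the sum of squared group shares."""
    _, counts = np.unique(labels, return_counts=True)
    shares = counts / counts.sum()
    return 1.0 - np.sum(shares ** 2)

rng = np.random.default_rng(0)
# Hypothetical survey sample of self-identified groups.
sample = np.array(["A"] * 500 + ["B"] * 300 + ["C"] * 200)

# Resample respondents with replacement and recompute the index
# to approximate its sampling distribution.
boots = [fractionalization(rng.choice(sample, size=sample.size, replace=True))
         for _ in range(1000)]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"point = {fractionalization(sample):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Two estimates are then judged statistically different when one falls outside the other's interval (or, more conservatively, when the intervals do not overlap).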
2. Models of Ethnic Diversity and Economic Growth
As a preliminary exercise, I use the survey-based estimates of diversity to re-examine the long-standing correlation between ethnic diversity and economic growth, first published by Easterly and Levine in 1997 using a diversity index calculated by Taylor and Hudson (1972) from group-size estimates in the Atlas Narodov Mira. The significant negative correlation was confirmed by Alesina et al. using their updated diversity indices.
Because the quality of information about ethnic group-size is correlated with economic development, the apparent relationship between diversity and growth may be driven by systematic measurement error. Since survey availability is uncorrelated with development, the survey data can be used to identify the "true" association between diversity and growth.
Substituting my survey-based diversity estimates in place of the existing indices, I find that the association with growth weakens to the point that it is no longer significant. The new result is closest to the original, where diversity was measured using the Soviet Atlas rather than the secondary sources. This would indicate that the measurement error in the Atlas is less severe or less correlated with variation in development. Across the different models, the coefficients on income and income-squared seem to be the most affected by the inclusion of the survey-based diversity metric.
I confirm that the null effect is not driven by ontological differences between the surveys and the existing indices. I do so by forcing the survey responses into the list of ethnic categories reported by both Alesina et al. and Fearon, confirming that the coefficient on diversity remains statistically insignificant.