
Example Analysis

One of the chief advantages of the UDS over existing democracy scales is that they accompany each democracy rating with a quantitative estimate of measurement uncertainty. The probability model that drives the UDS does not simply produce point estimates of democracy levels across space and time, but generates full posterior distributions for each rating. Unfortunately, relatively few social scientists are familiar with how to use samples from these posterior distributions in subsequent analyses. In what follows, we walk users through a brief tutorial that demonstrates just such an analysis (download this tutorial, including data and .do files). This example uses the original UDS release from 2010.


In a recent article, Gretchen Casper and Claudia Tufis (2003) demonstrate that existing democracy measures, although highly correlated, produce different results from one another when used to test a simple model of democratization. We're going to replicate part of Casper and Tufis' analysis and extend it by refitting the democratization model with the UDS in place of a traditional democracy measure. Along the way, we'll show you how to work with the UDS posterior samples on this site and how to take measurement error in the UDS into account when using the scores in subsequent analyses. We perform the entire analysis in Stata 10, but the process is similar in other statistical software.


Download Casper and Tufis' (2003) replication dataset (local copy) from Casper's website. If you wish to replicate their entire analysis, you can also download and run the Stata do-file (local copy). For this tutorial, we're just going to replicate columns 5, 6, and 7 of Table 1 on page 5 of their article. Casper and Tufis' democratization model uses a variety of lagged economic indicators, education measures, and political institution variables to predict a nation's democracy level in a given year. They use a straightforward statistical approach, running a linear regression with panel-corrected standard errors. Table 1 reproduces columns 5, 6, and 7 of Casper and Tufis' table, showing the results of fitting this model with Polity IV, Vanhanen's Polyarchy 1.2 dataset, and Freedom House's democracy measure, respectively, as the dependent variable for the 1975-1992 period. Bold numbers indicate that coefficients are statistically significant at the 0.05 level.

Table 1: Casper and Tufis (2003) Table 1, Columns 5, 6, and 7.
                           Polity   Vanhanen   Freedom House
GDP pc, logged              3.372      6.549           2.236
Real GDP pc growth         -0.023     -0.046          -0.010
Inflation                  -0.002      0.016          -0.003
Primary Education          -0.102     -0.236          -0.038
Secondary Education         0.055      0.159           0.030
Presidential                0.508      0.011           0.607
Parliamentary               2.059      2.924           0.768
Party Fractionalization     3.598      7.424           1.864

Performing this analysis in Stata is straightforward. After loading the replication dataset, limit the data to the 1975-1992 period. Next, declare the panel structure with the tsset command, then fit the panel-corrected linear regression with the xtpcse command, restricting the sample to observations for which all three measures provide scores. The code to perform these operations is displayed below. Note that the independent variables are all lagged one year, using the L1 operator. If you're doing everything right, your output will match the results in Table 1.

      *** Get rid of pre-75 observations
      drop if year < 1975

      *** Prep for panel analysis
      tsset id year, yearly

      *** Column 5: dv = polity
      *** (/// continues a command onto the next line in a do-file)
      xtpcse polityiv L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime ///
        L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, ///
        pairwise c(a)

      *** Column 6: dv = vanhanen
      xtpcse poly12 L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime ///
        L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, ///
        pairwise c(a)

      *** Column 7: dv = freedom house
      xtpcse fhscore L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime ///
        L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, ///
        pairwise c(a)
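Readers replicating this outside Stata need one detail of the L1 operator: once the data are tsset, L1.x for a country in year t is that same country's value of x in year t-1, and it is missing when year t-1 is absent for that country (it is not simply the previous row). The Python sketch below mimics that lookup. The function name lag1 and the (id, year, value) record layout are hypothetical, and the sketch assumes each (id, year) pair occurs only once, which tsset itself requires.

```python
from typing import Dict, List, Optional, Tuple

# One observation: (country id, year, value); value may be None (missing).
Record = Tuple[int, int, Optional[float]]


def lag1(records: List[Record]) -> List[Optional[float]]:
    """Return the one-year lag of `value` for every record, like Stata's L1.

    The lag is found by looking up (id, year - 1), so a gap in a country's
    series yields None instead of the value from the previous row.
    Assumes each (id, year) pair occurs at most once.
    """
    by_key: Dict[Tuple[int, int], Optional[float]] = {
        (cid, year): value for cid, year, value in records
    }
    return [by_key.get((cid, year - 1)) for cid, year, _ in records]


if __name__ == "__main__":
    panel = [(2, 1975, 1.0), (2, 1976, 2.0), (2, 1978, 4.0), (20, 1976, 7.0)]
    print(lag1(panel))  # [None, 1.0, None, None]
```

Note that the 1978 observation gets a missing lag because 1977 is absent, exactly the case where a naive "previous row" shift would silently borrow the 1976 value.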


Now that we've replicated part of Casper and Tufis' analysis and walked through the basic Stata commands used to fit the democratization model, we're ready to merge the UDS into the dataset and refit the model, taking measurement error into account. Both the UDS and Casper and Tufis' replication dataset use COW country codes, making it easy to merge the data. First, clear out your Stata environment, make sure you've allocated a reasonable amount of memory, and load the UDS 1000-draw sample.

      *** Clear memory, allocate memory, and load the UDS
      clear
      set mem 500M
      insheet using "uds_1000.csv"
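Outside Stata, the same CSV can be read with any CSV parser. The Python sketch below assumes the file has columns named cowcode, year, and z1, z2, ... holding the posterior draws (the names used by the Stata code in this tutorial); the function name read_uds_draws is hypothetical. Because the initial UDS release contains some duplicate country-years, the sketch keeps the first row it sees for each (cowcode, year), which is what Stata's duplicates drop does under the current sort order.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_uds_draws(handle: TextIO) -> Dict[Tuple[int, int], List[float]]:
    """Map (cowcode, year) to that country-year's posterior draws [z1, z2, ...].

    Assumes columns named cowcode, year and z1..zK (a hypothetical layout
    inferred from the Stata code). Duplicate country-years keep the first row.
    """
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    # Draw columns sorted numerically, so z10 comes after z2.
    draw_cols = sorted(
        (f for f in fields if f.startswith("z") and f[1:].isdigit()),
        key=lambda f: int(f[1:]),
    )
    draws: Dict[Tuple[int, int], List[float]] = {}
    for row in reader:
        key = (int(row["cowcode"]), int(row["year"]))
        if key not in draws:  # keep the first occurrence only
            draws[key] = [float(row[c]) for c in draw_cols]
    return draws


if __name__ == "__main__":
    sample = "cowcode,year,mean,z1,z2\n2,1975,1.5,1.4,1.6\n2,1975,9,9,9\n"
    print(read_uds_draws(io.StringIO(sample)))  # {(2, 1975): [1.4, 1.6]}
```

With the real file you would pass an open handle instead, e.g. `open("uds_1000.csv", newline="")`.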

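Next, merge in the Casper and Tufis dataset, eliminate observations before 1975 or after 1992, and remove duplicate country-years (a problem in the initial UDS release). Cases for which not all three democracy measures provide scores are excluded at estimation time, through the same if condition used in the Table 1 regressions.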

      *** Merge datasets
      gen id = cowcode
      sort id year
      merge id year using "PA_DTA_file.dta"

      *** Drop unused observations
      drop if year < 1975 | year > 1992
      duplicates drop id year, force /*corrects problem in the initial UDS release*/
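The merge-and-trim step can be sketched in Python as a dictionary join on (country code, year). This is an illustrative simplification rather than a translation of Stata's merge: it keeps only country-years present in both sources, whereas Stata's merge keeps unmatched observations and flags them in _merge (such rows drop out of the regressions anyway, because their variables are missing). The field names and the function name merge_panel are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (COW country code, year)


def merge_panel(
    uds_rows: List[dict],
    ct_rows: Dict[Key, dict],
    first_year: int = 1975,
    last_year: int = 1992,
) -> Dict[Key, dict]:
    """Join UDS rows to Casper-Tufis rows on (country code, year).

    Keeps only matched country-years inside [first_year, last_year], keeps the
    first UDS row for any duplicated country-year, and lets UDS values win
    when both sources share a field name (Stata's merge keeps master values).
    """
    merged: Dict[Key, dict] = {}
    for row in uds_rows:
        key = (row["cowcode"], row["year"])
        if not first_year <= row["year"] <= last_year:
            continue  # outside the 1975-1992 window
        if key in merged or key not in ct_rows:
            continue  # duplicate country-year, or no Casper-Tufis match
        merged[key] = {**ct_rows[key], **row}
    return merged


if __name__ == "__main__":
    uds = [{"cowcode": 2, "year": 1980, "mean": 1.2},
           {"cowcode": 2, "year": 1980, "mean": 9.9}]
    ct = {(2, 1980): {"id": 2, "year": 1980, "pcaplog": 9.5}}
    print(merge_panel(uds, ct))
```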

There are (at least) two different ways we can incorporate the UDS into the analysis at this point. One option is to treat the UDS as simple point estimates, just as we treat the Freedom House, Polity, and Vanhanen scores. This approach is straightforward. It also has a potential advantage over using any single-rater democracy score, in that it represents a compromise among a wide array of measures produced by experts across the field. The first column of Table 2 displays the results of this approach, which treats the mean of each UDS posterior density as the dependent variable in the Casper and Tufis model; the code follows this paragraph. The results differ slightly from each of the columns in Table 1, but they hold few surprises: coefficient signs are consistent with Table 1, and every statistically significant coefficient in the UDS model is significant in at least one of the models in Table 1.

      *** Prep for panel analysis
      tsset id year, yearly

      *** Run the democratization model with UDS point estimates
      xtpcse mean L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime ///
        L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, ///
        pairwise c(a)

Treating the UDS as point estimates is simple, but potentially misleading. A major contribution of the UDS is that they, unlike most other available measures, provide the analyst with quantitative estimates of uncertainty. A Unified Democracy Score for a given country is not represented simply by a single number but by a posterior density. The UDS do not purport to provide infallible democracy judgments but rather acknowledge the impact of measurement error, providing ratings in terms of probability distributions.
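In practice, each country-year's posterior density is represented in the 1000-draw file by its sample of draws, and the point estimate used in the first column of Table 2 is the posterior mean. The NumPy sketch below summarizes a single country-year's draws; the function name is hypothetical, and the percentile-based interval is one common way to form a credible interval, not necessarily how the UDS themselves report uncertainty.

```python
import numpy as np


def summarize_draws(draws, level=0.95):
    """Summarize one country-year's posterior draws.

    Returns the posterior mean (the kind of point estimate used for the
    'UDS Mean' column), the posterior standard deviation, and the bounds of
    a central credible interval computed from sample percentiles.
    """
    x = np.asarray(draws, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(x, [tail, 100.0 - tail])
    return {
        "mean": float(x.mean()),
        "sd": float(x.std(ddof=1)),
        "lower": float(lower),
        "upper": float(upper),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_draws = rng.normal(loc=0.5, scale=0.2, size=1000)  # synthetic, not real UDS
    print(summarize_draws(fake_draws))
```

Two countries can share the same mean yet have very different standard deviations; the point-estimate approach discards exactly that difference.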

Table 2: UDS Results
                           UDS Mean   UDS 1000 Sample
GDP pc, logged                0.511             0.416
Real GDP pc growth           -0.004            -0.003
Primary Education            -0.020            -0.026
Secondary Education           0.005             0.012
Presidential                  0.123             0.165
Parliamentary                 0.291             0.448
Party Fractionalization       0.508             0.989

The current context highlights the importance of measurement confidence. While a point-estimate approach to incorporating the UDS into the current analysis lends support to the importance of both trade openness and presidentialism in predicting democracy level, the fact that only one of the three single-rater measures supports these claims naturally leads one to question our confidence in these results. The probability distributions representing the UDS take such factors as rater reliability and agreement into account, and they provide a way to propagate our estimates of measurement error into the inferences we draw from the democratization model (see our article for a full description of the UDS and their underlying probability model).

We can propagate uncertainty in the UDS to the democratization model using an iterative Monte Carlo approach. At each iteration we:

  1. Sample from the posterior distribution of the UDS.
  2. Fit the Casper and Tufis model, using the UDS posterior draw as the dependent variable, and extract the coefficient vector and panel-corrected variance-covariance matrix from the fitted model.
  3. Draw and save a single vector from the multivariate normal density with mean equal to the fitted model coefficients and variance-covariance matrix equal to the fitted model's variance-covariance matrix.

This procedure, which is demonstrated in the code listing below, yields a sample from the marginal posterior density of the Casper and Tufis model coefficients, treating both the model coefficients and the UDS as random variables, subject to various assumptions about the conditional independence of the UDS and the model parameters. (For a more thorough discussion of this approach, look up the "method of composition" in a good reference on statistical simulation, such as Martin A. Tanner. 1993. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. 2nd ed. New York: Springer-Verlag, p. 30.)
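The three steps can also be sketched in Python with NumPy. To keep the example short, ordinary least squares with classical standard errors stands in for xtpcse, so the sketch illustrates the method of composition itself, not panel-corrected errors. The function names and the synthetic data in the usage lines are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """OLS coefficients and classical covariance s^2 (X'X)^-1.

    Stand-in for xtpcse: no panel-corrected errors and no AR(1) correction.
    """
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)
    V = s2 * np.linalg.inv(X.T @ X)
    return b, V


def method_of_composition(X, dv_draws, seed=0):
    """Return an (m, k) sample of coefficients, one row per posterior draw.

    X        : (n, k) design matrix, including a column of ones.
    dv_draws : (n, m) array whose column j is the jth posterior draw of the DV.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for j in range(dv_draws.shape[1]):               # step 1: take draw j
        b, V = ols_fit(X, dv_draws[:, j])            # step 2: fit on draw j
        rows.append(rng.multivariate_normal(b, V))   # step 3: one coefficient draw
    return np.vstack(rows)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 200, 300
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    # Synthetic "posterior draws": a true score plus draw-specific noise.
    draws = (1.0 + 2.0 * x)[:, None] + rng.normal(scale=0.5, size=(n, m))
    post = method_of_composition(X, draws, seed=2)
    print(post.mean(axis=0), post.std(axis=0, ddof=1))
```

The spread of the resulting sample reflects both the regression's own sampling uncertainty (step 3) and the variation in estimates across UDS draws (steps 1 and 2), which is why it is typically wider than a single fit's standard errors.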

      *** Prep for the monte carlo
      set more off
      set matsize 1000

      *** Start from an empty posterior matrix (matters if you rerun the loop)
      capture matrix drop posterior

      *** Note that the matsize must be at least as large as the larger
      *** dimension in your posterior matrix, in this case 1000 rows.
      *** While this is possible in Stata SE and MP, Intercooled Stata
      *** puts an upper limit of 800 on matsize.  If you are using
      *** Stata IC, set matsize to n <= 800 and change the loop below to
      *** iterate from 1 to n, thus using only the first n draws
      *** of the UDS posterior sample.

      *** Run the monte carlo
      forvalues i = 1/1000 {
        *** Print out an iteration number
        display `i'

        *** Fit the model, using the ith draw from the UDS posterior
        quietly xtpcse z`i' L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime ///
          L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, ///
          pairwise c(a)

        *** Extract the coefficients and variance-covariance matrix
        matrix b = e(b)
        matrix V = e(V)
        local blength = colsof(b)

        *** Preserve the dataset, take a single multivariate normal draw from the
        *** posterior distribution of the coefficients, and restore the dataset.
        *** We use the capture command to catch possible errors in drawnorm
        *** and drop these iterations gracefully.
        preserve
        capture quietly drawnorm b1-b`blength', double n(1) means(b) cov(V) clear
        if _rc == 0 {
          mkmat b1-b`blength', matrix(bsample)
          matrix posterior = nullmat(posterior) \ bsample
        }
        else {
          display "Error drawing sample...iteration dropped"
        }
        restore
      }
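When drawnorm cannot use a variance-covariance matrix (for example, one that is not positive definite), the capture/_rc check in the loop skips that iteration. The Python sketch below applies the same guard, generating the multivariate normal draw from a Cholesky factor. This is one standard way to produce such draws, and we do not claim it mirrors drawnorm's internals; the function name is hypothetical.

```python
import numpy as np


def mvn_draw_or_none(b, V, rng):
    """Draw once from N(b, V) as b + L z, where V = L L' (Cholesky factor).

    Returns None when V is not positive definite, so the caller can drop the
    iteration instead of aborting the whole Monte Carlo run.
    """
    try:
        L = np.linalg.cholesky(np.asarray(V, dtype=float))
    except np.linalg.LinAlgError:
        return None
    z = rng.standard_normal(len(b))
    return np.asarray(b, dtype=float) + L @ z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fits = [
        ([0.5, 1.0], [[0.04, 0.01], [0.01, 0.09]]),  # valid covariance
        ([0.5, 1.0], [[1.0, 2.0], [2.0, 1.0]]),      # not positive definite
    ]
    kept = []
    for b, V in fits:
        draw = mvn_draw_or_none(b, V, rng)
        if draw is None:
            print("Error drawing sample...iteration dropped")
        else:
            kept.append(draw)
    print(len(kept), "of", len(fits), "iterations kept")
```

If many iterations are dropped, the retained sample may no longer represent the UDS posterior evenly, so it is worth checking how many rows the posterior matrix actually has.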

Upon finishing the Monte Carlo procedure, we are left with a sample from the posterior density of the democratization model's coefficients. This sample is like any other generated by a Bayesian simulation approach, and we can easily summarize it. For example, the means of the coefficient posteriors are reasonable point estimates of the impact of each independent variable on democracy level. Furthermore, we can construct credible intervals (the Bayesian counterpart of confidence intervals) around these point estimates simply by calculating percentiles of the posterior sample. The code below shows how to calculate the means, standard deviations, and 2.5th and 97.5th percentiles of the coefficient posteriors, forming point estimates and 95 per cent credible intervals.

      *** Get posterior ready to work with
      svmat posterior

      *** Calculate means and standard deviations
      tabstat posterior*, stat(mean sd)

      *** Find the bounds of the 95 percent credible interval
      centile posterior*, centile(2.5, 97.5)
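The same summaries can be computed in Python from a posterior array with one row per retained iteration and one column per coefficient. The NumPy sketch below also flags coefficients whose 95 per cent interval excludes zero, the criterion used for bolding in Table 2. The function name and the coefficient labels in the usage lines are hypothetical, and percentile conventions differ across software, so the bounds may not match centile's output to the last digit.

```python
import numpy as np


def summarize_posterior(posterior, names):
    """Mean, sd, and 95% percentile interval for each coefficient column.

    posterior : (m, k) array, one row per retained Monte Carlo iteration.
    names     : k coefficient labels.
    A coefficient is flagged when its interval excludes zero.
    """
    post = np.asarray(posterior, dtype=float)
    lower, upper = np.percentile(post, [2.5, 97.5], axis=0)
    summary = []
    for j, name in enumerate(names):
        summary.append({
            "name": name,
            "mean": float(post[:, j].mean()),
            "sd": float(post[:, j].std(ddof=1)),
            "lower": float(lower[j]),
            "upper": float(upper[j]),
            "excludes_zero": bool(lower[j] > 0 or upper[j] < 0),
        })
    return summary


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    fake = np.column_stack([rng.normal(0.4, 0.1, 1000), rng.normal(0.0, 0.1, 1000)])
    for row in summarize_posterior(fake, ["pcaplog", "rgdppcgr"]):
        print(row)
```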

The second column of Table 2 displays the results of the Monte Carlo approach to estimating the democratization model, providing posterior means and, in parentheses, standard deviations (because these values are generated by simulation, they will vary slightly from run to run). Estimates whose 95 per cent credible intervals do not cover zero (the analogue of coefficients that are statistically significant at the 5 per cent level) are highlighted in bold. Taking measurement error into account changes the inferences we can draw from the democratization model, and the direction of that change is hard to predict in advance. Real GDP growth, a variable with a statistically significant effect in all four point-estimate-based specifications, loses significance in the Monte Carlo analysis. On the other hand, presidentialism and trade openness, each statistically significant in only one of the original models and in the UDS Mean model, remain significant once measurement error is propagated into the democratization model.


Latent constructs like democracy are measured with error, but typical social science analyses treat democracy scores as if they were known with certainty. The preceding tutorial demonstrates how to fit a model with the UDS, using a Monte Carlo procedure to propagate measurement error in the democracy scores into the final estimates. This approach is extremely flexible and can be applied to virtually any statistical model you might fit with a traditional, purely point-estimate-based democracy measure. Furthermore, while the UDS are the dependent variable in the democratization model examined here, the scores can enter the analysis on either side of the equation with no changes to the Monte Carlo procedure.
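To illustrate the point about either side of the equation, here is a Python sketch in which the UDS draw enters as an explanatory variable: the loop is the same, and only the place where draw j goes changes (the design matrix instead of the outcome). As before, OLS with classical standard errors is a stand-in for whatever model you would actually fit, and the function name and synthetic data are hypothetical.

```python
import numpy as np


def compose_with_uds_regressor(y, controls, uds_draws, seed=0):
    """Method of composition with the UDS draw as a regressor.

    y         : (n,) outcome, treated as observed without error.
    controls  : (n,) or (n, c) other regressors.
    uds_draws : (n, m) array; column j is the jth UDS posterior draw.
    Returns an (m, 2 + c) sample: intercept, UDS coefficient, controls.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    rows = []
    for j in range(uds_draws.shape[1]):
        # Draw j goes into the design matrix rather than the outcome.
        X = np.column_stack([np.ones(n), uds_draws[:, j], controls])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        V = (resid @ resid / (n - X.shape[1])) * np.linalg.inv(X.T @ X)
        rows.append(rng.multivariate_normal(b, V))
    return np.vstack(rows)


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n = 300
    u = rng.normal(size=n)   # synthetic "true" democracy level
    c = rng.normal(size=n)   # synthetic control variable
    y = 0.5 + 1.5 * u + 0.3 * c + rng.normal(scale=0.2, size=n)
    draws = u[:, None] + rng.normal(scale=0.05, size=(n, 50))
    print(compose_with_uds_regressor(y, c, draws, seed=6).mean(axis=0))
```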