For those of you who are interested, I will discuss the merits of the Berkeley
study with Dr. Hout on Monday morning at 7:30am on KPFA. Thanks to the anonymous
person who recommended me from the listserv, and thanks to the others who
pointed out that the data are available on the website. I have replicated their
models and confirmed my suspicion that the results are driven by
multicollinearity among the interaction terms. This is a classic example of
what can go wrong when interaction terms are constructed from a variable
with little variation in a data set with a small number of observations.
This problem is so severe that it alone outweighs all the other concerns I
expressed.

Hopefully, we can still have an interesting discussion about paper-verified
voting trails, rather than a discussion of multicollinearity.

For those on the list who are interested, here are the Stata commands and
results from my analysis. I have forwarded them to Dr. Hout so that he can
respond to them on Monday.

The report, data, and documentation are available at:
http://ucdata.berkeley.edu/new_web/VOTE2004/
=======================
Dr. Michael P. McDonald
Visiting Fellow, Brookings Institution
Assistant Professor, George Mason University
elections.gmu.edu
mmcdon@gmu.edu
703-993-4191
A simple model WITHOUT interaction terms between electronic voting (etouch)
and percent Bush vote in 2000 (b00pc, and b00pc_sq for %Bush squared) shows
that Bush did slightly worse in touch screen counties, though the difference
is not statistically significant (etouch coefficient -.006, p = 0.510). This
is contrary to the assertion in the paper.
. reg b_change b00pc b00pc_sq size etouch if(fl ==1)
  Source |          SS    df          MS          Number of obs =      67
---------+------------------------------          F(  4,    62) =    4.02
   Model |   .01137473     4  .002843682          Prob > F      =  0.0058
Residual |  .043894573    62  .000707977          R-squared     =  0.2058
---------+------------------------------          Adj R-squared =  0.1546
   Total |  .055269303    66  .000837414          Root MSE      =  .02661

------------------------------------------------------------------------------
b_change |       Coef.   Std. Err.         t   P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   b00pc |    .3465873    .2988948     1.160   0.251    -.2508949    .9440696
b00pc_sq |   -.2338153    .2662035    -0.878   0.383    -.7659486     .298318
    size |   -2.97e-08    2.69e-08    -1.104   0.274    -8.34e-08    2.41e-08
  etouch |   -.0059961    .0090412    -0.663   0.510    -.0240693     .012077
   _cons |   -.0773414     .084138    -0.919   0.362    -.2455309    .0908481
------------------------------------------------------------------------------
That Bush gained less in touch screen counties than in other counties is
confirmed by some simple descriptive statistics: the mean Bush gain was 4.07
percentage points in counties WITHOUT electronic voting versus 2.42 in
counties with it. This alone should alert us that something strange is going
on when the models with the interaction terms show that counties with
electronic voting had a greater gain for Bush.
. sum b_change if(fl==1&etouch==0)
Variable |     Obs        Mean  Std. Dev.        Min        Max
---------+-----------------------------------------------------
b_change |      52    .0407088   .0300101  -.0295672   .1070979
. sum b_change if(fl==1&etouch==1)
Variable |     Obs        Mean  Std. Dev.        Min        Max
---------+-----------------------------------------------------
b_change |      15    .0242388    .021011  -.0130056   .0736397
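
The gap between the two groups can be checked by hand from the two summaries above. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and are not part of the Stata session.

```python
# Mean Bush gain (b_change) copied from the two `sum` outputs above.
mean_no_touch = 0.0407088  # 52 Florida counties with etouch == 0
mean_touch = 0.0242388     # 15 Florida counties with etouch == 1


def to_points(share):
    """Convert a change in vote share (0-1 scale) to percentage points."""
    return 100.0 * share


# Difference in mean gains: counties without touch screens minus those with.
gap_points = to_points(mean_no_touch - mean_touch)

print(f"no touch screen: {to_points(mean_no_touch):.2f} points")
print(f"touch screen:    {to_points(mean_touch):.2f} points")
print(f"gap:             {gap_points:.2f} points")
```

This is a raw comparison of means with no controls, so it says nothing about statistical significance by itself.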
A simple model with one interaction term (b00pc_e, which is etouch * % Bush
2000) suddenly shows strong statistical significance for both etouch and the
interaction, and with opposite signs (.215 for etouch, -.393 for b00pc_e).
This is a classic symptom of multicollinearity. When two variables move
closely "together," the software tends to find both statistically
significant, but in opposite directions.
. reg b_change b00pc b00pc_sq size etouch b00pc_e if(fl ==1)
  Source |          SS    df          MS          Number of obs =      67
---------+------------------------------          F(  5,    61) =    8.66
   Model |  .022948423     5  .004589685          Prob > F      =  0.0000
Residual |   .03232088    61   .00052985          R-squared     =  0.4152
---------+------------------------------          Adj R-squared =  0.3673
   Total |  .055269303    66  .000837414          Root MSE      =  .02302

------------------------------------------------------------------------------
b_change |       Coef.   Std. Err.         t   P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   b00pc |    .7865266    .2751752     2.858   0.006      .2362798    1.336773
b00pc_sq |   -.5682898    .2411567    -2.357   0.022    -1.050512   -.0860672
    size |   -8.44e-08    2.60e-08    -3.243   0.002    -1.37e-07   -3.24e-08
  etouch |    .2153087    .0479929     4.486   0.000      .1193409    .3112764
 b00pc_e |   -.3931148    .0841124    -4.674   0.000      -.561308   -.2249217
   _cons |   -.2133257    .0783878    -2.721   0.008    -.3700718   -.0565797
------------------------------------------------------------------------------
The correlation between etouch and b00pc_e confirms the collinearity.
Perfect correlation is 1.0, and the correlation between these two terms is
.9777, which is well into the danger zone for multicollinearity.
. corr etouch b00pc_e if(fl==1)
(obs=67)
         |   etouch  b00pc_e
---------+------------------
  etouch |   1.0000
 b00pc_e |   0.9777   1.0000
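
To put the .9777 figure in more familiar terms, it can be converted to a variance inflation factor, VIF = 1 / (1 - r^2). The Python sketch below is only a rough illustration: it uses the pairwise correlation alone, whereas the VIF in the full regression is based on all the other regressors and can only be as large or larger. The function name is illustrative.

```python
def pairwise_vif(r):
    """Variance inflation factor implied by one pairwise correlation r."""
    return 1.0 / (1.0 - r * r)


r = 0.9777  # corr(etouch, b00pc_e) from the Stata output above

vif = pairwise_vif(r)
se_inflation = vif ** 0.5  # factor by which the standard error is inflated

print(f"implied VIF:           {vif:.1f}")
print(f"std. error multiplier: {se_inflation:.1f}")
```

A common rule of thumb treats a VIF above 10 as a warning sign, and this pairwise figure alone is more than twice that.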
The full correlation matrix for etouch and the two interaction terms shows the
problem is even worse, as all three are collinear with one another:
. corr etouch b00pc_e b00pcsq if(fl==1)
(obs=67)
         |   etouch  b00pc_e b00pcsq_
---------+---------------------------
  etouch |   1.0000
 b00pc_e |   0.9777   1.0000
b00pcsq_ |   0.9272   0.9845   1.0000
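
Correlations this high are what one would expect when a dummy is multiplied by a variable that varies little among the cases where the dummy equals 1. The Python sketch below illustrates the mechanism with MADE-UP numbers: only the county counts (15 and 52) come from the output above, and the Bush shares are hypothetical, not the Florida data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


n_touch, n_other = 15, 52  # county counts from the Stata output

# HYPOTHETICAL Bush 2000 shares: touch screen counties spread evenly over
# 0.40-0.60, the remaining counties over 0.35-0.75.
share = [0.40 + 0.20 * i / (n_touch - 1) for i in range(n_touch)]
share += [0.35 + 0.40 * i / (n_other - 1) for i in range(n_other)]

dummy = [1.0] * n_touch + [0.0] * n_other                # plays the role of etouch
inter = [d * s for d, s in zip(dummy, share)]            # dummy * share
inter_sq = [d * s * s for d, s in zip(dummy, share)]     # dummy * share squared

r_inter = pearson(dummy, inter)
r_inter_sq = pearson(dummy, inter_sq)

print(f"corr(dummy, dummy*share):   {r_inter:.4f}")
print(f"corr(dummy, dummy*share^2): {r_inter_sq:.4f}")
```

Both correlations come out close to 1 because each interaction term is zero wherever the dummy is zero and nearly constant wherever it is one, so it carries little information beyond the dummy itself.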