[EL] data errors

Lorraine Minnite lminnite at gmail.com
Tue Oct 28 11:06:30 PDT 2014


The subject is on my mind because I've worked with a lot of voter files
over the last 30 years (going back to the days of the mainframe, uggh),
including, a few years ago, files processed by Catalyst. My practical
experience is that reported election data is quite prone to inaccuracies:
not always of a magnitude that puts results in question, but lots of
errors nonetheless.

Here is one example: I'm in the middle of converting 50 years of
official annual voter registration and turnout reports for the City of
Philadelphia from paper to electronic format. I'm through about half of the
reports, and on almost every page there are calculation errors in the
summary columns. The one I'm looking at now is an error of exactly +10,000
votes for the Republican candidate for Mayor in the historic 1983
Municipal election in which Philadelphians elected their first black mayor.
I've gone over the numbers several times, and the reported vote total is
simply wrong; either that, or the ward totals are off by exactly 10,000
votes. Philadelphia's 66 wards averaged between 10,000 and 11,000 votes
each, so this would be like adding an entire ward of voters who voted only
for the Republican. Bad, but in this case not enough to hand victory to the
wrong candidate (Wilson Goode's margin was still more than 130,000 votes).
I've had similar experiences working with New York City voting data and
reports.
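The check described above (comparing a reported summary figure with the sum of the ward figures it is supposed to total) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the Philadelphia conversion: the function names are hypothetical, the "digit slip" flag is an assumed heuristic, and the ward counts in the usage lines are made-up round numbers, not actual 1983 returns.

```python
from typing import Sequence


def footing_discrepancy(ward_totals: Sequence[int], reported_total: int) -> int:
    """Reported total minus the sum of the ward figures it should equal.

    Zero means the summary column "foots"; anything else means either the
    summary figure or one or more ward figures is wrong.
    """
    return reported_total - sum(ward_totals)


def looks_like_digit_slip(discrepancy: int) -> bool:
    """Hypothetical heuristic: flag discrepancies that are an exact power of
    ten (10, 100, 1,000, ...), which look more like a hand-addition or
    carry slip than a miscount of ballots."""
    d = abs(discrepancy)
    if d < 10:
        return False
    while d % 10 == 0:
        d //= 10
    return d == 1


# Illustrative only: 66 wards of 10,500 votes each (made-up figures).
wards = [10_500] * 66
gap = footing_discrepancy(wards, reported_total=703_000)
print(gap, looks_like_digit_slip(gap))
```

Running the check on every summary column of a transcribed report, rather than on selected pages, is what turns occasional surprises into a systematic error count.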

I think this bears on the list discussion of the Richman et al. non-citizen
voting study, because that study's conclusion, which is so out of bounds
with everything else we know about the level of illegal voting in the U.S.,
rests on a finding of five people in an opt-in Internet survey, with vote
validation conducted by a data company using less-than-transparent
methods. If election officials sometimes can't add up 66 numbers
correctly, isn't it possible that people who didn't vote get recorded as
voting, that matching algorithms are wrong, or that people make other
kinds of mistakes that could produce five errant records in a dataset of
30,000 or more? The authors cannot personally verify that these five
people are indeed non-citizens and that they actually cast ballots in
2008, so it stands to reason that misreporting, record-keeping errors,
and/or methodological errors are more likely explanations for the
anomalies. Instead, the researchers started from the assumption that there
is wide-scale non-citizen voting in the U.S. and that the problem is
measuring it. What is the evidentiary basis for this assumption in the
first place? If the CCES included 200 or 300 people in this category I
might worry, but I would still want to verify the records myself before
jumping to a conclusion so out of whack with all of the research on
illegal voting that scholars, including myself, have done so far. When
you dig into the real world beyond what statistical analysis (based on
potentially faulty data) presents, you often find all kinds of problems.
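To put the "five errant records in a dataset of 30,000" point in numbers, here is a minimal Python sketch. It assumes, as a simplification, that each record independently has the same small probability p of being wrongly coded (real errors are rarely independent), and the rate of 1 in 6,000 used below is purely hypothetical. Under that assumption the expected number of errant records is n * p, and the chance of seeing at least k of them is a binomial tail.

```python
from math import comb


def expected_errant_records(n_records: int, error_rate: float) -> float:
    """Mean number of wrongly coded records if each is wrong with probability error_rate."""
    return n_records * error_rate


def prob_at_least(k: int, n_records: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n_records, error_rate), via 1 - P(X <= k - 1)."""
    below = sum(
        comb(n_records, i) * error_rate**i * (1 - error_rate) ** (n_records - i)
        for i in range(k)
    )
    return 1.0 - below


# Hypothetical per-record error rate of 1 in 6,000 (about 0.017%).
n, p = 30_000, 1 / 6_000
print(expected_errant_records(n, p))  # about 5 errant records on average
print(prob_at_least(5, n, p))         # better-than-even chance of 5 or more
```

In other words, an error rate far smaller than the summary-column mistakes described earlier would be enough, on its own, to generate a handful of anomalous records.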

Lori Minnite