[EL] McDonald study, birthdate distribution in real voter list

Justin Levitt levittj at lls.edu
Sun Sep 11 11:13:52 PDT 2011


This may be my fault -- I was having an off-list conversation with Bev 
about her mention that she'd found double-voters, based on a match of 
name and date of birth.  I sent her the paper that Michael and I had 
done, showing that such matches are far more common than most people 
assume.  And our discussion now seems to have gone on-list with some 
misunderstandings.

The paper is Michael McDonald and Justin Levitt, Seeing Double Voting: 
An Extension of the Birthday Problem, 7 Election L.J. 111 (2008).  We 
use statistics (backed up with simulations) to show that if you've got 
180 voting-age people in a "room", there's a 50% chance that two of them 
will share the same date of birth.  If you've got 460 voting-age people 
in a room, the chance is 99% that at least two of them will share the 
same date of birth.  We use common names not as an underlying 
assumption, but as an illustration: the statistics mean that if there 
are 460 "Robert Smith"s or "Maria Rodriguezes" on your voter rolls, it's 
a really, really, really, really good bet that two of them will be 
different people but share the same date of birth.  And then we applied 
the stats to a real-world example, in New Jersey's registration file.

Bev's quick calculation is based on a common statistical error: she's 
finding the rate at which two people share one particular date of birth 
-- say, May 5, 1955.  But the question isn't focused on a single given 
day -- it has to encompass every possible match of one person's birthday 
with every other person in the room, and add them all together.  Put 
differently, the question isn't whether two given people share a single 
date of birth, but whether any two people in a large pool share one.  
The chance that _no_ two people share a birthdate shrinks quickly as the 
pool grows, so birthdate matches turn up quite frequently.
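
To make the distinction concrete, here is a minimal sketch of the 
"any two people in the room" calculation.  It assumes every birthdate 
is equally likely across 365 x 50 = 18,250 possible dates (borrowing 
the napkin figure from Bev's message); that uniform pool and the 
function name are illustrative assumptions, not the method in the 
paper, which works from real age distributions.  The thresholds it 
produces will therefore not match the paper's 180 and 460 exactly.

```python
import math

# Assumption: 365 days x 50 years of equally likely birthdates.
# Real voter files have uneven age distributions, so this is only a sketch.
N_DATES = 365 * 50


def prob_any_shared_birthdate(n_people: int, n_dates: int = N_DATES) -> float:
    """Chance that at least two of n_people share a birthdate (uniform dates)."""
    if n_people > n_dates:
        return 1.0  # pigeonhole: a match is certain
    # P(no match) = product over i of (1 - i / n_dates); sum logs for stability.
    log_p_none = sum(math.log1p(-i / n_dates) for i in range(n_people))
    return 1.0 - math.exp(log_p_none)


if __name__ == "__main__":
    print("one specific pair:", 1 / N_DATES)
    for n in (2, 180, 460):
        print(n, "people:", round(prob_any_shared_birthdate(n), 4))
```

With two people the result is just the single-pair rate of 1 in 
18,250.  The chance climbs steeply as the group grows because the 
number of possible pairs grows roughly with the square of the group 
size.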

Now, those statistics may well end up yielding 2 in 10,000 people on a 
large list with the same name and date of birth -- it depends entirely 
on how many people you've got on the list with the same name.  When we 
looked at NJ's file of 2.5 million valid voter records, there were 325 
Robert Smith voters, 209 Maria Rodriguez voters, and so on ... and we 
calculated based on the actual names in the file that we'd expect about 
487 "twins" -- different people with the same name and birthdate.  
That's a rate of about 1 "twin" per 5,100 records.  Different files, with 
different name distributions, will produce different rates.  Perhaps the 
file that Bev was examining simply had different name distributions than 
NJ's file.
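
As a rough illustration of why the name distribution drives the 
result, the sketch below estimates the expected number of same-name, 
same-birthdate pairs from a table of name counts.  It is not the 
calculation from the paper: it assumes 18,250 equally likely 
birthdates (365 x 50), it counts matching pairs rather than people, 
and it uses only the two name counts quoted above, so it will not 
reproduce the 487 figure.  The function and variable names are 
hypothetical.

```python
from math import comb

# Assumption: 365 days x 50 years of equally likely birthdates.
N_DATES = 365 * 50

# Only the two counts mentioned in this message; a real run would use
# the count for every distinct name in the file.
name_counts = {"Robert Smith": 325, "Maria Rodriguez": 209}


def expected_twin_pairs(counts: dict[str, int], n_dates: int = N_DATES) -> float:
    """Expected number of same-name pairs that also share a birthdate."""
    # Each of the C(k, 2) pairs among k same-named people matches
    # with probability 1 / n_dates under the uniform assumption.
    return sum(comb(k, 2) / n_dates for k in counts.values())


if __name__ == "__main__":
    for name, k in name_counts.items():
        print(name, round(expected_twin_pairs({name: k}), 2))
    print("total:", round(expected_twin_pairs(name_counts), 2))
```

Because the pair count grows with the square of the number of 
same-named people, a file with a few very common names can produce 
many more expected matches than a file of the same size where names 
are spread more evenly.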

While I want to point out the distinction between Bev's calculation and 
our own, I do think Bev's investigation is worthwhile for a different 
reason entirely.  It's good sleuthing to find that an anomalous quantity 
of registrations were entered into the system on one day, and that many 
of those registrations revealed an anomalous date of birth.  Doing this 
sort of overview can help reveal quirks in the underlying data that may 
explain what otherwise appears to be wrongdoing.  It's a good reminder 
-- and I know others on the list, including Mike McDonald, have done 
similar work on the quality of registration data -- to check out the 
quality of registration information before drawing conclusions that 
assume malfeasance.

Justin

On 9/11/2011 10:42 AM, Bev Harris wrote:
> Using first name, last name, birthdate as a locator for duplicate voters comes
> under question in a study by Michael McDonald. Note that this study assumes
> common name, like "Robert Smith" plus same birthdate, then draws conclusions
> against using firstname lastname birthdate assuming all are common names, which
> is a fallacy.
>
> Because real voter lists are readily available, I think such studies should use
> real lists, not hypothetical models. In private communications, I mentioned my
> conservative assumption that 1 in 10,000 people on a large voter list will have
> the same birthdate (year, month, day). That figure is derived as follows:
> 365 days in a year
> 50 years avg voting life
> 365 x 50 = 18,250
>
> It is this quick estimate of frequency of same birthdate that comes into
> question in the McDonald study, but in fact, the 1 in 10,000 figure is
> corroborated by the actual data.
>
> How do I go from 1 in 18,250 to 1 in 10,000? Lop off some for the bell-shaped
> curve, fewer voters in young and old groups, and put in a fudge factor for
> twins, and you have a napkin-calculation figure of a chance of 1 in 10,000 to
> have the same birthdate, with a few more in the baby boom years. There's no
> real precision in any of these statistics on a generic basis, and for obvious
> reasons it's nicer to use 1:10,000 rather than 1:13,129 or 1:9837. It gives a
> rough guideline for how many repeated birthdates to expect.
>
> In a smaller database, say, 8800, you will see more repetitions than 1:10,000,
> for a simple reason -- twins. I think the last figure I heard was that you have
> a 1:80 chance of having twins. That twins number also gives a same last name,
> but is unlikely to have same first name. Possible, don't prosecute anyone just
> on this basis, but unlikely.
>
> This 1:10,000 calculation is called into question in the McDonald study, based
> on theoretical models. So I thought you might be interested in real numbers,
> which really do work out to the 1 in 10,000 handy guesstimate technique.
>
> I ran a birthdate frequency calculation on a voter list with 604,456 voters. It
> was interesting for two reasons: (1) It confirmed the expectation that only 1
> in 10,000 birthdates will be the same in a voter list, and (2) It identified an
> anomaly for one specific birthdate in this list. The bell shaped curve shows
> consistency over 30,000 different dates, during the baby boom years, ranging
> from 45 to 67 repetitions of a given birthdate. Except that 114 people are
> shown to have a birthdate on one day - Nov. 29, 1960.
>
> See attached graph for the obvious and unlikely spike.
>
> That one date spikes into the stratosphere in this database. I then looked at
> the registration dates for voters with that birthdate, and found a big chunk of
> them coinciding with another unlikely spike. The number of voter registrations
> per day in this jurisdiction maxes out at about 3,000, with an average, during
> heavy registration periods, of closer to 1,500. But on Sept. 3, 1991, an
> unlikely 12,107 registrations are shown as being entered into the system. Many
> of the Nov. 29, 1960 birthdates were entered on Sept. 3, 1991. So there's a
> "hmm" for you. And perhaps a quick diagnostic tool to spot voter list
> unlikelihoods.
>
> At any rate, the off-the-cuff frequency for repeated birthdates, using 1 in
> 10,000 for a guideline, is indeed supported by the real voter registration data.
>
> Now, using the real data it is also possible to determine the actual frequency
> of any last-name, first-name combination in a given jurisdiction. The name
> "Robert Smith" used in McDonald's study shows up 127 times in the 604,456
> database, when you include all its permutations (Rob, Bob, Bobby etc). So
> that's about a 1 in 5,000 chance that your name is Robert Smith in this
> particular jurisdiction.
>
> As I remember my stats class, in this problem you would need to calculate the
> chance that a single item meets two low probabilities at the same time (for
> example, 1 in 10,000 chance of same birthdate, at the same time as 1 in 5,000
> chance of name Robert Smith). It really is unlikely that even for a common
> name, you will find the same birthdate in the same jurisdiction.
>
> Except for this: My data is showing that at least in Shelby County, TN, people
> who did not vote are being listed as voting, people who did vote are being
> listed as not voting, people who requested Republican ballots are being shown
> as requesting Democratic ballots and vice versa, and at least two people are
> checked in to vote as both absentee and polling place when it appears
> impossible (one was overseas and one is confined to a nursing home).
>
> In Shelby County, people who are Black are also showing up as White, and poor
> Effie Washington, who was a Black woman, then got listed as an "Other, Man",
> and now is listed as "Other, Woman." Or unfortunate lifelong Democrat Sharonda
> Williams, a Black woman, who had her race changed to "other", then voted with a
> Democratic ballot in the 2010 primary but is shown as not voting, then voted in
> the next primary but is shown as voting a Republican ballot.
>
> Yeah. In Shelby County, if you are listed as voting twice, it's probably a
> database error.
>
> Bev Harris
> Founder - Black Box Voting
> http://www.blackboxvoting.org
>
> * * * * *
>
> Government is the servant of the people, and not the master of them. The
> people, in delegating authority, do not give their public servants the right
> to decide what is good for the people to know and what is not good for them to
> know. We insist on remaining informed so that we may retain control over the
> instruments of government we have created.

-- 
Justin Levitt
Associate Professor of Law
Loyola Law School | Los Angeles
919 Albany St.
Los Angeles, CA  90015
213-736-7417
justin.levitt at lls.edu
ssrn.com/author=698321
