Fuzzy Math

© Copyright 2005, Paul Kislanko

The second Harris Poll contained an anomaly: more total points were counted than the number of voters could account for. Each voter has 325 points to distribute over 25 teams, in the order 25, 24, 23, ... for first, second, third, and so forth. With 112 voters there should have been 36,400 total points assigned to the 45 teams that appeared in at least one voter's top 25. The points for all teams sum to six more than that.
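
The expected total is easy to verify. The few lines of Python below are only a sanity check of the figures quoted above; the variable names are mine.

```python
# Points available on one full ballot: 25 for first place down to 1 for 25th.
points_per_ballot = sum(range(1, 26))

voters = 112
expected_total = voters * points_per_ballot

print(points_per_ballot)  # 325
print(expected_total)     # 36400
```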

It turns out this is not an error. According to Harris, as quoted by Jerry Palm on CollegeBCS.com, for the top 15 teams by average rank a ranking more than three standard deviations from the average can be adjusted. That can result in a ballot that has two teams ranked N and no team ranked M. So |M−N| teams each get one more or one less point than they would from voters who had one team ranked outside the acceptable range.
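
To see how such an adjustment can change a ballot's point total, here is a small sketch. The exact adjustment rule is not spelled out in the quoted description, so the code simply assumes that one team is re-scored from rank M to rank N while every other team keeps its place. The helper names are hypothetical, and this is not Harris's actual procedure.

```python
def points_for_rank(rank: int) -> int:
    """Poll points for a rank from 1 to 25 (25 for first, 1 for 25th)."""
    return 26 - rank

def ballot_total_after_adjustment(m: int, n: int) -> int:
    """Total points on a 25-team ballot after the team at rank m is
    re-scored as rank n, leaving rank m empty and rank n doubled.
    Assumption: no other team on the ballot is moved."""
    ranks = list(range(1, 26))
    ranks[m - 1] = n  # the team that held rank m now counts as rank n
    return sum(points_for_rank(r) for r in ranks)

# Hypothetical example: an outlier ranked 3rd is adjusted down to 9th.
print(ballot_total_after_adjustment(3, 9))  # 319, six fewer than 325
# The reverse adjustment adds six points instead.
print(ballot_total_after_adjustment(9, 3))  # 331
```

Under that assumption a single adjusted ballot's total differs from 325 by |M−N|, so a poll-wide surplus of six points would be the net effect of all such adjustments.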

The intent of such a system is to avoid the effects of "strategic voting" (intentional or accidental), to which this voting method is known to be subject. The method is known as Borda, after the French mathematician who first described it. (Technically, with 119 alternatives, a pure Borda count would assign 118 to first, 117 to second, and so on. The count is then the number of teams ranked below the team for which preference is being indicated.)
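
As a rough illustration of the difference between the poll's truncated scoring and a pure Borda count, the sketch below scores the same ballot both ways. The team labels and function names are made up for the example.

```python
def poll_points(ballot, depth=25):
    """Truncated scoring as used by the poll: `depth` points for first
    place, one fewer for each place after that, for listed teams only."""
    return {team: depth - i for i, team in enumerate(ballot[:depth])}

def pure_borda_points(ballot, n_alternatives):
    """Pure Borda: each team scores the number of alternatives ranked below it."""
    return {team: n_alternatives - 1 - i for i, team in enumerate(ballot)}

# A hypothetical ballot, showing only its first three places.
ballot = ["A", "B", "C"]
print(poll_points(ballot))             # {'A': 25, 'B': 24, 'C': 23}
print(pure_borda_points(ballot, 119))  # {'A': 118, 'B': 117, 'C': 116}
```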

While such an approach prevents a repetition of the problem AP voters had in 2004 - being the subject of intense lobbying to change their votes - it does so at an unacceptable cost. First, it actually exacerbates the strategy problems that afflict Borda-counted elections by providing a way to have one ballot count more than others. Worse, it removes a layer of accountability. When the number of total points comes out exactly, there's no way to tell if everyone voted sincerely or the "insincere" strategic votes cancelled out.

While most versions of Borda have been discredited on theoretical grounds for real elections, this one does not even meet the "one man, one vote" criterion.

There are some mathematical problems too. Each voter ranks only 25 of the 119 teams, and not all voters rank exactly the same 25. So for a ballot that doesn't rank team X, how do you define its contribution to the "average rank" for team X? The voter could have had it 26th or 50th. When dealing with collections of ordinal rankings, the arithmetic mean is not the most appropriate measure - an average true Borda score would be better (the one using 118, 117, etc. instead of 25, 24, etc.).

Even better would be the method I proposed last year to address the issue raised by the AP fiasco. When constructing an ordinal ranking based upon ordinal ranks, just assign each team the best rank for which a majority of voters rank the team that high or better. Break any ties by the actual size of the majority, and if there are still ties use the remaining ranks for the tied teams to repeat the process.
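
A minimal sketch of the first two steps might look like the following. It assumes each ballot is an ordered list of team names and that "majority" means more than half of all ballots cast. The function names are mine, and the final step (repeating the process on the remaining ranks for teams that are still tied) is left out; such teams are simply ordered by name here.

```python
def majority_rank(ballots, team):
    """Return (rank, support): the best rank r at which more than half of
    all ballots place `team` at r or better, and how many ballots do so.
    Returns None if the team never reaches a majority."""
    positions = sorted(b.index(team) + 1 for b in ballots if team in b)
    needed = len(ballots) // 2 + 1
    if len(positions) < needed:
        return None
    r = positions[needed - 1]  # rank at which the majority is first reached
    support = sum(1 for p in positions if p <= r)
    return r, support

def majority_order(ballots):
    """Order teams by best majority rank, breaking ties by larger majority."""
    teams = {t for b in ballots for t in b}
    scored = {}
    for t in teams:
        result = majority_rank(ballots, t)
        if result is not None:
            scored[t] = result
    # Remaining ties fall back to team name only to keep the output stable.
    return sorted(scored, key=lambda t: (scored[t][0], -scored[t][1], t))

# Three hypothetical voters, each listing a top three.
ballots = [["A", "B", "C"], ["B", "A", "D"], ["A", "C", "B"]]
print(majority_rank(ballots, "B"))  # (2, 2)
print(majority_order(ballots))      # ['A', 'B', 'C']
```

Team D appears on only one of the three ballots, so it never reaches a majority and is left out of the ordering.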

If every voter were required to rank every team, this would be equivalent to taking the median rank for each team, which is less affected by "outliers". In the same way, if voters were required to rank every team, then the Borda count would be equivalent to the average, and it might be appropriate to use averages and standard deviations to vet the ballots.
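
A small example shows why the median is the more robust choice. It assumes every voter has ranked the team and that the number of voters is odd; with an even number of voters the usual median convention of averaging the two middle values would differ slightly from the majority rule. The rank values are invented for illustration.

```python
from statistics import mean, median

def best_majority_rank(ranks):
    """Best rank r such that more than half of the voters placed the team
    at r or better."""
    ordered = sorted(ranks)
    needed = len(ordered) // 2 + 1
    return ordered[needed - 1]

# Hypothetical ranks given to one team by five voters, one of them an outlier.
ranks = [1, 2, 2, 3, 25]
print(best_majority_rank(ranks))  # 2
print(median(ranks))              # 2
print(mean(ranks))                # 6.6
```

The single 25th-place vote drags the average down to 6.6 but leaves the median, and the majority rank, at 2.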

Using this "best rank for which a majority agrees" approach avoids the problems that ballot "trimming" seeks to avoid. It is a form of a method known as Bucklin, which as a single-winner election method is characterized as a form of "approval voting."

If a majority of voters fail to include a team in their top 25, well, then it probably isn't a top 25 team, no matter that one voter ranked it #1!

I do not have access to the Harris ballots, so I can't compare what a Bucklin result would be to their Borda result, but I do have access to 77 different ordinal rankings by various computers. In this example the teams that received a majority of "votes" for ranks 1-25 are listed first in Bucklin order, and "Others receiving votes" are listed in Borda order.