About the Bucklin Majority

© Copyright 2007, Paul Kislanko

20 August, 2007
In 2003 I came up with a method to count ballots in a Fans Poll (for college baseball) that was designed to avoid undue influence by "outlier" votes: votes by fans that were "too high" for their own team, or "too low" as a vote against a hated rival. I subsequently learned that I'd re-invented a variation of an election method known as "Bucklin."

The basic premise is to have each voter rank teams in 1, 2, 3, ... order, and to count the ballots by giving each team the best rank for which a majority (50% plus 1) of the voters ranked the team at least that highly.
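As a minimal sketch of that counting rule (not the original implementation), the function below walks down the ranks and stops at the first one where a majority has been reached. The name `majority_rank`, its parameters, and the reading of "50% plus 1" as "more than half of all voters" are assumptions made for illustration; returning `max_rank + 1` for a team that never reaches a majority follows the "not ranked" default described at the end of this article.

```python
def majority_rank(ranks, num_voters, max_rank):
    """Best rank R such that a majority of ALL voters ranked the team R or better.

    ranks      -- ranks given to one team by the voters who listed it
    num_voters -- total number of ballots, including those that omit the team
    max_rank   -- highest rank a ballot can assign (25 in a "top 25" poll)
    """
    needed = num_voters // 2 + 1  # assumption: "50% plus 1" means a strict majority
    count = 0
    for r in range(1, max_rank + 1):
        count += sum(1 for x in ranks if x == r)  # votes at exactly rank r
        if count >= needed:
            return r
    return max_rank + 1  # no majority at any rank: treated as "not ranked"


# Five voters; one outlier puts the team 25th, but the majority rank is still 3.
with_outlier = majority_rank([1, 1, 3, 4, 25], num_voters=5, max_rank=25)
# The same ballots, except the fifth voter leaves the team off entirely.
with_omission = majority_rank([1, 1, 3, 4], num_voters=5, max_rank=25)
```

In both cases three of the five voters have the team at 3rd or better, so the outlier ballot does not move the result.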

Even if every voter votes their own team #1, that vote only contributes to the (eventual) majority that agrees the team should be ranked at least as high as rank R, which is #1 if a majority of voters think it should be. Likewise, if a voter ranks a hated rival at the bottom (or leaves it unranked, even though the rival is good), it doesn't matter much as long as a majority of the other voters rank that rival.

If every voter ranks every team, this is very much like assigning the team the rank that is the median of voters' ranks for the team. (The difference is the +1 in 50%+1 and that it is the same value even if a minority of voters don't rank the team at all.)
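The comparison with the median can be checked on small cases. The sketch below is an illustration only: `majority_rank` is a hypothetical helper, written here as "the k-th best rank, where k is the majority size", and a strict majority (more than half of all voters) is assumed.

```python
from statistics import median


def majority_rank(ranks, num_voters):
    """k-th best rank, where k is a strict majority of all voters (assumed)."""
    needed = num_voters // 2 + 1
    if len(ranks) < needed:
        return None  # fewer than a majority of ballots list the team
    return sorted(ranks)[needed - 1]


odd_ballots = [3, 1, 4, 1, 5]  # five voters, all rank the team
even_ballots = [1, 2, 4, 5]    # four voters, all rank the team
partial = [1, 1, 3, 4]         # five voters, one leaves the team unranked

odd_case = (majority_rank(odd_ballots, 5), median(odd_ballots))
even_case = (majority_rank(even_ballots, 4), median(even_ballots))
partial_case = majority_rank(partial, 5)
```

With an odd number of voters the two agree (both give 3). With an even number the "+1" shows up: the median averages the two middle ranks to 3.0, while the majority rank is 4. With one voter omitting the team, the majority rank is still 3, the same as when that voter ranked it 5th.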

Fly in the ointment?

A "problem" with any vote-counting method that uses the median or something like the median is that there are usually a lot of ties. Instead of 1,2,3,4,5,6,7,8,9,10 you get something like the 1,2,4,5,5,6,7,8,9,10 seen in the 2007 pre-season rankings by the various media outlets. From a technical perspective, 1,2,4,5,5 is actually more useful than a translation to 1,2,3,4,4, as is commonly done. In any case, if the requirement is to produce an ordered list, we must define a method to break ties.

In election methods, a common approach is to resort to the normal Borda count to break ties, but for analyzing team rankings that defeats our objective of not letting outlier ballots determine the final ranking. In my original implementation I did use a form of Borda, but only to order teams that weren't listed on a majority of ballots but were on more than one. Teams that were on only one ballot I listed in inverse order of that vote (i.e. a single vote for a team at #25 is more likely to be "correct" than a single vote for #2). For 2007-08 I'm using a generalization of this notion.

The first tiebreaker is just the number of votes that make up the majority - if there are 100 voters and 52 rank team A 10th or better and 51 rank team B 10th or better, then we list team A before team B. With a small number of voters, though, it is quite common for tied teams to each have the same size majority, so we need a second tiebreaker, which is (for each team):
$$
V_M \;+\; \sum_{R=1}^{M-1} V_R \times \frac{\mathrm{MR} + 1 - R}{M - R} \;+\; \sum_{R=M+1}^{\mathrm{MR}} \frac{V_R}{R - M}
$$

Where:
MR is the highest possible rank
M is the best rank for which a majority votes the team R ≤ M
V_R is the number of votes for rank R
This has the effect of giving greater weight to votes that contribute to the majority ranking, but less weight to votes that are farther away from the majority rank, whether too high or too low. One way to think of it is that if teams are tied with an equally-sized majority, the team with votes more nearly "clustered" around the majority rank will be listed first. (This second tiebreaker plays the same role that the standard deviation plays for average-based rankings, where among teams with the same average rank those with smaller standard deviations are listed before those with larger ones.)

When MR is less than the total number of teams, MR+1 is the default majority rank ("not ranked"). When there's only one vote for a team, sorting on the second tiebreaker gives the desired effect of a #25 vote being worth more than a #2 vote: in a "top 25" poll the "not ranked" majority is 26, so the vote for #25 counts 1/(26-25) = 1/1 = 1 and the one for #2 counts 1/(26-2) = 1/24.
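The two tiebreakers can be sketched as follows. This is an illustration under stated assumptions, not the code used for the poll: the function names, the dictionary layout (rank R mapped to its vote count), and the choice to sort larger values first are all mine, and the second function transcribes the formula exactly as printed above. Exact fractions are used so that ties are not disturbed by rounding.

```python
from fractions import Fraction


def majority_size(votes, m):
    """First tiebreaker: number of votes at rank m or better."""
    return sum(v for r, v in votes.items() if r <= m)


def second_tiebreaker(votes, m, max_rank):
    """Second tiebreaker, transcribed literally from the printed formula.

    votes    -- dict mapping rank R to the number of votes at that rank
    m        -- the team's majority rank M (computed elsewhere)
    max_rank -- the highest possible rank, MR in the text
    """
    score = Fraction(votes.get(m, 0))        # votes exactly at the majority rank
    for r in range(1, m):                    # votes better than the majority rank
        score += votes.get(r, 0) * Fraction(max_rank + 1 - r, m - r)
    for r in range(m + 1, max_rank + 1):     # votes worse than the majority rank
        score += Fraction(votes.get(r, 0), r - m)
    return score


# Two hypothetical teams in a 10-rank poll with 5 voters, both with majority rank 4.
teams = {
    "A": {3: 1, 4: 3, 5: 1},    # votes clustered around rank 4
    "B": {1: 1, 4: 3, 10: 1},   # same majority, but two far-away votes
}
M, MAX_RANK = 4, 10

# Assumption: larger majority first, then larger second-tiebreaker value first.
order = sorted(
    teams,
    key=lambda t: (-majority_size(teams[t], M),
                   -second_tiebreaker(teams[t], M, MAX_RANK)),
)
```

Both teams have four votes at 4th or better, so the first tiebreaker does not separate them; the second gives A a value of 12 and B a value of 13/2, so the clustered team A is listed first. One caveat about the literal transcription: when M = MR + 1, the factor (MR + 1 - R)/(M - R) equals 1 for every R, so this sketch scores a lone #25 vote and a lone #2 vote identically. The 1/(26 - R) weights in the single-vote example above correspond to V_R/(M - R) without that factor, and the sketch does not try to settle which form was intended.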