Computer Rankings

July 24, 2014

More than a month to go (only about a month, if your life has been on hold since January 6!) until the first FBS football game. Not that many people will be holding their breath for the meeting between one of the newest members of Division 1A and one of the new 1AA teams.

You can tell we're getting close to the start of the season when Dr. Massey begins publishing his College Football Ranking Composite page. It will be a while before that includes the ratings I produce. I don't think there's a reasonable way to calculate them with no game scores at all, and I wouldn't pretend to know enough about the changes in teams' quality to try to manually adjust last year's results.

Still, some analysts do publish preseason ratings, and since the main purpose of this site is ratings analysis, I have started publishing my metadata. So this is as good a time as any to review the report.

The Wisdom of the Crowd

The basic idea is to treat each rating's ranking as if it were a voter's ranked ballot. Just as it is not necessary to know which criteria are more important to an individual AP (or USA Today or Harris) poll voter to derive a consensus rank, a sufficiently diverse collection of algorithms might be expected to provide a "better" ranking than any of them individually. (More importantly, in the case of algorithms that's a testable hypothesis!)

My ratings list isn't quite the same as the one published by Dr. Massey. I do not include any of the aforementioned human polls, for example. Computers I can analyze; people are beyond my capacity. (I almost wrote beyond help, but then some humans actually use computers to aid their ranking.)

A fair question is "Why bother? Dr. Massey already reports a consensus rank." The answer lies in one of the great mathematical theorems of the 20th century, Arrow's Impossibility Theorem: if there are more than two alternatives and more than two criteria for ordering alternatives, there's no way to define a "best" way to combine the more-than-two "voters'" rankings into one.

Not "it's hard to do that", it's impossible. Sure there are many algorithms for combining ranks ("counting votes") but "which is best" devolves into a vote about "how do you define 'best'?" and you're back where you started.

So I provide several reports that utilize different methods to consolidate the rankings. Whether any of them is "better" is purely a matter of taste.

The "Standard"

The first section of the report is probably the least useful. It uses the same methodology that the media polls use to count Top 25 ballots (sketched in code after the list):
  1. Truncate all rankings at 25 teams
  2. Assign points to ranks with 25 points for #1, 24 points for #2, down to 1 point for #25.
  3. Sum points for each team
  4. Sort teams by decreasing sum to assign ranks
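
In code, the tally looks roughly like this (a minimal Python sketch; the ballots and team names are made up for illustration):

    from collections import defaultdict

    def standard_rank(ballots):
        # Each ballot is a list of team names, best team first.
        points = defaultdict(int)
        for ballot in ballots:
            for position, team in enumerate(ballot[:25]):  # step 1: truncate at 25
                points[team] += 25 - position              # step 2: 25 for #1 ... 1 for #25
        # steps 3-4: sum (done above) and sort by decreasing points
        return sorted(points.items(), key=lambda kv: -kv[1])

    # Three toy ballots:
    ballots = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
    print(standard_rank(ballots))   # [('A', 74), ('B', 72), ('C', 70)]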

Were I to try to list all this method's faults, this essay would turn into a book. Behind all of them lies the main problem: the end product preserves almost no useful information. Are 25 votes for #25 really the same as one vote for #1?

I include this because it does provide an apples-to-apples comparison of the Computer Top 25 to the AP and Coaches' polls.

Bed and Breakfast

Actually, Borda and Bucklin. The Standard method is a variation of a ranked-ballot counting method attributed to Jean-Charles de Borda. The Borda Count for a team is the number of teams ranked below it on a ranked ballot. So it's a point-based system like the standard but assigns 0 points to the bottom-most team up to N-1 points to the top-most team on each ballot.

The Borda rank is calculated the same way as steps 3-4 for the standard, so it shares many of that method's disadvantages. There are two differences of note: since every ballot includes all teams, slightly less information is lost in the summing (it's still not a great method), and the point assignment is subtly different. The difference lies in tie-handling.

In the standard, when there's a tie the tied teams all receive the same points, as if each were the next team ranked below the teams with better scores. In a true Borda, each receives points based upon the number of teams ranked strictly worse. For a "standard" ranking of 1,2,3,3,3,6,... the Borda Counts would be based upon a 1,2,5,5,5,6,... ranking. This accounts for the fact that besides the two teams with better scores, each of the three tied teams has two other teams with equal scores that are not ranked below it; a tied team gets credit only for the teams with strictly worse scores.
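
A minimal sketch of a Borda count under that convention (Python; the ballot is hypothetical, with six teams and ranks given in the "standard" 1,2,3,3,3,6 form):

    def borda_counts(ballot):
        # ballot maps team -> rank (1 = best; tied teams share a rank).
        # A team's Borda count is the number of teams ranked strictly worse,
        # so a 1,2,3,3,3,6 ballot scores as if it were 1,2,5,5,5,6.
        ranks = list(ballot.values())
        return {team: sum(1 for r in ranks if r > mine)
                for team, mine in ballot.items()}

    ballot = {"A": 1, "B": 2, "C": 3, "D": 3, "E": 3, "F": 6}
    print(borda_counts(ballot))
    # {'A': 5, 'B': 4, 'C': 1, 'D': 1, 'E': 1, 'F': 0}

Each of the tied teams gets 1 point, the same as a solo #5 would on a six-team ballot.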

When all teams are ranked on every ballot, the Borda ranking of the sums corresponds exactly to the average rank, with the raw counts giving a (very) rough measure of differences between consecutive ranks. Like the average, Borda sums are distorted by extreme rank values.
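
The equivalence is easy to check. With B ballots, N teams ranked on each, no ties, and a team's rank on ballot b written r_b, the team's Borda sum is

    (N - r_1) + (N - r_2) + ... + (N - r_B)  =  B*N - (r_1 + ... + r_B)  =  B * (N - average rank)

so sorting by decreasing Borda sum orders teams exactly by increasing average rank.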

What I call the Bucklin Majority rank is the best rank for which more than half the voters rank the team that highly or better. If there is an odd number of ballots it is the same as the arithmetic median. As such it is much less affected by outlier ranks, especially exceptionally bad ones. A tiebreaker, used just to order the report, is a very rough measure of how closely the ranks are clustered around the majority cutoff.
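
A sketch of the majority-rank calculation (Python; the rank list is made up):

    def bucklin_rank(ranks):
        # ranks: this team's rank on each ballot (1 = best).
        # Returns the best rank r such that a strict majority of ballots
        # rank the team at r or better.
        need = len(ranks) // 2 + 1
        for r in sorted(ranks):
            if sum(1 for x in ranks if x <= r) >= need:
                return r

    print(bucklin_rank([1, 2, 2, 9, 40]))   # 2 (three of five ballots say #2 or better)

Note how little the outlier matters: replace the 40 with a 4 and the majority rank is still 2.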

The first full-field report is ordered by majority rank but includes the Borda count and rank along with the Standard rank where applicable. Immediately following this report and in the same order is the count of votes by rank. This just lists the number of votes a team received for #1, #2, and so on up to #128.

The Pairwise Rankings report shows in row i, column j the number of rankings that list team i ahead of team j. The list is ordered by decreasing number of "pairwise wins" by team i.
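
Such a matrix might be built like this (a Python sketch that assumes every ballot ranks every team):

    def pairwise_wins(ballots, teams):
        # wins[i][j] = number of ballots that rank teams[i] ahead of teams[j].
        # Each ballot maps team -> rank (1 = best).
        n = len(teams)
        wins = [[0] * n for _ in range(n)]
        for ballot in ballots:
            for i, ti in enumerate(teams):
                for j, tj in enumerate(teams):
                    if i != j and ballot[ti] < ballot[tj]:
                        wins[i][j] += 1
        return wins

    ballots = [{"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 1, "C": 3}]
    print(pairwise_wins(ballots, ["A", "B", "C"]))
    # [[0, 1, 2], [1, 0, 2], [0, 0, 0]]

Sorting the rows by their sums gives the decreasing "pairwise wins" order described above.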

The most current version of the report is always here.

© Copyright 2014, Paul Kislanko