Grading the 2007 Teams

© Copyright 2007, Paul Kislanko

7 December 2007

No need for polls?

One of the fun side-discussions related to college football is not just "which team is better?" - "how do you measure better?" leads to a deeper insight. "Ranking" doesn't help, since the folks who do the rankings each use a different blend of speculation, opinion, and objective measurements, so arguments ensue about methods rather than team.

Were we to automate the ranking process, we'd want a set of objective criteria that everyone understands and agrees to and a process for combining them into an ordered list. Division one hockey actually has such a list and process. It goes basically like this:

  1. Identify "teams under consideration." This just weeds out the bottom of the division, leaving out teams with losing records except for automatic qualifiers and teams rated worse than some rank by the NCAA-calculated RPI
  2. For each TUC, compare it to each other TUC by each of the pre-defined criteria and assign the one with more points a "pairwise win" (and the other a "pairwise loss")
  3. For all teams under consideration, calculate the "pairwise winning percentage" the usual way (pairwise wins + ½pairwise ties divided by #compared teams)

It is step 3 that can sometimes cause controversy. When team A and team B have played and team A won both the head to head and the pairwise comparison between the teams, team B can still be ranked higher than team A because it has more pairwise wins against all the other teams under consideration than team A. This is not an "error" - it just means that if we compared all triples there are more where team B is 2-1 and team A is 1-2 when the triple contains both.

For football, a reasonable implementation is:
1.Assume that the criteria for considering teams is the same as that for bowl-eligibility. I'd prefer a much stronger test but for the moment it is "have at least a .500 record without counting more than one win against a FCS team."
2.
aHead-to-Head Assign one point for each win by team X over team Y. For most team pairs neither team gets a point by this criterion, which is the basic problem.
bWP vs Common Opponents If either team X or team Y has a better winning percentage vs common opponents, assign one point to that team. Again, for many team-pairs this could well result in no point for either team.For teams that have common opponents the percentage rather than just number of wins is necessary because the two teams may not have played the same number of games against the common opponents.
cWins vs Teams Under Consideration For record vs teams under consideration we shouldn't use percentage, or else a 1-0 record would beat a 6-1. So this just compares the number of wins against teams under consideration by team X and team Y. (Note: we do not count wins vs common opponents for the comparison between team X and team Y, since those wins have already been used.)
dRecent record Assign ½ point to team X if it has a better record over the last four games than team Y has over team Y's last four games.
eBest Other Win Assign 1/4 point to team X if the best win by team X against teams other than Y or X and Y's common opponents is better than team Y's against teams other than X and the Common Opponents.
fWorst Other Loss Assign 1/4 point to team X if its worst loss to teams other than Y or X and Y's common opponents is to a better team than team Y's worst loss against teams other than X and the Common Opponents.
3. List the teams ordered by pairwise winning percentage as defined above.

Notes

2.b has an extra case. When team X and team Y have no common opponents at all we give no points to either team, but if they have common opponents and the same record vs the common opponents, we give each team ½ point. This makes no difference in the pairwise comparison, but in the reports distinguishes teams that have common opponents from those which don't.

2.c neatly avoids any specific definition of SOS while still taking schedule strength into account. By just counting wins against the "good teams", it gives credit for those without giving credit for just playing a tough schedule, which sometimes occurs with rankings that combine SOS with winning percentage formulaically.

2.d compares the teams' records over the last four games each played. I chose four because the number of games used in both basketball and baseball represents approximately the last month of the season.

For both 2.e and 2.f the idea is that head-to-head and results versus common opponents should take precedence, but when they're not applicable we have to use the rest of the field to make comparisons. Hence the best wins and worst losses are only based upon games that involve teams only one of the pair played. This begs the question, though, as to how to define "best win" and "worst loss."

Computers to the rescue

Since 2.e and 2.f compare teams X and Y based upon games that do not involve X, Y, or any of their common opponents we need some way to assign a rank that allows us to compare teams that played X but not Y and those that played Y but not X. We can't use any of 2.a, 2.b or 2.c to derive the values by definition, so we need an "oracle."
Note that both the "best win" and "worst loss" could well involve teams not under consideration, so some ranking of all teams must be an input to the process.
For our oracle we need some objective ranking of all teams and the best place to look for that is in the computer rankings. We could choose one, choose all or some and take the average or median of the ones we use, or even just use the existing BCS computer summary to form the initial list. I chose to use the Bucklin Majority which is approximately the median of the complete list.

See the Pairwise Ratings for the 2007 season.