Rating Games Played - Upsets

© Copyright 2007, Paul Kislanko

27 August, 2007

There's an interesting problem when it comes to comparing ratings by how well they retrodictively match game results. In some cases the "error" isn't in the rating; it's that a team didn't perform to its rating (or that a lower-rated opponent played above its own) in particular games. Sometimes a retrodictive "error" is just an acknowledgement that "the best team doesn't always win", and a ranking that has the winner of a game ranked worse than the loser might in fact be correct.

The 2006 poster-child for inconsistency was UCLA - in nearly half their games the Bruins either beat a better-ranked team or lost to a worse-ranked one (5.74 was the average retrodictive error count across the computer rankings in this survey). The top 10 hardest-to-rank teams from 2006:
Avg. Errors	Team
5.74	UCLA
4.7	Arizona
4.44	Washington St
4.37	Toledo
4.33	Washington
4.21	Boston College
4.04	Georgia
3.81	Auburn
3.77	Tulsa
3.73	Oregon
Each value is the average, over the 99 computer rankings in the survey, of the number of the team's games in which the team with the better final rank lost. See the full list by rating.

By itself that count doesn't tell us much, because it's not much of a surprise when the #10 team loses to the #15 team - there's not a lot of difference between those rankings (and almost none between the teams ranked #50 and #55).
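
For reference, here is how that raw count might be computed - a minimal Python sketch, where the game-tuple layout and the function name violation_counts are my own illustrative choices, not anything from the article or its data:

    from typing import Dict, List, Tuple

    Game = Tuple[str, str]        # (winner, loser) for one game
    Ranking = Dict[str, int]      # team -> final rank (1 = best)

    def violation_counts(games: List[Game],
                         rankings: List[Ranking]) -> Dict[str, float]:
        """For each team, the average over all rankings of the number of
        its games in which the better-ranked team lost (a retrodictive
        violation)."""
        totals: Dict[str, float] = {}
        for ranking in rankings:
            for winner, loser in games:
                if ranking[winner] > ranking[loser]:  # winner ranked worse: upset
                    totals[winner] = totals.get(winner, 0) + 1
                    totals[loser] = totals.get(loser, 0) + 1
        return {team: n / len(rankings) for team, n in totals.items()}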

For a "surprise factor" we'd like a measurement that takes into account both how large the upset (MOV) and how important it was (the rank of the team that was upset):
Surprise = ⌈ (WS − LS) ÷ 8 ⌉ × (WR − LR) ÷ LR

where:
⌈ x ⌉ is the least integer ≥ x, so the first factor is the MOV expressed as a # of scores
WS = Winner's Score
LS = Loser's Score
WR = Winner's Rank (note: in these games WR will be > LR)
LR = Loser's Rank (that of the better-ranked team, the team that was upset)
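
To make the arithmetic concrete, here is the formula as a small Python function; the function name and signature are hypothetical, and the worked example at the bottom is mine:

    import math

    def surprise(ws: int, ls: int, wr: int, lr: int) -> float:
        """Surprise factor for one upset game.

        ws/ls are the winner's and loser's scores; wr/lr their final
        ranks. Only games the better-ranked team lost (wr > lr) count.
        """
        mov_in_scores = math.ceil((ws - ls) / 8)  # ⌈(WS − LS) ÷ 8⌉
        return mov_in_scores * (wr - lr) / lr     # × (WR − LR) ÷ LR

    # Example: #25 beats #5 by 10 points -> ⌈10/8⌉ × (25−5)/5 = 2 × 4 = 8.0
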
Here's a sample for a single hypothetical ranking from 2006.

If we weight the retrodictive ranking violations by this Surprise Factor, we get a different list:

Weighted Violation Factor	Team
1.8529	Auburn
1.4887	Southern California
1.3370	UCLA
1.3311	Florida
0.8365	Oregon St
0.8286	Rutgers
0.6844	Arizona
0.6265	Cincinnati
0.6122	Georgia
0.5765	Boston College

As one might imagine, the teams on this list tend to come in pairs or triplets: USC/UCLA/Oregon State and Auburn/Florida/Georgia make up over half the list.

Since Florida finished ranked #1 by most of the computers, all of its weighted violation value comes from the loss to Auburn. Recalling that the final margin in that game was due to a last-play defensive TD by the Tigers, it probably wasn't the most surprising outcome of the year. Even adjusted to a single-score margin, though, that game is second only to UCLA's win over Southern Cal as the most surprising in hindsight.

To calculate the Weighted Retrodictive Violation factor for a team and rating, sum the surprise factors for each game the team lost to a worse-ranked opponent or won over a better-ranked one (according to the particular rating), and divide by the number of games the team played. For 2006 these are shown with teams as rows and ratings as columns in this table.
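
A minimal sketch of that calculation in Python, reusing the hypothetical surprise function above; the (winner, loser, winner_score, loser_score) game layout is an assumption of mine:

    def weighted_violation_factor(team, games, ranking):
        """Sum of surprise factors for the team's upset games under one
        ranking, divided by the number of games the team played."""
        team_games = [g for g in games if team in (g[0], g[1])]
        total = 0.0
        for winner, loser, ws, ls in team_games:
            wr, lr = ranking[winner], ranking[loser]
            if wr > lr:  # the better-ranked team lost: a violation
                total += surprise(ws, ls, wr, lr)
        return total / len(team_games)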

To get the value for a team, average its values across all of the ratings. To get the value for a rating, sum the "surprise factors" for all of that rating's retrodictive violations and divide by the total number of games played. The table is sorted by descending team values and ascending rating values.
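
Continuing the same hypothetical sketch, the two aggregations might look like this:

    def team_value(team, games, rankings):
        """Team value: average of the team's weighted violation factor
        across all ratings."""
        return sum(weighted_violation_factor(team, games, r)
                   for r in rankings) / len(rankings)

    def rating_value(games, ranking):
        """Rating value: sum of surprise factors over all of the
        rating's retrodictive violations, divided by total games."""
        total = 0.0
        for winner, loser, ws, ls in games:
            wr, lr = ranking[winner], ranking[loser]
            if wr > lr:
                total += surprise(ws, ls, wr, lr)
        return total / len(games)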

The source for the ratings used in this example is Kenneth Massey's College Football Ranking Comparison. The data were captured after the 2006 Bowl Season, and include the 99 computer ratings but not the BCS standings or the AP and USA Today human polls.