This year's "final four" turned out to be a "no-brainer" - in the sense that the committee came up with the same field the computers (which have no brains) would have. They also arrived at the same semifinal matchups, although they got the one/two and three/four seeds backwards. That earns the committee an A−.
Twelve teams got at least one top-four ranking from the computers, and five teams were ranked fourth or better by more than half of them.
| Top-4 rankings | Team |
|---|---|
| 115 | Alabama |
| 101 | Oklahoma |
| 100 | Clemson |
| 61 | Michigan State |
| 59 | Ohio State |
| 10 | Stanford |
| 9 | Iowa |
| 5 | Baylor |
| 3 | TCU |
| 2 | Mississippi, Notre Dame |
| 1 | Utah |
The committee made an interesting point, along the lines of "we want the best four, not necessarily the most deserving four." The computers are not all of the same mind on this - although Michigan State garnered more top-four "votes", there are actually more computer ratings that rank the Buckeyes ahead of the Spartans:
[Table: how the computer ratings split between Ohio State and Michigan State]
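If you're curious how tallies like these come together, here is a minimal sketch, assuming every computer rating is available as a team-to-rank mapping. The data below is made up and the variable names are mine, not anything from the actual ratings:

```python
# Hypothetical input: each computer rating is a dict mapping team name -> rank.
ratings = [
    {"Alabama": 1, "Clemson": 2, "Ohio State": 3, "Michigan State": 4, "Oklahoma": 5},
    {"Alabama": 1, "Oklahoma": 2, "Clemson": 3, "Michigan State": 4, "Ohio State": 6},
    {"Clemson": 1, "Alabama": 2, "Oklahoma": 3, "Ohio State": 4, "Michigan State": 5},
]

# Count how many ratings put each team in the top four (the first column of the table above).
top4_counts = {}
for rating in ratings:
    for team, rank in rating.items():
        if rank <= 4:
            top4_counts[team] = top4_counts.get(team, 0) + 1

# Count how many ratings rank the Buckeyes ahead of the Spartans, and vice versa.
buckeyes_better = sum(1 for r in ratings if r["Ohio State"] < r["Michigan State"])
spartans_better = sum(1 for r in ratings if r["Michigan State"] < r["Ohio State"])
print(top4_counts, buckeyes_better, spartans_better)
```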
We can't know how a collection of humans combines its biases and criteria weights, but we do know a lot about how the computer ratings work. Specifically, I know how the three that I calculate work, and it is instructive that the most predictive of them (KLK in the list above) favors Ohio State, while the two that give more weight to results vis-à-vis opponent quality (ISR and WWP) favor Michigan State. The "out" for the computer "committee" is the same as for the human one: "Consider the body of work." That suggests we could define an objective metric that quantifies that vague notion. There are numerous ways to do it (college hockey's pairwise matrix is an elegant one); I'll present one that could be applied equally to human rankings and composite computer ranks, after a minor rant.
The Mythical "Eye Test"
I pretty much ignore punditry that includes this phrase. The problem is that anyone who uses it is unavoidably biased both by the objective data (which games they happened to see) and by valid but unstated criteria for "looks good" and "looks bad." Even if everyone who uses the phrase were to watch every game, their rankings would still be incomparable because the unstated underlying criteria may be fundamentally different. In other words, the "eye test" is by nature subjective because it depends upon whose eyes are performing the test.

I would accept an "eye test" ranking provided the person who offers it meets the following criteria:
- Has watched every game
- Has a photographic memory
- Can do 8,128 simultaneous comparisons in their head

I doubt that anyone who meets my criteria is spending their time ranking college football teams. I'd hope they're working on carbon-free energy possibilities, and fear they are working on more ways to target me with online ads or siphon fees from my retirement accounts.
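(For reference, 8,128 is the number of distinct pairs that can be formed from the 128 FBS teams: 128 × 127 / 2 = 8,128.)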
Step two is to assign the value of a win against a team with a specific rank, and the "cost" of a loss based upon rank. The simplest approach is just to use the rank values themselves, but that can be misleading when summed over teams, because a rank difference of n does not mean the same thing near the top of the list as it does in the middle: the gap between the teams ranked 1 and 10 is a lot larger than the gap between the teams ranked 51 and 60. Similarly, a loss to #1 is not all that different from a loss to #2. So I group the ranks into "buckets" that are used to form the histograms:
| Ranks | 1-3 | 4-9 | 10-21 | 22-40 | 41-89 | 90-108 | 109-120 | 121-126 | 127-NR |
|---|---|---|---|---|---|---|---|---|---|
| Grade | A+ | A | B | C+ | C | C− | D | E | F |
| Worth | 64 | 48 | 32 | 24 | 16 | 8 | 4 | 2 | 1 |
| Cost | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 |
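To make the bucketing concrete, here is a minimal sketch of a lookup that maps an opponent's rank to its grade, win Worth, and loss Cost. The boundaries are copied from the table above; the function and its names are my own illustration:

```python
# (upper rank bound, grade, win "worth", loss "cost"), copied from the table above
BUCKETS = [
    (3,   "A+", 64, 1),
    (9,   "A",  48, 2),
    (21,  "B",  32, 4),
    (40,  "C+", 24, 8),
    (89,  "C",  16, 16),
    (108, "C-", 8,  24),
    (120, "D",  4,  32),
    (126, "E",  2,  48),
]

def bucket(rank):
    """Return (grade, worth, cost) for an opponent's rank; None means unranked."""
    if rank is not None:
        for max_rank, grade, worth, cost in BUCKETS:
            if rank <= max_rank:
                return grade, worth, cost
    return "F", 1, 64  # ranks 127 and up, including unranked teams
```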
Form the Body of Work metric by assigning every opponent its rank-based "grade", then summing the Worth values of the wins and the Cost values of the losses. Subtract the loss sum from the win sum and divide by the total number of games. Note that I don't consider this so much a "meta rating" as just a convenient sort sequence. What actually makes results versus schedule strength visible is the pair of histograms. Here's the comparison for all the teams with an A or A+ grade:
[Chart: win and loss histograms for the teams graded A or A+]
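Continuing the sketch above (and reusing its bucket() helper), the metric itself is just a weighted win/loss tally; the example schedule at the bottom is invented:

```python
def body_of_work(win_ranks, loss_ranks):
    """BoW = (sum of Worth over wins - sum of Cost over losses) / games played."""
    win_sum = sum(bucket(r)[1] for r in win_ranks)
    loss_sum = sum(bucket(r)[2] for r in loss_ranks)
    games = len(win_ranks) + len(loss_ranks)
    return (win_sum - loss_sum) / games

# Invented example: wins over #2, #15, and #45, and a loss to #8.
print(body_of_work([2, 15, 45], [8]))  # (64 + 32 + 16 - 2) / 4 = 27.5
```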
The full list provides some interesting insight into how the quality of a team's wins and losses determines its own grade. Again, the "BoW" metric is not a ranking on its own, but it looks to me like a pretty good tiebreaker when the comparison comes down to a few pairs of teams out of the 8,128 pairs that contribute to the ratings.