Playoff Selection Report Card

December 10, 2015

This year's "final four" turned out to be a "no-brainer" - in the sense that the committee came up with the same field the computers (which have no brains) would have. Both produced the same semifinal matchups, although the committee got the one/two and three/four seeds backwards relative to the computers. That earns the committee an A−.

Twelve teams got at least one top-four ranking from the computers, and five teams were ranked fourth or better by more than half of the computers.

Massey Ratings College Football Ranking Composite as of Wed Dec 16 12:50:23
(117 computer rankings)
  #  Team
115  Alabama
101  Oklahoma
100  Clemson
 61  Michigan State
 59  Ohio State
 10  Stanford
  9  Iowa
  5  Baylor
  3  TCU
  2  Mississippi
  2  Notre Dame
  1  Utah

An interesting point was made by the committee along the lines of "we want the best four, not necessarily the most deserving four." The computers are not all of the same mind when it comes to this - although Michigan State garnered more top-four "votes", more of the individual computer ratings actually rank the Buckeyes better than the Spartans, 59 to 58 (a short counting sketch follows the list):

Massey Ratings College Football Ranking Composite as of Wed Dec 16 12:50:23
(117 computer rankings)
Ohio State (59) - Michigan State (58)
(each line shows one computer rating's rank for Ohio State, then its rank for Michigan State)

Ratings that rank Ohio State better (59):
ARG #3 - #5
ASH #4 - #5
BAS #4 - #10
BBT #2 - #6
BCM #4 - #13
BDF #3 - #7
BRN #3 - #9
BWE #3 - #7
CGV #2 - #3
COF #4 - #5
CPA #4 - #15
CTW #2 - #4
DCI #4 - #9
DEZ #3 - #6
DII #4 - #5
DOI #3 - #21
DOK #2 - #10
DP #3 - #8
FPI #3 - #14
HEN #5 - #6
HKB #4 - #7
HNL #3 - #6
HOW #3 - #4
KAM #3 - #10
KEL #3 - #5
KLK #2 - #8
KPK #4 - #5
LOG #4 - #13
MAS #4 - #5
MDS #4 - #5
MOR #3 - #6
NOL #2 - #5
NUT #3 - #8
OSP #3 - #5
PAY #4 - #5
PIG #3 - #8
PIR #4 - #5
PTS #5 - #8
RBA #2 - #18
RSL #3 - #10
RT #4 - #5
RTH #4 - #5
RUD #4 - #5
RWP #3 - #5
S@P #4 - #9
SAG #4 - #9
SEL #4 - #5
SFX #3 - #5
SOL #3 - #7
SP #3 - #15
SRC #4 - #5
STH #3 - #5
STU #3 - #5
TFG #2 - #4
TPR #2 - #12
TRP #3 - #13
TS #5 - #6
WLK #3 - #8
WMR #2 - #7

Ratings that rank Michigan State better (58):
ABC #6 - #4
ACU #6 - #4
AND #6 - #2
ATC #7 - #3
BIH #5 - #4
BIL #5 - #3
BOB #5 - #4
BOW #6 - #4
BSS #9 - #3
CI #5 - #1
CMV #3 - #2
COL #5 - #2
CPR #10 - #4
CSL #6 - #3
D1A #6 - #3
DES #6 - #3
DOL #6 - #3
DUN #7 - #4
ENG #4 - #2
EZ #5 - #2
FEI #7 - #5
FMG #5 - #3
GBE #5 - #1
GLD #5 - #2
GRS #5 - #4
HAT #5 - #2
ISR #6 - #2
JNK #5 - #2
KEE #5 - #4
KEN #5 - #4
KH #5 - #4
KNT #5 - #3
KRA #5 - #2
LAZ #7 - #4
LSD #5 - #3
LSW #7 - #3
MAA #6 - #2
MCK #5 - #4
MEA #5 - #3
MGN #5 - #4
MJS #5 - #3
MRK #4 - #3
MVP #5 - #4
MvG #6 - #3
PCP #7 - #3
PGH #7 - #3
PPP #5 - #3
REW #6 - #2
RFL #7 - #1
RTB #7 - #2
RTR #5 - #3
SOR #5 - #3
UCC #5 - #4
WEL #5 - #3
WIL #5 - #2
WOB #7 - #3
WOL #6 - #2
WWP #11 - #3
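
A head-to-head tally like the one above is easy to reproduce once you have each computer's ranks for the two teams. Here is a minimal Python sketch (the function name and the ranks dictionary are mine; in practice the ranks would come from the composite page):

    def head_to_head(ranks):
        """Count how many ratings rank team A better (lower) than team B.

        ranks maps a rating's name to a (rank_A, rank_B) tuple.
        Returns (ratings favoring A, ratings favoring B); ties favor neither.
        """
        favor_a = sum(1 for a, b in ranks.values() if a < b)
        favor_b = sum(1 for a, b in ranks.values() if b < a)
        return favor_a, favor_b

    # e.g. head_to_head({"ARG": (3, 5), "ABC": (6, 4)}) returns (1, 1)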

We can't know how a collection of humans combines its biases and criteria weights, but we do know a lot about how the computer ratings work. Specifically, I know how the three that I calculate work, and it is instructive that the most predictive (KLK in the list above) favors Ohio State, while the two that give more weight to results vis-à-vis opponent quality (ISR and WWP) favor Michigan State. The "out" for the computer "committee" is the same as for the humans: "consider the body of work." That suggests we could define an objective metric that quantifies that vague notion. There are numerous ways to do so (college hockey's pairwise comparison matrix is an elegant one). I'll present one that could be applied equally to human rankings and to computer rating composite ranks, after a minor rant.

The Mythical "Eye Test"

I pretty much ignore punditry that includes this phrase. The problem is that anyone who uses it is unavoidably limited by the objective data (which games they have actually seen) and biased by valid but unstated criteria for "looks good" and "looks bad." Even if everyone who used the phrase watched every game, their rankings would still be incomparable because the unstated underlying criteria may be fundamentally different. In other words, the "eye test" is by nature subjective because it depends upon whose eyes are performing the test.

I would accept an "eye test" ranking provided the person who provides it meets the following criteria:

  1. Has watched every game
  2. Has a photographic memory
  3. Can do 8,128 simultaneous comparisons in their head (one for every pair of the 128 FBS teams)
I doubt that anyone who meets my criteria is spending their time ranking college football teams. I'd hope they're working on carbon-free energy possibilities, and fear they are working on more ways to target me with online ads or siphon fees from my retirement accounts.

Constructing a Body of Work Metric

The fundamental principle involved is the quality of opponents in teams' wins and losses. Individual computer ratings form their ratings based upon their own determination of opponent quality, and presumably humans try to do the same. Our first step is to define a composite ranking for all teams so that every team's wins and losses can be categorized the same way. My choice is to use the "Bucklin" majority to represent the computers: the best rank for which a majority (50% + 1) of the computers rank the team at that rank or better. Any other "vote-counting" method would work as long as ranks are assigned to all teams.
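
As a minimal sketch (the function name is mine, and it assumes each computer ranking supplies a numeric rank for every team), the Bucklin majority rank can be computed like this in Python:

    def bucklin_rank(ranks):
        """Best (smallest) rank r such that a strict majority of the
        individual computer rankings place the team at rank r or better."""
        needed = len(ranks) // 2 + 1     # strict majority of the rankings
        ordered = sorted(ranks)          # best ranks first
        return ordered[needed - 1]       # the 'needed'-th best rank

    # With 117 rankings this is simply the 59th-best rank the team received.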

Step two is to assign the value of a win against a team with a specific rank, and the "cost" of a loss, based upon rank. The simplest approach is just to use the rank values themselves, but that can be misleading when summed over teams because a rank difference of n does not mean the same thing near rank 10 as near rank 50. For example, the difference between the teams ranked 1 and 10 is a lot bigger than the difference between the teams ranked 51 and 60. Similarly, a loss to #1 is not all that different from a loss to #2. So I group the ranks into "buckets" to be used to form histograms:

Ranks   1-3   4-9   10-21   22-40   41-89   90-108   109-120   121-126   127-NR
Grade   A+    A     B       C+      C       C−       D         E         F
Worth   64    48    32      24      16      8        4         2         1
Cost    1     2     4       8       16      24       32        48        64
The width of each "bucket" is based upon the presumption that team quality is normally distributed; each bucket is defined to span approximately ½ standard deviation, except for C, which corresponds to the mean ±½ standard deviation and so has a width of one standard deviation.
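
One way this lookup might be coded (a sketch; the names are mine, and the boundaries and weights are simply the values from the table above, with unranked opponents treated as falling in the last bucket):

    # (upper rank bound, grade, win worth, loss cost) for each bucket;
    # None marks the open-ended last bucket (rank 127 through unranked).
    BUCKETS = [
        (3,    "A+", 64,  1),
        (9,    "A",  48,  2),
        (21,   "B",  32,  4),
        (40,   "C+", 24,  8),
        (89,   "C",  16, 16),
        (108,  "C-",  8, 24),
        (120,  "D",   4, 32),
        (126,  "E",   2, 48),
        (None, "F",   1, 64),
    ]

    def grade_opponent(rank):
        """Return (grade, win_worth, loss_cost) for an opponent's composite rank."""
        for upper, grade, worth, cost in BUCKETS:
            if upper is None or rank <= upper:
                return grade, worth, cost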

Form the Body of Work metric by assigning every opponent its rank-based "grade", then form sums over the wins and the losses using the corresponding weights. Subtract the loss sum from the win sum and divide by the total number of games. Note that I do not consider this so much a "meta rating" as just a convenient sort sequence; what actually makes results versus schedule strength visible is the pair of histograms. Here's the comparison for all the teams with an A or A+ grade (a short code sketch of the computation appears below).

      BoW  Wsum  Lsum  Grade  Rank  Team            Wins: A+  A  B C+  C C−  D  E  F   Losses: A+  A  B C+  C C−  D  E  F
19.692308   272    16    A      4   Michigan State         0  2  2  0  6  2  0  0  0            0  0  0  0  1  0  0  0  0
19.153846   249     0    A+     3   Clemson                0  1  2  1  6  2  0  0  1            0  0  0  0  0  0  0  0  0
18.384615   243     4    A+     1   Alabama                0  0  2  6  2  0  0  1  1            0  0  1  0  0  0  0  0  0
17.666667   228    16    A+     3   Oklahoma               0  0  3  2  5  0  1  0  0            0  0  0  0  1  0  0  0  0
15.538462   210     8    A      7   Stanford               0  1  0  4  3  2  0  1  0            0  0  2  0  0  0  0  0  0
14.230769   187     2    A      8   Iowa                   0  0  1  2  6  1  0  1  1            0  1  0  0  0  0  0  0  0
14.083333   172     3    A      8   Notre Dame             0  0  1  3  3  2  1  0  0            1  1  0  0  0  0  0  0  0
14.000000   170     2    A      4   Ohio State             0  0  1  0  8  1  0  1  0            0  1  0  0  0  0  0  0  0

The full list provides some interesting insight into how the quality of wins and losses determines the team's own grade. Again, the "BoW" metric is not a ranking on its own, but it looks to me like a pretty good tiebreaker when the comparison is between a few pairs of teams out of the 8,128 pairs that contribute to the ratings.
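
Putting the pieces together, here is a minimal sketch of the computation (reusing the hypothetical grade_opponent helper above; a team's season is given as a list of (won, opponent_rank) results):

    def body_of_work(results):
        """BoW = (sum of win worths - sum of loss costs) / number of games.

        results is a list of (won, opponent_rank) tuples, where won is True
        for a win and opponent_rank is the opponent's composite (Bucklin) rank.
        """
        win_sum = loss_sum = 0
        for won, opp_rank in results:
            _, worth, cost = grade_opponent(opp_rank)
            if won:
                win_sum += worth
            else:
                loss_sum += cost
        return (win_sum - loss_sum) / len(results)

Michigan State's row above, for example, works out to (272 - 16) / 13 ≈ 19.69, matching the BoW column.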

© Copyright 2015, Paul Kislanko