Meta-Ranking Construction
September 3, 2014
The debut of the USA Today College Football Computer Composite
and its inclusion in Dr. Massey's College Football Ranking Composite prompts contemplation on the role and construction of
composite rankings.
There are some good reasons to exclude it from my own computer ranking analysis, and some for including it.
- It does not correspond to a rating system, as it uses the constituent systems' ranks as input. There is inevitable loss of precision in any such
consolidation. This is balanced by the fact that the ranks are combined in an interesting and unique way (more on that in a bit).
Had I been tasked with constructing a composite rating, I think I would have done as I do with my baseball ratings comparisons: exploit the fact that the rating values
for the systems are normally distributed and "scale" the team values to a distribution with mean zero and standard deviation one. The results for each team can then be combined
in any arithmetical way while preserving the relative differences between team values in the different systems (a sketch of that scaling follows this list). I may take the trouble to do exactly that if I get time.
- Each of the constituent rating systems is already included in my analysis, so including a composite derived from them gives these more weight than the other ratings.
At least in the first week's edition of the ranking, the composite appears to be different enough from each individual component¹ for that not to be a large concern.
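To illustrate the scaling idea from the first item, here is a minimal Python sketch, assuming each system's rating values are roughly normally distributed; the system names and numbers are invented purely for illustration:

```python
from statistics import mean, pstdev

# Invented rating values from three hypothetical systems (for illustration only).
ratings = {
    "SystemA": {"Team1": 95.2, "Team2": 91.7, "Team3": 88.4},
    "SystemB": {"Team1": 28.1, "Team2": 27.9, "Team3": 25.0},
    "SystemC": {"Team1": 0.912, "Team2": 0.874, "Team3": 0.861},
}

def standardize(system_values):
    """Scale one system's values to mean zero and standard deviation one (z-scores)."""
    mu = mean(system_values.values())
    sigma = pstdev(system_values.values())
    return {team: (v - mu) / sigma for team, v in system_values.items()}

# Once every system is on the same scale, the values can be combined
# arithmetically while preserving each system's relative differences.
scaled = [standardize(s) for s in ratings.values()]
composite = {team: mean(z[team] for z in scaled) for team in scaled[0]}

for team, value in sorted(composite.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {value:+.3f}")
```

Combining standardized values rather than ranks avoids the loss of precision noted in the first item above.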
I am including the ranking for now because part of the premise of my analysis is that a composite has the potential to be a better measure of the field, and including
several such composites supports investigation of that premise. To facilitate this, I am also including the "Consensus Rank" from Dr. Massey's composite in the Weighted
Retrodictive Ranking Violations (but not the Borda, Bucklin or Pairwise Ranking calculations).
WHAT? Mean?
I was especially intrigued (and pleased) by the choice of the geometric mean of the five rankings to order the results. I had been using it as one way to characterize schedule strengths
based upon opponents' ranks without undue influence of the inevitable (alas!) games against unranked opponents.
A reminder:
- The arithmetic mean is the usual average of the ranks.
- The harmonic mean is the inverse of the average of the ranks' inverses.
- The geometric mean is the antilogarithm of the average of the ranks' logarithms.
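To make the three definitions concrete, here is a small Python sketch computing each mean for a hypothetical set of five ranks:

```python
from math import exp, log

def arithmetic_mean(ranks):
    # The usual average of the ranks.
    return sum(ranks) / len(ranks)

def harmonic_mean(ranks):
    # The inverse of the average of the ranks' inverses.
    return len(ranks) / sum(1 / r for r in ranks)

def geometric_mean(ranks):
    # The antilogarithm of the average of the ranks' logarithms.
    return exp(sum(log(r) for r in ranks) / len(ranks))

ranks = [1, 1, 2, 3, 3]              # a hypothetical set of five ranks
print(arithmetic_mean(ranks))        # 2.0
print(geometric_mean(ranks))         # 18 ** (1/5) ≈ 1.783
print(harmonic_mean(ranks))          # ≈ 1.579
```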
For both the harmonic and geometric means, lower (better) rank values effectively carry more weight than higher ones; in the case of the harmonic mean, so much so that the composite
is completely dominated by the best ranks in the list. The effect is as if the average is "weighted" by functions such as the ones shown in the graph.
The "weight function" for the geometric mean in this graph should not be taken too literally; this particular function was chosen to
illustrate how its shape differs from that of the harmonic mean.
Having some kind of "decay" for rankings far worse than most, without removing them entirely, is desirable, and it can be argued that
if one or a few ratings are much better than most, that is a measure of some desirable quality the team demonstrates. Note that my
"Majority Consensus" composite is a "step function" that assigns zero weight to every rating worse than the majority rank.
Since the UTCFCC has only five components and it is reasonable to expect that the spread of ranks would not be excessive, the geometric mean strikes me as a
much better method than the ones the BCS and its predecessors applied.
Another benefit is that it renders true what was once a hilarious response to a serious question. In the earlier days of the BCS the ranking formula consisted of a
linear combination of rational components. When asked how ties would be broken, a spokesman demonstrated extreme innumeracy by responding "we'll just use more
decimal places." Since the 5th root of an integer that is not a perfect 5th power is guaranteed to be irrational,
using more decimal places would actually work. Except there's still a tie when two teams have exactly the same multiset of ranks,
as when team A has { 1 1 2 3 3 } and team B has { 3 3 2 1 1 }. In such cases, teams A and B would be tied in the UTCFCC, and that is reasonable and proper.
Any "tiebreaker" would have to be arbitrary, in the sense that it cannot be derived from the input.
Comparisons
When all values are positive (as is the case with ranks) we always have arithmetic ≥ geometric ≥ harmonic mean, with equality holding only when all values are equal.
Since we're dealing with ordinal ranks where lower is better, this means a team's harmonic-mean value always looks at least as good as its geometric-mean value, which in turn always looks at least as good as its arithmetic mean.
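The inequality constrains each team's three values individually, but the gaps differ from team to team, so the three means can order teams differently. A made-up two-team illustration (Python):

```python
def geometric_mean(ranks):
    product = 1.0
    for r in ranks:
        product *= r
    return product ** (1 / len(ranks))

def harmonic_mean(ranks):
    return len(ranks) / sum(1 / r for r in ranks)

# Hypothetical teams: X is loved by one system and panned by the other;
# Y gets the same middling rank from both.
x = [1, 30]
y = [4, 4]

print(geometric_mean(x), geometric_mean(y))   # ≈ 5.48 vs 4.00 -> Y ahead of X
print(harmonic_mean(x), harmonic_mean(y))     # ≈ 1.94 vs 4.00 -> X ahead of Y
```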
So what would a composite of all computer rankings look like if the geometric mean were used to form the composite? Every team's geometric-mean value would be better (lower) than
its arithmetic-mean value, but not by the same amount, so the ordinal ranking of the composite is different. For the 54 week 1 ratings published as of Thursday, Sep 4 16:06:02,
the top 25 by geometric mean was:
Composite Construction Comparisons
Massey Ratings College Football Ranking Composite as of Thu Sep 4 16:06:02
Derived Ranks (first seven columns) | Team | Values (last seven columns) |
Cmaj | Borda | PWw | Avg | Mcon | Geo | Har | Team | Cmaj | Borda | PWw | Avg | Mcon | Geo | Har |
1 | 1 | 1 | 1 | 1 | 1 | 1 | Florida State | 1 | 6829 | 127 | 1.54 | 1.52 | 1.2839 | 1.1673 |
4 | 2 | 3 | 2 | 2 | 2 | 3 | Oregon | 5 | 6588 | 125 | 6.00 | 5.91 | 4.6899 | 3.6162 |
3 | 3 | 3 | 3 | 3 | 3 | 2 | Alabama | 5 | 6559 | 125 | 6.54 | 6.38 | 4.7214 | 3.4220 |
2 | 4 | 3 | 4 | 4 | 4 | 4 | Auburn | 5 | 6505 | 125 | 7.54 | 7.47 | 5.4943 | 4.1713 |
6 | 6 | 8 | 6 | 6 | 5 | 5 | Baylor | 9 | 6418 | 120 | 9.15 | 9.18 | 7.3757 | 5.5256 |
5 | 5 | 5 | 5 | 5 | 6 | 9 | Stanford | 8 | 6449 | 123 | 8.57 | 8.68 | 7.5409 | 6.6342 |
8 | 7 | 7 | 7 | 7 | 7 | 7 | Michigan State | 9 | 6392 | 121 | 9.63 | 9.54 | 7.9030 | 6.3288 |
7 | 9 | 9 | 9 | 10 | 8 | 6 | Texas A&M | 9 | 6343 | 119 | 10.54 | 10.55 | 8.2504 | 5.7881 |
9 | 8 | 8 | 8 | 8 | 9 | 8 | Oklahoma | 9 | 6351 | 120 | 10.39 | 10.14 | 8.3381 | 6.4032 |
11 | 10 | 10 | 10 | 9 | 10 | 12 | Ohio State | 11 | 6337 | 118 | 10.65 | 10.54 | 9.6186 | 8.5975 |
13 | 11 | 12 | 11 | 11 | 11 | 11 | Southern California | 11 | 6292 | 116 | 11.48 | 11.59 | 9.8311 | 8.1160 |
10 | 12 | 10 | 12 | 12 | 12 | 13 | LSU | 10 | 6276 | 118 | 11.78 | 11.80 | 10.2236 | 8.7863 |
12 | 13 | 13 | 13 | 13 | 13 | 10 | Georgia | 11 | 6239 | 115 | 12.46 | 12.28 | 10.4851 | 8.0079 |
14 | 14 | 14 | 14 | 14 | 14 | 16 | UCLA | 13 | 6115 | 114 | 14.76 | 14.65 | 12.4110 | 10.0379 |
16 | 17 | 16 | 17 | 17 | 15 | 14 | Oklahoma State | 18 | 5872 | 112 | 19.26 | 19.71 | 14.5277 | 9.9889 |
15 | 16 | 16 | 16 | 15 | 16 | 15 | Missouri | 16 | 5882 | 112 | 19.07 | 19.23 | 14.5370 | 9.9999 |
17 | 15 | 17 | 15 | 16 | 17 | 17 | Louisville | 20 | 5883 | 111 | 19.06 | 19.39 | 16.9333 | 13.5815 |
19 | 18 | 19 | 18 | 18 | 18 | 18 | Mississippi | 20 | 5802 | 109 | 20.56 | 20.41 | 19.3837 | 18.2242 |
20 | 19 | 19 | 19 | 19 | 19 | 20 | Notre Dame | 21 | 5749 | 109 | 21.54 | 21.34 | 20.2056 | 18.9543 |
18 | 22 | 21 | 22 | 22 | 20 | 22 | South Carolina | 20 | 5589 | 107 | 24.50 | 24.39 | 21.6760 | 19.6480 |
22 | 21 | 23 | 21 | 21 | 21 | 21 | Arizona State | 23 | 5634 | 105 | 23.67 | 23.43 | 21.8403 | 19.5765 |
23 | 20 | 23 | 20 | 20 | 22 | 23 | Texas | 23 | 5682 | 105 | 22.78 | 23.05 | 21.9034 | 20.9461 |
21 | 23 | 21 | 23 | 23 | 23 | 19 | Wisconsin | 22 | 5550 | 107 | 25.22 | 25.00 | 21.9340 | 18.5364 |
26 | 24 | 26 | 24 | 24 | 24 | 24 | Nebraska | 26 | 5465 | 102 | 26.80 | 26.52 | 25.1618 | 23.6870 |
24 | 25 | 24 | 25 | 26 | 25 | 26 | Clemson | 25 | 5430 | 104 | 27.44 | 27.32 | 25.6479 | 24.1446 |
The "Derived Ranks" are the ordinals resulting from sorting the list according to the "Values" for each definition of "composite." A few notes:
- The derived rank for my Consensus majority is based upon a tiebreaker that depends upon the input but is "arbitrary" in
the sense that it is only one of many possible choices. I only use it to determine a reporting sequence. Since the Value is itself
an ordinal rank it is the value I use in all of my team-oriented reports.
- PWw (pairwise wins) has a derived rank that is just #teams ranked - #pairwise wins (#pairwise wins is the value).
There is no tiebreaker, although if one is needed the pairwise head-to-head results are available.
- Avg and MCon (the Massey Consensus) would be identical if I included the human polls. That the derived ranks differ at all indicates something about
the human polls.
All of the possibilities are interesting, but I have an aesthetic preference for a composite ordinal rank that is derived from ordinal ranks without
resorting to an average of any kind. I'll calculate any varieties we come across, but the ones I use will be based upon simple (?)
methods that rely only upon counting.
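For concreteness, here is a minimal Python sketch of how a "Derived Rank" can be read off a "Values" column: sort by value (lower is better) and assign ordinals, with tied values sharing an ordinal. The four geometric-mean values are copied from the table above.

```python
def derived_ranks(values):
    """Assign ordinal ranks (1 = best) by ascending value; tied values share a rank."""
    ordered = sorted(values.items(), key=lambda kv: kv[1])
    ranks, last_value, last_rank = {}, None, 0
    for position, (team, value) in enumerate(ordered, start=1):
        if value != last_value:
            last_rank, last_value = position, value
        ranks[team] = last_rank
    return ranks

# Geometric-mean values for the top four teams in the table above.
geo = {"Florida State": 1.2839, "Oregon": 4.6899, "Alabama": 4.7214, "Auburn": 5.4943}
print(derived_ranks(geo))
# {'Florida State': 1, 'Oregon': 2, 'Alabama': 3, 'Auburn': 4}
```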
© Copyright 2014, Paul Kislanko
¹ The "distance" between rankings is the number of swaps a "bubble sort" would require to transform one ranking into another. For the UTCFCC and its constituents,
the distances between each ranking and the composite, and between the constituents themselves, are:
| UCC | MAS | WOL | SAG | BIL | COL |
UCC | ∗ | 565 | 599 | 701 | 741 | 850 |
MAS | 565 | ∗ | 618 | 668 | 998 | 1243 |
WOL | 599 | 618 | ∗ | 952 | 886 | 1206 |
SAG | 701 | 668 | 952 | ∗ | 984 | 1314 |
BIL | 741 | 998 | 886 | 984 | ∗ | 1201 |
COL | 850 | 1243 | 1206 | 1314 | 1201 | ∗ |
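For reference, the footnote's "bubble sort" distance is the number of team pairs the two rankings order differently (the Kendall tau distance). A minimal Python sketch with a hypothetical four-team example:

```python
from itertools import combinations

def bubble_sort_distance(rank_a, rank_b):
    """Count the team pairs ordered differently by the two rankings;
    this equals the number of adjacent swaps a bubble sort would need."""
    return sum(
        1
        for t1, t2 in combinations(rank_a, 2)
        if (rank_a[t1] - rank_a[t2]) * (rank_b[t1] - rank_b[t2]) < 0
    )

# Hypothetical rankings of four teams.
a = {"W": 1, "X": 2, "Y": 3, "Z": 4}
b = {"W": 2, "X": 1, "Y": 4, "Z": 3}
print(bubble_sort_distance(a, b))   # 2 (the W/X and Y/Z pairs are flipped)
```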