Meta-Ranking Construction

September 3, 2014

The debut of the USA Today College Football Computer Composite (UTCFCC) and its inclusion in Dr. Massey's College Football Ranking Composite prompt some contemplation on the role and construction of composite rankings.

There are some good reasons to exclude it from my own computer ranking analysis, and some to include it.

It does not correspond to a rating system, as it uses the constituent systems' ranks as input. There is inevitable loss of precision in any such consolidation. This is balanced by the fact that the ranks are combined in an interesting and unique way (more on that in a bit).
Had I been tasked with construction of a composite rating, I think I would have done as I do with my baseball ratings comparisons: exploit the fact that the rating values for the systems are normally distributed to "scale" the team values to a distribution with mean zero and standard deviation one. The results for each team can then be combined in any arithmetical way while preserving the relative differences between team values in the different systems. I may take the trouble to do exactly that if I get time.
Each of the constituent rating systems is already included in my analysis, so including a composite derived from them gives these more weight than the other ratings. At least in the first week's edition of the ranking, the composite appears to be different enough from each individual component¹ for that not to be a large concern.
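
The scaling idea mentioned above can be sketched in a few lines. This is only an illustration, not my production code: the function names `standardize` and `composite` and the sample ratings are hypothetical, and the combination shown (a plain average of the scaled values) is just one of the "arithmetical" choices available.

```python
from statistics import mean, pstdev

def standardize(ratings):
    """Rescale one system's ratings to mean 0 and standard deviation 1."""
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of this system
    return {team: (value - mu) / sigma for team, value in ratings.items()}

def composite(systems):
    """Average each team's standardized value across all systems.

    Assumes every system rates exactly the same set of teams.
    """
    scaled = [standardize(s) for s in systems]
    teams = scaled[0].keys()
    return {team: mean(s[team] for s in scaled) for team in teams}

# Hypothetical ratings on two very different scales (illustrative only).
system_a = {"A": 90.0, "B": 80.0, "C": 70.0}
system_b = {"A": 0.9, "B": 0.5, "C": 0.1}

combined = composite([system_a, system_b])
ranking = sorted(combined, key=combined.get, reverse=True)  # best team first
```

Because both hypothetical systems are put on the same scale before they are combined, neither dominates the result merely because its raw numbers are larger.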

I am including the ranking for now because part of the premise of my analysis is that a composite has the potential to be a better measure of the field, and including several such composites supports investigating whether that premise is correct. To facilitate this, I am also including the "Consensus Rank" from Dr. Massey's composite in the Weighted Retrodictive Ranking Violations (but not in the Borda, Bucklin or Pairwise Ranking calculations).

WHAT? Mean?

I was especially intrigued (and pleased) by the choice of the geometric mean of the five rankings to order the results. I had been using it as one way to characterize schedule strengths based upon opponents' ranks without undue influence of the inevitable (alas!) games against unranked opponents.

A reminder:

The arithmetic mean is the usual average of the ranks.
The harmonic mean is the reciprocal of the average of the ranks' reciprocals.
The geometric mean is the antilogarithm of the average of the ranks' logarithms.
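
For the curious, the three definitions translate directly into code. This is a minimal sketch; the list of ranks is made up for illustration, with one deliberately poor rank to show how each mean reacts to an outlier.

```python
import math

def arithmetic_mean(ranks):
    """The usual average of the ranks."""
    return sum(ranks) / len(ranks)

def harmonic_mean(ranks):
    """Reciprocal of the average of the ranks' reciprocals."""
    return len(ranks) / sum(1 / r for r in ranks)

def geometric_mean(ranks):
    """Antilogarithm of the average of the ranks' logarithms."""
    return math.exp(sum(math.log(r) for r in ranks) / len(ranks))

# Hypothetical ranks for one team in five systems; 40 is the outlier.
ranks = [1, 2, 4, 8, 40]

arith = arithmetic_mean(ranks)
geo = geometric_mean(ranks)
har = harmonic_mean(ranks)
```

For this list the arithmetic mean is pulled furthest toward the outlier, the harmonic mean stays closest to the best ranks, and the geometric mean falls in between.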

For both the harmonic and geometric means, lower rank values effectively have more weight than higher ones; in the case of the harmonic mean, so much so that the composite is dominated by the best ranks in the list. The effect is as if the average is "weighted" by functions such as the ones shown in the graph.

[Graph: Contributions to results]

The "weight function" for the geometric mean in this graph should not be taken too literally - this particular function was chosen to illustrate how the shape differs from that of the harmonic mean.

Having some kind of "decay" for rankings far worse than most, without resorting to removing them entirely, is desirable, and it can be argued that if one or a few ratings are much better than most, that is a measure of some desirable quality that the team demonstrates. Note that my "Majority Consensus" composite is a "step function" that assigns a zero weight to every rating worse than the majority rank.

Since the UTCFCC has only five components and it is reasonable to expect that the spread of ranks would not be excessive, the geometric mean strikes me as a much better method than those the BCS and its predecessors applied.

Another benefit is that it renders true what was once a hilarious response to a serious question. In the earlier days of the BCS the ranking formula consisted of a linear combination of rational components. When asked how ties would be broken, a spokesman demonstrated extreme innumeracy by responding "we'll just use more decimal places." Since the 5th root of an integer that is not a 5th power of an integer is guaranteed to be irrational, using more decimal places would actually work. Except there's still a tie when two teams have exactly the same number of the same ranks, as when team A has { 1 1 2 3 3 } and team B has { 3 3 2 1 1 }. In such cases, teams A and B would be tied in the UTCFCC, and that is reasonable and proper. Any "tiebreaker" would have to be arbitrary, in the sense that it cannot be derived from the input.
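
A side note on implementation: for lists of the same length, ordering by geometric mean is the same as ordering by the product of the ranks, so ties can be detected exactly with integer arithmetic and no decimal places at all. The sketch below is mine, not a description of how the UTCFCC is computed; `geo_key` and the third team are hypothetical. It also shows that the product can coincide for two different sets of ranks, which produces a tie as well.

```python
from math import prod

def geo_key(ranks):
    """Exact sort key: the product of the ranks.

    For lists of equal length, a smaller product means a smaller
    geometric mean, and equal products mean an exact tie.
    """
    return prod(ranks)

team_a = [1, 1, 2, 3, 3]
team_b = [3, 3, 2, 1, 1]
team_c = [1, 1, 1, 2, 9]  # hypothetical: different ranks, same product

tie_ab = geo_key(team_a) == geo_key(team_b)
tie_ac = geo_key(team_a) == geo_key(team_c)
```

Both comparisons report a tie, since all three products are 18.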

Comparisons

When all values are positive (as is the case with ranks) we always have arithmetic mean ≥ geometric mean ≥ harmonic mean, with equality holding only when all values are equal. Since we're dealing with ordinal ranks where lower is better, this means the harmonic mean is never worse than the geometric mean, and the geometric mean is never worse than the arithmetic mean.

So what would a composite of all computer rankings look like if the geometric mean were used to form the composite? The values for all teams would be better than the average value, but not by the same amount, so the ordinal ranking of the composite is different. For the 54 week 1 ratings published on Thursday, Sep 4 16:06:02, the top 25 by geometric mean were:

Composite Construction Comparisons

Massey Ratings College Football Ranking Composite as of Thu Sep 4 16:06:02

Derived Ranks								Values
Cmaj	Borda	PWw	Avg	Mcon	Geo	Har	Tname	Cmaj	Borda	PWw	Avg	Mcon	Geo	Har
1	1	1	1	1	1	1	Florida State	1	6829	127	1.54	1.52	1.2839	1.1673
4	2	3	2	2	2	3	Oregon	5	6588	125	6.00	5.91	4.6899	3.6162
3	3	3	3	3	3	2	Alabama	5	6559	125	6.54	6.38	4.7214	3.4220
2	4	3	4	4	4	4	Auburn	5	6505	125	7.54	7.47	5.4943	4.1713
6	6	8	6	6	5	5	Baylor	9	6418	120	9.15	9.18	7.3757	5.5256
5	5	5	5	5	6	9	Stanford	8	6449	123	8.57	8.68	7.5409	6.6342
8	7	7	7	7	7	7	Michigan State	9	6392	121	9.63	9.54	7.9030	6.3288
7	9	9	9	10	8	6	Texas A&M	9	6343	119	10.54	10.55	8.2504	5.7881
9	8	8	8	8	9	8	Oklahoma	9	6351	120	10.39	10.14	8.3381	6.4032
11	10	10	10	9	10	12	Ohio State	11	6337	118	10.65	10.54	9.6186	8.5975
13	11	12	11	11	11	11	Southern California	11	6292	116	11.48	11.59	9.8311	8.1160
10	12	10	12	12	12	13	LSU	10	6276	118	11.78	11.80	10.2236	8.7863
12	13	13	13	13	13	10	Georgia	11	6239	115	12.46	12.28	10.4851	8.0079
14	14	14	14	14	14	16	UCLA	13	6115	114	14.76	14.65	12.4110	10.0379
16	17	16	17	17	15	14	Oklahoma State	18	5872	112	19.26	19.71	14.5277	9.9889
15	16	16	16	15	16	15	Missouri	16	5882	112	19.07	19.23	14.5370	9.9999
17	15	17	15	16	17	17	Louisville	20	5883	111	19.06	19.39	16.9333	13.5815
19	18	19	18	18	18	18	Mississippi	20	5802	109	20.56	20.41	19.3837	18.2242
20	19	19	19	19	19	20	Notre Dame	21	5749	109	21.54	21.34	20.2056	18.9543
18	22	21	22	22	20	22	South Carolina	20	5589	107	24.50	24.39	21.6760	19.6480
22	21	23	21	21	21	21	Arizona State	23	5634	105	23.67	23.43	21.8403	19.5765
23	20	23	20	20	22	23	Texas	23	5682	105	22.78	23.05	21.9034	20.9461
21	23	21	23	23	23	19	Wisconsin	22	5550	107	25.22	25.00	21.9340	18.5364
26	24	26	24	24	24	24	Nebraska	26	5465	102	26.80	26.52	25.1618	23.6870
24	25	24	25	26	25	26	Clemson	25	5430	104	27.44	27.32	25.6479	24.1446

The "Derived Ranks" are the ordinals resulting from sorting the list according to the "Values" for each definition of "composite." A few notes:

The derived rank for my Consensus majority is based upon a tiebreaker that depends upon the input but is "arbitrary" in the sense that it is only one of many possible choices. I only use it to determine a reporting sequence. Since the Value is itself an ordinal rank it is the value I use in all of my team-oriented reports.
PWw (pairwise wins) has a derived rank that is just the number of teams ranked minus the number of pairwise wins (the number of pairwise wins is the value). There is no tiebreaker, although if one is needed the pairwise head-to-head results are available.
Avg and Mcon (the Massey Consensus) would be identical if I included the human polls. That the derived ranks differ at all indicates something about the human polls.
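
To make the PWw note concrete, here is a minimal sketch of a pairwise-wins count and the derived rank. The rule for what counts as a pairwise win is an assumption made for this illustration (team A beats team B when more systems rank A ahead of B than the reverse), and the function names and three-team data are hypothetical.

```python
def pairwise_wins(rankings):
    """Count pairwise wins for each team.

    rankings: list of dicts mapping team -> rank (lower is better),
    all covering the same teams. Assumed rule: A beats B when more
    systems rank A ahead of B than rank B ahead of A.
    """
    teams = list(rankings[0])
    wins = {team: 0 for team in teams}
    for a in teams:
        for b in teams:
            if a == b:
                continue
            a_ahead = sum(r[a] < r[b] for r in rankings)
            b_ahead = sum(r[b] < r[a] for r in rankings)
            if a_ahead > b_ahead:
                wins[a] += 1
    return wins

def derived_rank(wins):
    """Derived rank = number of teams ranked minus number of pairwise wins."""
    n = len(wins)
    return {team: n - w for team, w in wins.items()}

# Hypothetical ranks from three systems for three teams.
systems = [
    {"X": 1, "Y": 2, "Z": 3},
    {"X": 1, "Y": 3, "Z": 2},
    {"X": 2, "Y": 1, "Z": 3},
]
wins = pairwise_wins(systems)
ranks = derived_rank(wins)
```

With 128 teams ranked, a team with 127 pairwise wins gets derived rank 1 under this formula, which agrees with the first row of the table.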

All of the possibilities are interesting, but I have an aesthetic preference for a composite ordinal rank that is derived from ordinal ranks without resorting to an average of any kind. I'll calculate any varieties we come across, but the ones I use will be based upon the simple (?) methods based only upon counting.

The "distance" between two rankings is the number of swaps a "bubble sort" would require to transform one ranking into the other. For the UTCFCC and its constituents, the distances between each constituent and the composite, and between each pair of constituents, are:

	UCC	MAS	WOL	SAG	BIL	COL
UCC	∗	565	599	701	741	850
MAS	565	∗	618	668	998	1243
WOL	599	618	∗	952	886	1206
SAG	701	668	952	∗	984	1314
BIL	741	998	886	984	∗	1201
COL	850	1243	1206	1314	1201	∗
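
The distance used in the table can be sketched as a literal bubble sort that counts its swaps. This is an illustration under stated assumptions, not the code that produced the table: `swap_distance` is a hypothetical name, both rankings must list exactly the same teams, and ties within a ranking are not handled.

```python
def swap_distance(ranking_a, ranking_b):
    """Number of adjacent swaps a bubble sort needs to turn ranking_a
    into ranking_b. Both are lists of the same teams, best first."""
    # Relabel each team in ranking_a by its position in ranking_b;
    # sorting that sequence is the same as transforming a into b.
    position = {team: i for i, team in enumerate(ranking_b)}
    seq = [position[team] for team in ranking_a]

    swaps = 0
    n = len(seq)
    for i in range(n):
        for j in range(n - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                swaps += 1
    return swaps

# Hypothetical four-team rankings.
same = swap_distance(["A", "B", "C", "D"], ["A", "B", "C", "D"])
one_swap = swap_distance(["B", "A", "C", "D"], ["A", "B", "C", "D"])
reversed_order = swap_distance(["D", "C", "B", "A"], ["A", "B", "C", "D"])
```

The count is the same in either direction, which is why the table is symmetric. The largest possible distance for n teams is n(n-1)/2, reached when one ranking is the exact reverse of the other.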