During the 2007 college baseball season I came across a "correlation test" for ordinal rankings that turns out to be particularly useful. Keeping the math-y bits simple:
It is called the Kendall τ-test. What makes it especially useful for our purposes is an associated metric, the τ-Distance, which makes it possible to find out which teams contribute the most to the correlation or non-correlation of two rankings that include all the same teams.
The τ-Distance is the total number of position swaps that are required to transform either ranked list into the order specified by the other. What makes that useful is that at the same time we count swaps we can count how many times each team was involved in a swap, so we can find out which teams contribute most to the non-correlation.
An example that demonstrates how the τ-Distance for a team gives more insight than the usual measures of correlation is provided by comparing the ranks from the final 2006 computer rankings to the 2006 pre-season rankings:
Team | Pre | Final | τD(Pre,Final) |
LSU | 5 | 5 | 4 |

Here the comparison is from the pre-season computer rankings for LSU to the final rankings. Since the Tigers were predicted to be 5th and wound up 5th, we might think the pre-season rankings were perfect for the team (the obvious correlation is the square of the differences in rankings, which is zero). But the τ-Distance contribution by LSU is 4, not zero.
What the τ does is measure the ranking of LSU against every other team. In the pre-season list, #1 Texas and #4 Virginia Tech were ranked higher than #5 LSU, but in the final list they were #18 and #19 compared to LSU's #5. Meanwhile #14 Florida and #12 Louisville were behind LSU in the pre-season but finished #1 and #3 in the final list. So the 4 pairs (LSU, Texas), (LSU, Virginia Tech), (Florida, LSU) and (Louisville, LSU) are reversed in the two rankings, resulting in that contribution to the τ-Distance.
Pairs whose relative ranks are reversed in the orderings are sometimes called discordant. The τ Distance between two rankings is just the number of discordant pairs.
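To make the counting concrete, here is a minimal Python sketch of the idea: it counts discordant pairs and, at the same time, how many discordant pairs each team is part of. The function name `tau_distance` and the five teams A through E are made up for illustration; they are not real rankings, and this is not necessarily how the numbers in this post were produced.

```python
from itertools import combinations

def tau_distance(rank_a, rank_b):
    """Return (number of discordant pairs, per-team discordant counts).

    rank_a and rank_b map team name -> rank (1 = best). Assumes both
    rankings cover exactly the same teams and contain no ties.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must include the same teams")
    per_team = {team: 0 for team in rank_a}
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is discordant when the two rankings order it oppositely.
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            discordant += 1
            per_team[x] += 1
            per_team[y] += 1
    return discordant, per_team

# Hypothetical five-team example (illustrative data only).
pre = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
final = {"A": 4, "B": 2, "C": 1, "D": 5, "E": 3}
distance, contributions = tau_distance(pre, final)
print(distance, contributions)
```

In this made-up example team B is #2 in both lists yet still takes part in two discordant pairs, which is the same effect as in the LSU case. Note that every discordant pair involves two teams, so the per-team counts always add up to twice the number of discordant pairs.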
The Kendall τ correlation for the full pre-season computer rankings and the final ones is 0.5354 with total distance 3262. So for all team pairs, the pre-season had the "correct" team ranked higher only about 53.5 percent of the time. To tell the truth, that's actually higher than I expected.
Notice that the τ values in these comparisons are a lot closer to one than the value found for the pre-season vs final test.
Now, while I am not a fan of some aspects of Billingsley's method, especially the carry-over from year to year, I have also argued that too much "sameness" in the computers would mean only one is necessary. Nonetheless, when we calculate the τ-Distance for each pair of BCS computers, we find that Billingsley's system has more "discordant pairs" with each of the other systems than any pair of systems that doesn't include his.
Reg Season | SE | COL | WOL | AND | BIL |
MB | 166 | 998 | 818 | 644 | 1330 |
SE | | 996 | 776 | 646 | 1388 |
COL | | | 512 | 614 | 1124 |
WOL | | | | 610 | 1256 |
AND | | | | | 1146 |

Final | SE | AND | COL | WOL | BIL |
MB | 366 | 624 | 858 | 712 | 1182 |
SE | | 682 | 872 | 682 | 1196 |
AND | | | 494 | 520 | 1110 |
COL | | | | 498 | 1084 |
WOL | | | | | 1226 |
One could imagine an objective criterion for including one of the 100+ available computer rankings framed in terms of having a lower τ-Distance when compared to the average BCS computer component than a rating that is already included.
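One possible reading of such a criterion, sketched in Python: a candidate rating qualifies if its mean τ-Distance to the included systems is lower than the mean distance of at least one included system to the other included systems. The rule itself, the function names (`discordant_pairs`, `mean_distance_to_others`, `qualifies`) and the four-team data are all assumptions made for illustration, not anything the BCS uses.

```python
from itertools import combinations

def discordant_pairs(rank_a, rank_b):
    """Count team pairs that the two rankings order oppositely (no ties)."""
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0
    )

def mean_distance_to_others(name, systems):
    """Mean distance from one included system to every other included one."""
    others = [ranks for n, ranks in systems.items() if n != name]
    return sum(discordant_pairs(systems[name], r) for r in others) / len(others)

def qualifies(candidate, included):
    """Hypothetical rule: the candidate must be closer, on average, to the
    included systems than the most isolated included system is."""
    cand_mean = sum(
        discordant_pairs(candidate, r) for r in included.values()
    ) / len(included)
    worst_included = max(mean_distance_to_others(n, included) for n in included)
    return cand_mean < worst_included

# Illustrative four-team rankings (made-up data, 1 = best).
included = {
    "S1": {"A": 1, "B": 2, "C": 4, "D": 3},
    "S2": {"A": 2, "B": 1, "C": 3, "D": 4},
    "S3": {"A": 1, "B": 3, "C": 2, "D": 4},
}
close = {"A": 1, "B": 2, "C": 3, "D": 4}
outlier = {"A": 4, "B": 3, "C": 2, "D": 1}
print(qualifies(close, included), qualifies(outlier, included))
```

Other readings are just as plausible, for example measuring the distance to a single consensus ranking built from the included systems instead of averaging the pairwise distances.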