Ratings' Rankings Correlations

© Copyright 2012, Paul Kislanko
20 November 2012

I've always wanted a way to display how (dis)similar different ratings are. We ought to be able to tell which of David Wilson's Categories a rating belongs to by how closely its ranking correlates to a rating specifically designated as being in the category.

Given only ordinal rankings for all the ratings published by Dr. Massey, one way to find the correlation between ratings A and B is to count the number of swaps a bubble sort would require to make the two rankings identical. The "simple" way to count them is, for each pair of ratings (A,B), to count the number of team pairs that are in the same relative order in both; the remaining team pairs are the ones that would need a swap.
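The bubble-sort idea can be sketched directly. This is only an illustration, assuming each ranking is a dict from team name to ordinal rank (1 = best), that both rankings cover the same teams, and that there are no ties; the function name and the four sample teams are made up for the example.

```python
def bubble_sort_swaps(rank_a, rank_b):
    """Count the adjacent swaps a bubble sort needs to turn A's order into B's."""
    # List the teams in A's order, then replace each team by its rank in B.
    seq = [rank_b[team] for team in sorted(rank_a, key=rank_a.get)]
    swaps = 0
    for end in range(len(seq) - 1, 0, -1):
        for i in range(end):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                swaps += 1
    return swaps

# Hypothetical rankings of four teams.
rank_a = {"W": 1, "X": 2, "Y": 3, "Z": 4}
rank_b = {"W": 2, "X": 1, "Y": 4, "Z": 3}
print(bubble_sort_swaps(rank_a, rank_b))  # 2
```

Two swaps are needed here (W with X, and Y with Z), which is also the number of team pairs whose relative order differs between the two lists.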

For team-pair (Ta,Tb) and ratings R1 and R2, if
RankR1(Ta) - RankR1(Tb)
has the same sign as
RankR2(Ta) - RankR2(Tb)
then the pair (Ta,Tb) is said to be concordant with respect to R1 and R2.

A pair whose two differences have opposite signs is discordant. For rating pairs (Ri,Rj) the "distance" between the ratings is the number of discordant team pairs, and the smaller that is, the "more like" each other they are.
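A minimal sketch of that distance, under the same assumptions I'd make by hand: each rating's ranking is a dict from team to ordinal rank, both dicts hold the same teams, and no two teams share a rank. The names discordant_pairs, r1 and r2 are just for illustration.

```python
from itertools import combinations

def discordant_pairs(r1, r2):
    """Number of team pairs ordered one way by r1 and the other way by r2."""
    count = 0
    for ta, tb in combinations(r1, 2):
        d1 = r1[ta] - r1[tb]
        d2 = r2[ta] - r2[tb]
        # Opposite signs mean the pair (ta, tb) is discordant.
        if d1 * d2 < 0:
            count += 1
    return count

# Hypothetical rankings of four teams.
r1 = {"W": 1, "X": 2, "Y": 3, "Z": 4}
r2 = {"W": 2, "X": 1, "Y": 4, "Z": 3}
print(discordant_pairs(r1, r2))  # 2
```

The two rank differences computed per team pair are the "two comparisons" in the operation count.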

With 124 ranked teams, there are 7,626 team pairs, and it takes two comparisons for each of them for each pair of ratings. As of this writing, there are 111 ratings, which makes 6,105 rating pairs, so just calculating these correlations requires

7,626 × 2 × 6,105 = 93,113,460 comparisons.
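The counts are just "n choose 2" twice over, which is easy to check (math.comb needs Python 3.8 or later; the variable names are mine):

```python
from math import comb

teams = 124
ratings = 111

team_pairs = comb(teams, 2)      # 124 * 123 / 2
rating_pairs = comb(ratings, 2)  # 111 * 110 / 2
comparisons = team_pairs * 2 * rating_pairs

print(team_pairs, rating_pairs, comparisons)  # 7626 6105 93113460
```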

The "closest" pair of ratings is Maas (MAA) and Roundtable BCS (RTB), with only 51 discordant pairs out of the 7,626 team pairs. The largest discrepancy by team is 3, which occurs six times: Cincinnati, Tulsa, Michigan State, Arkansas State, Toledo and Purdue. Wilson's categorizations are:

Maas Adv/Win/Ret
Roundtable BCS Frm/Scr/Ret
Although Roundtable is categorized as Retrodictive, its Description includes the line:
"Because the rating is based on scores and margins, it can be used to attempt to predict the margins of coming games."
It makes more sense to me that a Predictive formula-based system would be close to an Advanced Retrodictive rating than that a retrodictive formula would turn out to be close to an advanced one.

The "farthest apart" are Bassett and Stupey, with 1,909 discordant pairs. It takes 71 swaps to put Utah in the same position in both lists, and the average difference in team ranks is nearly 31 spots. The Wilson classification of Bassett is Adv/Hom/Pre/ and there doesn't appear to be one (yet?) for Stupey, but we might guess the opposites.

In the team-oriented data given above, the numbers are not necessarily the rank differences between the two ratings. Each number is the count of teams ranked higher than the given team in rating 1 but lower than it in rating 2, plus the count of teams ranked lower than the given team in rating 1 but higher than it in rating 2. This is often just the rank difference, but it does not have to be.
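A small sketch of the per-team count shows how it can differ from the rank difference. The function name and the three-team example are hypothetical, and the rankings are again assumed to be tie-free dicts over the same teams.

```python
def team_discordance(r1, r2):
    """For each team, count the other teams on opposite sides of it in r1 and r2."""
    counts = {}
    for team in r1:
        n = 0
        for other in r1:
            if other == team:
                continue
            above_in_1 = r1[other] < r1[team]
            above_in_2 = r2[other] < r2[team]
            # Above the team in one rating but below it in the other.
            if above_in_1 != above_in_2:
                n += 1
        counts[team] = n
    return counts

# Hypothetical example: rating 2 reverses rating 1.
r1 = {"A": 1, "B": 2, "C": 3}
r2 = {"A": 3, "B": 2, "C": 1}
print(team_discordance(r1, r2))  # {'A': 2, 'B': 2, 'C': 2}
```

Team B is ranked 2nd in both lists, so its rank difference is 0, yet it takes part in two discordant pairs. Summing the per-team counts gives twice the number of discordant pairs, since each pair is counted once for each of its two teams.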

To display the list, for each rating, order all other ratings by ascending distance and assign them ranks 1, 2, 3, ..., 110. Using the standard Borda vote-counting technique (110 points for being closest to the rating doing the "ranking" down to zero points for being the farthest from it) we can order the ratings by average ranking by other ratings.
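Here is one way the Borda step might look, as a sketch rather than the code behind the reports. It assumes the distances sit in a nested dict dist[a][b], breaks distance ties by the order the ratings are listed in, and gives rank 1 as many points as there are other ratings, down to 1 point for the farthest. Whether the low end is 1 or 0 does not change the final order, because every rating receives the same number of votes. The three ratings P, Q and R and their distances are invented.

```python
def borda_order(dist):
    """Order ratings by Borda points earned on the other ratings' ballots."""
    names = list(dist)
    m = len(names) - 1  # each rating ranks the m other ratings
    points = {name: 0 for name in names}
    for voter in names:
        # The voter's ballot: all other ratings, closest first.
        ballot = sorted((n for n in names if n != voter),
                        key=lambda n: dist[voter][n])
        for rank, name in enumerate(ballot, start=1):
            points[name] += m - rank + 1  # assumed scale: m points down to 1
    order = sorted(names, key=lambda n: points[n], reverse=True)
    return order, points

# Hypothetical symmetric distances among three ratings.
dist = {
    "P": {"Q": 1, "R": 5},
    "Q": {"P": 1, "R": 3},
    "R": {"P": 5, "Q": 3},
}
order, points = borda_order(dist)
print(order)   # ['Q', 'P', 'R']
print(points)  # {'P': 3, 'Q': 4, 'R': 2}
```

Dividing each total by the number of voters would give the average, which leaves the order unchanged.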

The "vote counts by rank" gives an idea of how rating distances are distributed with respect to each other. Note that this order is in no way a measure of relative merit. The "best" rating (whatever that means) could well be the one least like a majority of others. The chart just shows, for each rating, how many other ratings ranked it closest, 2nd-closest, 3rd-closest, etc.

What we're really interested in for each rating is how it ranked all the other ratings. The 25 Most-Similar Rankings for each Rating lists the ratings' "top 25 ballots" - for each rating the 25 other ratings that are closest to it, along with the distance to each of those. The link to Distance Between Rating Rankings presents the distance between any of the ratings pairs.
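The "top 25 ballot" for each rating is just its row of distances, sorted and cut off. A sketch, assuming the same kind of nested distance dict dist[a][b] and breaking ties alphabetically (my choice, not necessarily what the reports do); the ratings P, Q and R are invented.

```python
def most_similar(dist, k=25):
    """For each rating, the k closest other ratings as (name, distance) pairs."""
    ballots = {}
    for rating, row in dist.items():
        others = [(other, d) for other, d in row.items() if other != rating]
        # Closest first; ties broken by name (an assumption).
        others.sort(key=lambda pair: (pair[1], pair[0]))
        ballots[rating] = others[:k]
    return ballots

# Hypothetical symmetric distances among three ratings.
dist = {
    "P": {"Q": 1, "R": 5},
    "Q": {"P": 1, "R": 3},
    "R": {"P": 5, "Q": 3},
}
print(most_similar(dist, k=1))
# {'P': [('Q', 1)], 'Q': [('P', 1)], 'R': [('Q', 3)]}
```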

As intermediate results we can also count the number of discordant pairs each team contributes to across all ratings. Unsurprisingly, teams ranked near the top or bottom of most rankings are most consistently ranked. The teams with the highest variation in their rankings are Kent State (averages a 21-spot difference between rating-pairs), Ohio (20) and Ball State (20).

 No electrons were harmed in the production of these reports.