Ratings Rantings

© Copyright 2005, Paul Kislanko

In "Fairness and Simplicity" I argued that since the human polls were nearly identical we needed only one, and further suggested that the Harris Poll and the six computers all be treated as ballots on which only 25 teams may be ranked. The summarization technique I recommended was simply to take the fourth-best ranking for each team (which would be the median value if all ballots ranked each team).

Just as I compared the Harris and Coaches' polls to each other, we can compare each of the seven components to the fourth-best rank to see which ones contribute the most to it.
                           COL   AND    SE    MB   WOL   Har   BIL    Team
Texas                        0     0     0     0     0     1     0    0.38
Southern California          4     0     0     1     1     1     0    1.00
Virginia Tech                1     0     0     1     1     0     0    0.65
Alabama                      1     0     0     0     0     0    16    1.56
Penn State                   4     1     0     0     0    25    49    3.36
UCLA                         0     1     0     0     0     0     1    0.53
Wisconsin                    0     1     0     0     0    36     4    2.42
Ohio State                   0     0     4     4     0     9    16    2.17
Miami-Florida               25     1     4     0     9    16     0    2.80
LSU                          0     0    36    16    25    16    16    3.95
Georgia                      1     1    49    36     9     0    49    4.55
Texas Tech                  25   100     1     1     0    25     0    4.66
Oregon                      16     1     4     0    16     4 11881   41.27
West Virginia                0    16     0     0     4    36    16    3.21
Florida St                   0     0     0     9     0    25     1    2.24
TCU                          0     0    64   100     0    16     0    5.07
Florida                      4     4     0     4     1     1    36    2.67
Michigan                     0     1     9    16     4    25     0    2.80
Colorado                     1     1     9    16     0    36     0    3.00
Boston College               0     1     0     0     0     0     4    0.85
Oklahoma                     0     1     1     4     0  9801  9801   52.93
Notre Dame                   0     0     9     0     0   169     4    5.10
Minnesota                    0     1     4     4     0  9409  9409   51.86
Northwestern                 0     1     0     1     0  9216  9216   51.32
Georgia Tech                 0     4     0     1  9025  9025     0   50.79

Auburn                       0     0     0     0     0 10609 10000   54.26
California                   0     0     0     0     0  9801  9801   52.92
Louisville                   0     0     0     0     0  9025  9409   51.32
Fresno St                    0     0     0     0  9025  9409     0   51.32
Boise St                     0     0     0     0     0     0 10201   38.17

Variance from Median      1.65  2.13  2.54  2.67 24.58 50.57 51.62
Consistent w/ majority      18    19    19    17    17    11    15

The last two rows show each component's importance to the result. "Variance from median" (really the fourth-highest ranking) is a rough measure of how the component represents the whole, with lower numbers being "better". Note: this does not make the rating better than any other rating - it just means the rating more nearly represents the final result. "Consistent w/ majority" is the number of the 25 rated teams for which the component had the team ranked that high or higher.
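
As a rough sketch of how the numbers in the table can be reproduced, the Python below takes the fourth-best rank of the seven ballots as the reference and measures how far each ballot falls from it. The helper names are hypothetical. Treating an unranked team as rank 120 and taking the square root of the mean squared difference are both assumptions inferred from the table values (for example, Oklahoma's 9801 is 99 squared, and its fourth-best rank is 21); neither is stated in the text.

```python
from math import sqrt

UNRANKED = 120  # assumption: rank assigned to a team left off a ballot

def fourth_best(ranks):
    """Fourth-best (lowest-numbered) rank among seven ballots."""
    filled = sorted(UNRANKED if r is None else r for r in ranks)
    return filled[3]

def squared_diffs(ranks):
    """Squared distance of each ballot from the fourth-best rank."""
    ref = fourth_best(ranks)
    return [((UNRANKED if r is None else r) - ref) ** 2 for r in ranks]

def rms(squares):
    """Square root of the mean of already-squared differences."""
    return sqrt(sum(squares) / len(squares))

# Ballots in the order COL, AND, SE, MB, WOL, Har, BIL (None = not ranked),
# taken from the ranking table later in this article.
texas = [1, 1, 1, 1, 1, 2, 1]
oregon = [7, 12, 9, 11, 7, 13, None]

print(fourth_best(oregon))                    # 11
print(squared_diffs(oregon))                  # [16, 1, 4, 0, 16, 4, 11881]
print(round(rms(squared_diffs(texas)), 2))    # 0.38
print(round(rms(squared_diffs(oregon)), 2))   # 41.27
```

The Oregon row shows why the variance number is exaggerated: a single ballot that leaves the team unranked contributes 11881 of the 11922 total.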

The variance number is somewhat exaggerated, since by far the largest contributors to it are failing to rank a team in the top 25 that at least four components did rank in the top 25, and ranking a team in the top 25 that no more than two other components did. Nonetheless, the table suggests that there's more to look at regarding the computers.

Using the same technique as with the Harris and Coaches polls, I compared each of the BCS components (except the coaches poll, because it is so close to the Harris poll) to each other.

        COL    AND     SE     MB    WOL    HAR    BIL
COL    1.65   1.66   1.96   2.17  18.68  36.74  36.95
AND    1.78   2.13   2.76   2.77  18.80  36.71  36.94
SE     2.64   2.74   2.54   0.96  18.74  36.75  37.05
MB     2.79   2.80   1.11   2.67  18.77  36.79  37.05
WOL   18.67  19.10  18.69  18.90  24.58  32.61  40.28
HAR   36.05  36.11  36.18  36.66  32.01  50.57  26.86
BIL   37.75  37.14  37.65  37.81  40.97  28.00  51.62

The diagonal is each component's correlation to the median from above, and the other positions in a row are that component's correlation to the best ranking when paired with each of the other inputs. In the pairwise comparison, a lower number for component A than for component B just means that A's ranking was the better one more often. The four computers that all picked the same top 25 are close to each other but vary significantly more than the human components (0.49, 0.57).

As might be expected from its attributes (carryover from previous year, giving more weight to recent games, etc.) Billingsley's program correlates more closely to the Harris poll than to the other computer rankings. The strongest pairwise correlation is between Massey-BCS and Sagarin-ELO, which may be a consequence of their having been modifications of MOV-based systems to meet the BCS' requirement to not use MOV.

I added the original Massey and Sagarin ratings to the original ballots to see what effect margin-of-victory-based systems might have. It turns out that their "ballots" have some of the same characteristics as the Harris poll. Again, that's not too surprising, because we know that humans are affected by actual versus expected margin of victory.

So it seems to me that a better approach than 2/3 human polls and 1/3 computer average would be 2/3 non-MOV systems plus 1/3 MOV systems. The MOV third would be the Harris Poll plus the original Massey and Sagarin ratings, and the non-MOV two-thirds would be the six computer systems currently used. The modified counting rules would be:

  1. Drop the bottom four ranks (including not-ranked)
  2. Drop the top four ranks
Or, equivalently, take the fifth-highest rank of the nine, even if that is "not in top 25."
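
The equivalence of the two descriptions can be checked with a short Python sketch. The function names are hypothetical, and representing a missing ballot entry as `None` that sorts after every real rank is an assumption made for illustration.

```python
NOT_RANKED = float("inf")  # assumption: a missing entry sorts below rank 25

def majority_rank(ballot):
    """Fifth-best rank of nine ballots; None means 'not in top 25'."""
    ranks = sorted(NOT_RANKED if r is None else r for r in ballot)
    middle = ranks[4:-4]  # drop the top four and the bottom four
    # With nine ballots exactly one value survives: the fifth-best.
    assert len(middle) == 1 and middle[0] == ranks[4]
    return None if middle[0] == NOT_RANKED else middle[0]

def count_at_or_above(ballot, rank):
    """How many ballots rank the team at `rank` or better."""
    return sum(1 for r in ballot if r is not None and r <= rank)

# Order: AND, MB, WOL, SE, COL, BIL, HAR, MAS, SAG-P (values from the table below)
notre_dame = [22, 22, 22, 19, 22, 24, 9, 8, 3]
oklahoma = [20, 19, 21, 22, 21, None, None, None, None]
auburn = [None, None, None, None, None, 20, 17, 24, 14]

for name, ballot in [("Notre Dame", notre_dame),
                     ("Oklahoma", oklahoma),
                     ("Auburn", auburn)]:
    print(name, majority_rank(ballot))
```

Notre Dame and Oklahoma both come out at 22, while Auburn, ranked on only four of the nine ballots, comes out as "not in top 25". Notre Dame's three high MOV-based ranks (9, 8, 3) are among the four dropped at the top, so they do not move its result.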

When this is applied with the extra ballots, the same top 10 and top 25 result. The only positions shifted are the teams that were in the 11-20 range according to the non-MOV systems. This could be important for determining BCS-bowl eligibility, but the consensus at the top is hardly affected.

Majority-Rule with Massey and Sagarin-Predictor
4 Nov 2005 09:44:18 (US Central)

w/o with          # ≤  Tie
MOV  MOV Majority Maj Brkr Borda Team                 AND  MB WOL  SE COL BIL HAR MAS SAG-P
  1    1        1   7    -  1060 Texas                  1   1   1   1   1   1   2   1     2
  2    2        2   6    -  1051 Southern California    2   3   3   2   4   2   1   2     1
  3    3        3   8    -  1046 Virginia Tech          3   2   2   3   2   3   3   3     4
  4    4        4   5    -   909 Alabama                4   4   4   4   5   8   4  10     -
  5    5        5   5    -  1015 Penn State             6   5   5   5   3  12  10   4     6
  6    6        6   7    -  1000 UCLA                   5   6   6   6   6   5   6  13    18
  7    7        8   6    -   985 Wisconsin              7   8   8   8   8   6  14  11    16
  8    8        9   7    -   995 Ohio State             9   7   9   7   9  13  12   5     5
  9    9        9   5 1.37   984 Miami-Florida          8   9  12  11  14   9   5   7    12
 10   10        9   5 1.09   878 Oregon                12  11   7   9   7   -  13   6     9
 13   11       11   5    -   959 Texas Tech            21  10  11  10  16  11  16   9     8
 15   12       13   5 1.98   946 Florida St            13  16  13  13  13  14   8  15    20
 12   13       13   5 1.42   958 LSU                   11  15  16  17  11   7   7  16    13
 11   14       14   5    -   851 Georgia               10  17  14  18  10   4  11  17     -
 14   15       16   6    -   834 West Virginia         16  12  10  12  12  16  18  22     -
 18   16       17   6    -   930 Michigan              18  13  19  14  17  17  22  14     7
 17   17       17   5    -   925 Florida               14  18  17  16  18  10  15  19    19
 19   18       18   5    -   905 Colorado              17  14  18  15  19  18  24  20    21
 16   19       19   5    -   800 TCU                   15  25  15  23  15  15  19  25     -
 20   20       20   6    -   887 Boston College        19  20  20  20  20  22  20  21    22
 22   21       22   8    -   920 Notre Dame            22  22  22  19  22  24   9   8     3
 21   22       22   5    -   492 Oklahoma              20  19  21  22  21   -   -   -     -
 23   23       23   5    -   679 Minnesota             24  21  23  21  23   -   -  18    24
 24   24       25   6    -   569 Northwestern          25  23  24  24  24   -   -   -    25
 25   25       25   5    -   473 Georgia Tech          23  24   -  25  25  25   -   -     -

  -    -        -   -    -   401 Auburn                 -   -   -   -   -  20  17  24    14
  -    -        -   -    -   393 Fresno St              -   -  25   -   -   -  23  12    23
  -    -        -   -    -   298 Louisville             -   -   -   -   -  23  25   -    11
  -    -        -   -    -   205 Michigan St            -   -   -   -   -   -   -  23    10
  -    -        -   -    -   196 California             -   -   -   -   -  21  21   -     -
  -    -        -   -    -   104 Arizona St             -   -   -   -   -   -   -   -    15
  -    -        -   -    -   102 Iowa                   -   -   -   -   -   -   -   -    17
  -    -        -   -    -   100 Boise St               -   -   -   -   -  19   -   -     -

This would be a better mix than we've had before - there are the four computers that correlate fairly closely to each other based only on winning percentage and WP-based strength of schedule; the four that tend to be more diverse; and the human poll to give an appropriate odd number. The ones that explicitly take into account margin of victory (including the Harris Poll) comprise only a third of the inputs, so the emphasis on running up scores is much lower than in the 2/3 human plus 1/3 computer breakdown. Finally, there's virtually no way that any one particular voter or computer system can inadvertently or intentionally skew the results "unfairly" as was alleged in 2004.


Sources for the input values used in the simulation:
Sagarin-Predictor
Jeff Sagarin's Football Ratings
All others
Kenneth Massey's College Football Ranking Comparison
All systems are reported as they were based upon games through October 29, 2005.
