Fairness and Simplicity

© Copyright 2005, Paul Kislanko

In my last article I suggested a method for counting votes in the human poll that would provide the desired "transparency" of the polling process while still allowing for anonymous ballots. In this one I'll turn my attention to how the polls (and computer rankings) might be used to avoid the problems with putting too much emphasis on any one component.

The problem with human polls is that they are subject to human nature. A team that has a bye week is likely to fall in such a ranking, a team that's ranked highly early tends to remain that way even if other teams are playing better, and so on. But because human nature is somewhat universal, all of the human polls are subject to these phenomena. So, the first recommendation is

just use one poll.

Coaches and Harris Poll Comparison
Week 9
Best  Team                 USA  HAR  d(CP)  d(HP)  d(Team)
   1  Southern California    1    1      0      0        0
   2  Texas                  2    2      0      0        0
   3  Virginia Tech          3    3      0      0        0
   4  Alabama                4    4      0      0        0
   5  Miami-Florida          5    5      0      0        0
   6  LSU                    6    7      0      1        1
   6  UCLA                   7    6      1      0        1
   8  Florida St             8    8      0      0        0
   9  Notre Dame             9    9      0      0        0
  10  Georgia               10   11      0      1        1
  10  Penn State            11   10      1      0        1
  12  Ohio State            12   12      0      0        0
  13  Oregon                13   13      0      0        0
  14  Wisconsin             14   14      0      0        0
  15  Florida               15   15      0      0        0
  16  West Virginia         16   18      0      4        2
  16  Texas Tech            17   16      1      0        1
  17  Auburn                18   17      1      0        1
  19  Boston College        19   20      0      1        1
  19  TCU                   20   19      1      0        1
  21  California            21   21      0      0        0
  22  Fresno St             22   23      0      1        1
  22  Michigan              23   22      1      0        1
  24  Colorado              24   24      0      0        0
  25  Louisville            25   25      0      0        0
      Variance from Best              0.49   0.57
      Matches Best                      19     20

After the first half of the season (really, well before that) there's just not enough difference between any of the human polls to tell them apart. Given that the AP pulled out because their voting is too transparent and the coaches have an obvious conflict of interest or two, we may as well use the Harris poll. It is sponsored by the BCS anyway, and if humans should have more weight (with which I disagree), that can be factored into the formula.

As you can see from this comparison of the coaches and Harris polls after week eight, the two polls:
  • selected exactly the same "top 25" teams;
  • agreed exactly on 14 ranks;
  • differed by one position on 10 ranks; and
  • differed by two positions for only one team (West Virginia).

The only difference in the variance from the better rank of the two polls is exactly that the coaches have West Virginia 16th and Harris has them 18th.
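
To see where the two summary rows come from, here is a quick check. It assumes "Variance from Best" is the root-mean-square of each poll's deviation from the better rank, and that the d() columns hold squared deviations; that reading is inferred from the table (it is the only one consistent with West Virginia's d(HP) of 4), not from any published definition.

```python
from math import sqrt

# Squared deviations from the better ("Best") rank, read off the table above
d_coaches = [1, 1, 1, 1, 1, 1]   # UCLA, Penn State, Texas Tech, Auburn, TCU, Michigan
d_harris  = [1, 1, 4, 1, 1]      # LSU, Georgia, West Virginia (two spots => 4), BC, Fresno St

print(round(sqrt(sum(d_coaches) / 25), 2))  # 0.49
print(round(sqrt(sum(d_harris) / 25), 2))   # 0.57
```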

The computers are not nearly as unanimous, which in general is a good thing. They all have different means of handling strength of schedule, some give different weights to later games, some (like human voters) carry over results from prior years, some factor in game location, some use opponents' and opponents' opponents' records, and so forth. To the extent that these are all important to judging the quality of a team, some synthesis of different perspectives is desirable.

I have always thought that the way the human polls and computer rankings were handled should be the same, and in 2004 they changed the formula so that it superficially was. However, the normalization was to the "# of voters" level, and still resulted in different weights for each "voter". Also, the "don't include best and worst computer ranking" has no analogue in the human polls (though Harris' "trimming" in their second poll was similar in spirit).

So this brings us to my second recommendation:

use only ordinal ranks to combine all the components.
Just as voters can only enter their first through 25th choices, take only the top 25 teams from each computer, and only the top 25 from the results of the Harris Poll. We have been inferring that the number of points associated with teams in the "others receiving votes" category provided a 26th, 27th, etc. team, but that's not the case. The computers rank all 119 teams 1 through 119, so 26th in a computer ranking is not the same as 26th in the Harris poll (which might be 3 voters ranking the team 18th, 22nd, and 25th).
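
To make the truncation concrete, here is a minimal sketch. The function name and the input format (a best-first list of all 119 teams) are my own illustration, not any computer ranking's actual output:

```python
def top25_ordinals(full_order):
    """Keep only the ordinal top 25 from a best-first ordering of all 119 teams."""
    return {team: rank for rank, team in enumerate(full_order[:25], start=1)}

# Teams from 26th on simply become "not ranked" for the combination step.
```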

Taking only the ordinal ranks makes sense because we do not know in general how the different computers come up with theirs. The fact that we know the humans use a flawed election method (Borda) to do so does not make that method a useful one for combining the computer rankings into one. If we list only the ordinal rankings from each source component for week 9 we get:
BCS Computer Rankings + Harris Results
(Top 25 only)
Team                 AND  BIL  COL  HAR   MB   SE  WOL
Texas                  1    1    1    2    1    1    1
Southern California    2    2    4    1    3    2    3
Virginia Tech          3    3    2    3    2    3    2
Alabama                4    8    5    4    4    4    4
UCLA                   5    5    6    6    6    6    6
Penn State             6   12    3   10    5    5    5
Wisconsin              7    6    8   14    8    8    8
Miami-Florida          8    9   14    5    9   11   12
Ohio State             9   13    9   12    7    7    9
Georgia               10    4   10   11   17   18   14
LSU                   11    7   11    7   15   17   16
Oregon                12   NR    7   13   11    9    7
Florida St            13   14   13    8   16   13   13
Florida               14   10   18   15   18   16   17
TCU                   15   15   15   19   25   23   15
West Virginia         16   16   12   18   12   12   10
Colorado              17   18   19   24   14   15   18
Michigan              18   17   17   22   13   14   19
Boston College        19   22   20   20   20   20   20
Oklahoma              20   NR   21   NR   19   22   21
Texas Tech            21   11   16   16   10   10   11
Notre Dame            22   24   22    9   22   19   22
Georgia Tech          23   25   25   NR   24   25   NR
Minnesota             24   NR   23   NR   21   21   23
Northwestern          25   NR   24   NR   23   24   24
Boise St              NR   19   NR   NR   NR   NR   NR
Auburn                NR   20   NR   17   NR   NR   NR
California            NR   21   NR   21   NR   NR   NR
Louisville            NR   23   NR   25   NR   NR   NR
Fresno St             NR   NR   NR   23   NR   NR   25
Components are listed alphabetically left to right,
and teams are ordered only by leftmost rank.
NR = not ranked in that component's top 25.

The Formula?

This is fairly simple, though it is based upon the "majority rule" principle I discussed last time as the Bucklin method. Basically it says if more than half of the components (four in this case, since we have seven inputs) agree a team should be ranked N or higher, we give that team rank N. Nothing could be simpler, except there will be ties.
1 Drop the three lowest ranks (including "not ranked").
This eliminates all the teams that are not considered "top 25" by a majority of the inputs.
2 Drop the three highest ranks.
The remaining rank (which may often be equal to some or all of those dropped in this step) is the highest rank that a majority of the input rankings agree the team deserves.
If all of the inputs agree on which teams are the 25 best, this is equivalent to just selecting the median ranking for each team. If not all inputs rank the team but a majority do, it amounts to selecting the fourth-best ranking for each team.
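
Here is a sketch of that selection step, using a representation of my own choosing (seven ranks per team, None for "not ranked"):

```python
NOT_RANKED = 26  # any value worse than 25 works for sorting unranked entries last

def majority_rank(ranks):
    """Return (majority rank, number of inputs at or better than it) for one team.

    Sorting best-to-worst and taking the fourth entry is the same as "drop the
    three lowest ranks, then drop the three highest": with seven inputs the
    survivor is the best rank that a majority (4 of 7) agrees the team deserves.
    """
    ordered = sorted(NOT_RANKED if r is None else r for r in ranks)
    maj = ordered[3]
    if maj >= NOT_RANKED:   # fewer than four inputs ranked the team, so step 1 drops it
        return None, 0
    return maj, sum(1 for r in ordered if r <= maj)

# Ranks read across the table above (AND, BIL, COL, HAR, MB, SE, WOL)
print(majority_rank([9, 13, 9, 12, 7, 7, 9]))      # Ohio State -> (9, 5)
print(majority_rank([12, None, 7, 13, 11, 9, 7]))  # Oregon     -> (11, 4)
```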

There usually will be ties, though, and there's a two-stage tiebreaker. For teams with the same "majority ranking":

1 Order the tied teams by the number of input rankings that contributed to the tie.
If 5 inputs agree that team X should be ranked 5th or better, X is ranked ahead of team Y when only 4 inputs agreed that Y should be ranked 5th or better.
2 If teams are still tied after stage 1, order the tied teams by "how close the teams are to winning the rank."
There are many ways to do this, but essentially it involves taking the ballots that did not contribute to the selection of rank N for each of the tied teams and using only those ballots to hold a new "election" among the tied teams for rank N+1.
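
One way to code the two stages follows. Stage 1 is as described; since there are many ways to measure "closeness," the reciprocal-distance score below is just one choice made for illustration. It happens to reproduce the Tie Brkr values in the table that follows (1.51, 0.64, 0.62, 0.50 for the teams tied at #11) when "not ranked" is counted as 119th.

```python
NOT_RANKED = 119  # the computers rank all 119 teams, so treat unranked as last

def tiebreak_key(ranks, maj):
    """Sort key for teams tied at majority rank `maj` (bigger is better)."""
    ordered = sorted(NOT_RANKED if r is None else r for r in ranks)
    support = sum(1 for r in ordered if r <= maj)         # stage 1
    leftovers = ordered[support:]                         # ballots that did not contribute
    closeness = sum(1.0 / (r - maj) for r in leftovers)   # stage 2 (one possible measure)
    return (support, closeness)

# The four teams tied at majority rank 11, with ranks from the table above
tied = {
    "Oregon":     [12, None, 7, 13, 11, 9, 7],
    "Georgia":    [10, 4, 10, 11, 17, 18, 14],
    "LSU":        [11, 7, 11, 7, 15, 17, 16],
    "Texas Tech": [21, 11, 16, 16, 10, 10, 11],
}
order = sorted(tied, key=lambda t: tiebreak_key(tied[t], 11), reverse=True)
print(order)  # ['Oregon', 'Georgia', 'LSU', 'Texas Tech']
```
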
An example using the inputs above illustrates the process.
Suggested Ranking
Rank  Team                 Majority  #≤Maj  Tie Brkr  Borda  Notes
   1  Texas                       1      6              825
   2  Southern California         2      4              816
   3  Virginia Tech               3      7              815
   4  Alabama                     4      5              800
   5  Penn State                  5      4              787  Using Borda would've had UCLA ahead of Penn State even though a majority of the inputs had the Nittany Lions ranked higher than the Bruins.
   6  UCLA                        6      7              793
   7  Wisconsin                   8      6              774
   8  Ohio State                  9      5              767  5 inputs have the Buckeyes ranked 9th or better but only 4 have the Hurricanes that high. This illustrates a stage 1 tiebreaker.
   9  Miami-Florida               9      4              765
  10  Oregon                     11      4      1.51    655  Here we have an example of a stage 2 tiebreaker. Oregon has votes for #12 and #13, Georgia for #14, #17, and #18, LSU for #15, #16, and #17, and Texas Tech has two #16s and a #21. Clearly Oregon is closer to a 5th #11 vote (by far) than any of the others. Borda would've ranked Oregon behind these teams, just because one input left them out.
  11  Georgia                    11      4      0.64    749
  12  LSU                        11      4      0.62    749
  13  Texas Tech                 11      4      0.50    738
  14  West Virginia              12      4              737
  15  Florida St                 13      5              743
  16  TCU                        15      4              706
  17  Florida                    16      4              725
  18  Michigan                   17      4              713
  19  Colorado                   18      5              708
  20  Boston College             20      6              692
  21  Oklahoma                   21      4              492
  22  Notre Dame                 22      6              693
  23  Minnesota                  23      4              483
  24  Northwestern               24      4              475
  25  Georgia Tech               25      5              473

  26  Auburn                                            201  Teams that aren't ranked in the top 25 by at least four of the seven inputs are listed in Borda order. But by using a "true" Borda count we can tell that the first four of these were listed on two of the inputs and Boise State on only one of the seven. So we know that Auburn needs to move into the top 25 in two more computers to get a ranking, and Boise needs to improve in three.
  27  California                                        196
  28  Louisville                                        190
  28  Fresno St                                         190
  30  Boise St                                          100
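
For anyone who wants to verify the Borda column: the numbers are consistent with each input awarding 119 - rank points (118 for a #1 vote, nothing for teams it leaves unranked). That scale is inferred from the table itself rather than from any official definition, but it matches the entries above.

```python
def borda(ranks, field_size=119):
    """'True' Borda count: each input that ranks the team awards field_size - rank points."""
    return sum(field_size - r for r in ranks if r is not None)

print(borda([1, 1, 1, 2, 1, 1, 1]))                   # Texas  -> 825
print(borda([12, None, 7, 13, 11, 9, 7]))             # Oregon -> 655
print(borda([None, 20, None, 17, None, None, None]))  # Auburn -> 201
```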

This approach is very simple, and likely to result in better orderings than the current one, which gives too much weight to the polls. Based upon the reaction a few years ago when the computers determined the best matchup, one would expect complaints that the human polls only get 1/7 of the input. That is not quite correct, since the influence any one component has depends upon where it fits with respect to all of the other components.

In any case, for years we've avoided the real issue - if computers "aren't to be trusted" then we need some unambiguous way to define what "trusted" means in terms of picking the best two teams. That will be the subject of my next essay.