Ratings as Voters Analysis - Site changes

August 14, 2014

In the "About" description of the College Football Ranking Composite page Dr. Massey writes

It is a challenge to present so much information clearly on one web page. As the number of rankings has grown, a convenient display of them has become more difficult.
Considering that the motivation of this site is to analyze the rankings in a number of different ways, I have taken the "discretion is the better part of valor" approach and broken the analyses into multiple pages.

In recent years my rankings analysis has consisted of one (very) large page with multiple reports, with one edition per week of the season. This was more by coincidence than design, since to produce it I had just taken a program used to count ranked-ballot elections in different ways and applied it as if the rating systems were voters.

Beginning with the 2014 season, I'm replacing the row of weekly reports on the home page with a single link ("Analysis") beneath the link to the page on MasseyRatings that provides the data being analyzed. The Analysis page references each of the reports individually.

Separating the reports makes it easier to demonstrate the points of the analyses they support, and providing a single index to them affords the ability to summarize their derivation. Reports not provided before make explicit some of the analysis that was "hidden" in the construction of the previous single report.

Structure and Formats

Each report has a definition, the heading of which is always a link to the most current version of the report. There will also be links to the point-in-time versions corresponding to weeks of the season. As appropriate I may also include reports that show variations in computed values over time.

Top 25

As I say in the analysis index, I only include this report to demonstrate how much information is left out of the usual media presentation of their poll results. To see what I mean, note how many teams in the report appear on fewer top-25 ballots (sometimes far fewer) yet finish ahead of teams that appear on more ballots, simply because of the way the votes are counted. Because the report includes the distribution of all top-25 votes, you can see why the math works out that way. Since the ballots are truncated, this method gives far too much weight to "outlier" ratings at the top of the rankings.
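
To make the truncation effect concrete, here is a toy Python sketch. It assumes the familiar 25-points-for-first-place down to 1-point-for-25th scheme, which is not necessarily the exact formula the polls use, and it invents a handful of one-team ballots:

```python
# Toy illustration of truncated top-25 counting (one team per ballot for
# brevity).  The 25-down-to-1 point scheme is an assumption; the polls'
# exact point values may differ, but the truncation effect is the same.

def top25_appearances(ballots, team, depth=25):
    """Number of ballots that rank `team` in their top `depth`."""
    return sum(1 for b in ballots if team in b and b[team] <= depth)

def poll_points(ballots, team, depth=25):
    """Points earned from ballots that rank `team` in their top `depth`."""
    return sum(depth + 1 - b[team] for b in ballots
               if team in b and b[team] <= depth)

# Team A: three outlier ballots near the top of the rankings.
# Team B: five steady ballots near the bottom of the top 25.
ballots = [{"A": 2}, {"A": 3}, {"A": 4},
           {"B": 18}, {"B": 19}, {"B": 20}, {"B": 21}, {"B": 22}]

for team in ("A", "B"):
    print(team, top25_appearances(ballots, team), "ballots,",
          poll_points(ballots, team), "points")
# A: 3 ballots, 69 points; B: 5 ballots, 30 points
```

With these made-up ballots the team on fewer ballots ends up far ahead, purely because its few votes sit at the top of truncated ballots.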

Borda

The Borda Count addresses part of the problem with the top-25 approach, because ballots are not truncated. All ranks for all teams are considered, and it is unlikely that a team with only a few top-25 votes will rank higher than a team with more top-25 votes at worse ranks once votes at ranks 26-128 are included.

But the outlier problem is exacerbated, because when all ranks are considered, outliers at the other end of the ballot become relevant. Ratings that have the most "outliers" (too high or too low) for a team wind up having more influence on the team's rank than ratings that all agree on a middle value.

The effect results in a subtle characteristic of this ranking: when the Borda rank is assigned by sorting the list in descending Borda count order, a team's Borda rank might be a rank that no rating assigned it. In the ranking distribution report for each team I highlight the number of ratings that have the team ranked the same as the team's Borda rank, and if there are none include a marker in that column. You will find a lot of them.
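
For concreteness, here is a minimal Python sketch of the Borda calculation and the agreement check described above, using made-up ratings (the data layout and the N-minus-rank point scheme are assumptions for illustration, not the site's actual format):

```python
# Minimal Borda-count sketch, assuming every rating ranks every team 1..N
# and a team at rank r earns N - r points from that rating.

def borda_table(ratings, teams):
    """ratings: list of dicts team -> rank (1 = best).
    Returns [(team, borda_count)] sorted best-first."""
    n = len(teams)
    counts = {t: sum(n - r[t] for r in ratings) for t in teams}
    return sorted(counts.items(), key=lambda kv: -kv[1])

def borda_ranks(ratings, teams):
    """Assign the Borda rank by position in the sorted list, and count
    how many individual ratings gave the team that exact rank."""
    out = []
    for pos, (team, count) in enumerate(borda_table(ratings, teams), start=1):
        agree = sum(1 for r in ratings if r[team] == pos)
        out.append((team, pos, count, agree))   # agree == 0 is common
    return out

ratings = [{"A": 1, "B": 2, "C": 3},
           {"A": 3, "B": 1, "C": 2},
           {"A": 1, "B": 3, "C": 2}]
for row in borda_ranks(ratings, ["A", "B", "C"]):
    print(row)
# ('A', 1, 4, 2)  ('B', 2, 3, 1)  ('C', 3, 2, 1)
```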

Majority Consensus

This is my variation of a Bucklin method for aggregating multiple ranked lists into a single ranking. To each team it assigns the best rank for which a strict majority of ratings assign the team that rank or a better one. When there is an odd number of ratings this is the same as the arithmetic median of all ranks. When there is an even number of ratings, it is the best rank worse than the median (because a "strict majority" is 50% + 1).
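
The rule itself is small enough to sketch in Python; the lists of ranks below are hypothetical:

```python
# Bucklin-style Majority Consensus rank: the best rank at which a strict
# majority (50% + 1) of ratings have the team at that rank or better.

def majority_consensus_rank(ranks):
    """ranks: this team's ranks, one per rating (1 = best)."""
    need = len(ranks) // 2 + 1          # strict majority: 50% + 1
    for r in range(1, max(ranks) + 1):
        if sum(1 for x in ranks if x <= r) >= need:
            return r

# Five ratings (odd): the consensus equals the median rank.
print(majority_consensus_rank([3, 4, 7, 7, 12]))      # -> 7
# Six ratings (even): 4 of 6 are needed, so the consensus is the best
# rank worse than the arithmetic median (7.5 here).
print(majority_consensus_rank([3, 4, 7, 8, 9, 12]))   # -> 8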

In this report the distribution of ranks colors the count of ratings with the team at a given rank blue if that rank contributes to the majority, and emphasizes the number of ratings that rank the team exactly as the consensus does.

Teams with the same Majority Consensus rank are listed in descending order of the size of the majority. When those are equal, ties are broken using what amounts to an inverse-Borda system applied only to the ratings included in the majorities. This ignores all ranks worse than the consensus rank and gives lower weights to ranks farther from the consensus.

The (Bucklin) Majority Consensus rank is the one I usually use as the team rank on reports that include a team rank.

For both Borda and Majority Consensus the link from the team name is to a list of the ranks assigned to the team by any rating and, for each of those, a list of the ratings that assigned the team that rank.

Pairwise Matrix

This ranking orders the teams by the number of pairwise wins against other teams. Row i column j contains the number of ratings that have team i ranked better than team j.

When team i has more pairwise wins than team j but team j has a pairwise win (is ranked better on more ballots) over team i, the values for those team-pairs are emphasized. These indicate a violation of transitivity.

The Smith Set is the subset of the teams wherein each member of the subset has pairwise wins over every team outside of the subset. You can find it by noting the first team that has a pairwise loss to a lower-ranked team (its pairwise win count is bold); all teams listed above it are in the largest Smith Set.
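
Here is a Python sketch of the matrix, the pairwise-win ordering, and the cut just described, with invented ratings (this illustrates the idea, not the program that builds the report):

```python
# cell[a][b] = number of ratings that rank team a better than team b.

def pairwise_matrix(ratings, teams):
    return {a: {b: sum(1 for r in ratings if r[a] < r[b])
                for b in teams if b != a}
            for a in teams}

def pairwise_order(ratings, teams):
    """Order teams by pairwise wins (a beats b when more ratings rank
    a better than b)."""
    m = pairwise_matrix(ratings, teams)
    wins = {a: sum(1 for b in teams if b != a and m[a][b] > m[b][a])
            for a in teams}
    return sorted(teams, key=lambda t: -wins[t]), m

def top_dominating_group(order, m):
    """Teams listed above the first team that has a pairwise loss to a
    lower-ranked team (the report's Smith Set cut)."""
    for i, a in enumerate(order):
        if any(m[b][a] > m[a][b] for b in order[i + 1:]):
            return order[:i]
    return order

# A beats everyone; B, C, D form a pairwise cycle.
ratings = [{"A": 1, "B": 2, "C": 3, "D": 4},
           {"A": 1, "C": 2, "D": 3, "B": 4},
           {"A": 1, "D": 2, "B": 3, "C": 4}]
order, m = pairwise_order(ratings, list("ABCD"))
print(order)                           # ['A', 'B', 'C', 'D']
print(top_dominating_group(order, m))  # ['A'] - the cycle breaks transitivity
```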

The team name links to a page that shows each pairwise comparison and for each other team lists the ratings that have the team ranked better and worse than the opposite team.

Correlations

These reports compare the ratings based upon their rankings' correlations to the teams' Majority Consensus ranking and to each other. The method used is described on the Analysis page.

There are two reports. The first lists each team's Majority Consensus rank and, for each rating (including the consensus rank reported on MasseyRatings), how "far" from the consensus that rating is. The distance is not just the rank difference (though it may be the same value as the rank difference); it is the number of times the team would participate in a bubble sort swap.

The teams are ordered top to bottom by Majority Consensus rank, and the ratings ordered left to right by increasing distance from the Majority Consensus.

The second report is a table that shows the distance between each pair of ratings. Row i column j is the distance between Ratings i and j. The rating most like rating i is shown in blue and the least like in red.
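
Both reports rest on the same swap-count distance. Here is a Python sketch with an assumed team-to-rank layout and made-up rankings:

```python
# "Bubble sort swap" distances between rankings.  Each ranking is a dict
# team -> rank (1 = best); the data layout is assumed for illustration.

def discordant_pairs(rank_a, rank_b, teams):
    """Pairs of teams the two rankings order differently, i.e. the swaps
    a bubble sort would need to turn one ordering into the other."""
    return [(s, t) for i, s in enumerate(teams) for t in teams[i + 1:]
            if (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t]) < 0]

def team_swap_counts(rating, consensus, teams):
    """First report: for each team, the number of swaps it participates
    in when sorting the rating's order into the consensus order."""
    swaps = discordant_pairs(rating, consensus, teams)
    return {t: sum(1 for p in swaps if t in p) for t in teams}

def rating_distance(rating_a, rating_b, teams):
    """Second report: total distance between two ratings (one table cell)."""
    return len(discordant_pairs(rating_a, rating_b, teams))

consensus = {"A": 1, "B": 2, "C": 3, "D": 4}
rating    = {"A": 2, "B": 1, "C": 4, "D": 3}
teams = list("ABCD")
print(team_swap_counts(rating, consensus, teams))  # {'A': 1, 'B': 1, 'C': 1, 'D': 1}
print(rating_distance(rating, consensus, teams))   # 2
```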

Conference Ranks

Dr. Massey provides a report that shows conference rankings by individual ratings (using the average team rating for each conference) and the composite based upon those. My first report is similar, showing the average team rank over all ratings and the distribution of team ranks in histogram format. This counts every team-rank by every rating, so if there are #R ratings and #T teams in the conference, #Ranks = #R×#T.
My conference average rank will not exactly match Dr. Massey's because I do not include human polls or certain computer ratings that do not rank all teams.

My other reports use the alternative approach of assigning the composite (Majority Consensus in this case) team rank and then using those ranks to summarize by conference. The conferences with their team ranks are listed in ascending Conference Majority Consensus order. Like the team Majority Consensus, this is assigned by taking the best rank for which 50% + 1 of the teams in the conference are ranked that high or better.
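
The conference rank is simply the same Bucklin rule applied to the conference's team consensus ranks; a short sketch with hypothetical ranks:

```python
# Conference Majority Consensus: the best rank such that 50% + 1 of the
# conference's teams carry that consensus rank or better (data is made up).

def conference_consensus_rank(team_ranks):
    """team_ranks: Majority Consensus ranks of the conference's teams."""
    need = len(team_ranks) // 2 + 1
    for r in range(1, max(team_ranks) + 1):
        if sum(1 for x in team_ranks if x <= r) >= need:
            return r

# A 4-team conference whose teams carry consensus ranks 5, 11, 30, 62:
print(conference_consensus_rank([5, 11, 30, 62]))   # -> 30 (3 of 4 needed)
```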

The Pairwise Comparison of Teamranks by Conference report uses what has become my favorite way to compare conferences, regardless of the criteria used in the comparison. For each pair of conferences C1 and C2 with #C1 and #C2 teams respectively, use all #C1×#C2 team comparisons and find the percentage of comparisons in which the team from C1 is ranked better than the team from C2.

The presentation is a table where the entry in row Row-Conf and column Col-Conf gives the expected number of "wins" by Row-Conf in 1000 trials of randomly selecting a team from Row-Conf and a team from Col-Conf. The conferences are ordered by All%, which is the expected number of wins by Row-Conf in 1000 trials where a team is randomly selected from Row-Conf and compared to one randomly selected from { teams not in Row-Conf }.
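
Here is a sketch of how a single table cell and the All% column could be computed from composite team ranks (the conference memberships and ranks below are made up):

```python
# Expected wins per 1000 random cross-conference team comparisons,
# computed from composite (Majority Consensus) team ranks.

def expected_wins_per_1000(row_conf_ranks, col_conf_ranks):
    """Fraction of all #C1 x #C2 team pairs in which the row conference's
    team has the better (lower) rank, scaled to 1000 trials."""
    pairs = [(a, b) for a in row_conf_ranks for b in col_conf_ranks]
    wins = sum(1 for a, b in pairs if a < b)
    return round(1000 * wins / len(pairs))

def all_pct(conf_ranks, other_conf_rank_lists):
    """All% column: expected wins per 1000 against the pool of every team
    not in the row conference."""
    pool = [r for ranks in other_conf_rank_lists for r in ranks]
    return expected_wins_per_1000(conf_ranks, pool)

conf1 = [3, 8, 15, 40]          # composite ranks of Conference 1's teams
conf2 = [10, 22, 35, 55, 70]    # composite ranks of Conference 2's teams
print(expected_wins_per_1000(conf1, conf2))   # -> 800
print(all_pct(conf1, [conf2]))                # -> 800 (only one other conference here)
```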

Weighted Violations

The link to the Weighted Retrodictive Rankings Violations report has also been moved to the Analysis page. The format remains the same as last year's.

© Copyright 2014, Paul Kislanko