Resumes and Meta-Rankings

© Copyright 2007, Paul Kislanko

10 October 2007

Recent discussions at Sunday Morning Quarterback and on the Fans Collective Survey message boards called for an article about the similarities between classes of computer rating systems and poll voting philosophies. Reviewing those discussions suggests a useful generalization.

Voting Philosophy	Computer Rating Class
Power: The voter ranks teams based upon her perception of which teams would be "likely" to beat teams she ranks lower.	Predictive: Teams are ordered by rating such that if team A is ranked higher than team B, then A has a better than 50 percent chance of beating B.
Resume: The voter ranks teams based only upon the results of games played, making no assumptions about the relative rankings of teams that haven't played.	Retrodictive: Teams are ordered based upon the winners of games played, usually by a combination of winning percentage and some measure of schedule strength.

Now, if you think that a voter who espouses the "Power" philosophy only needs information from the "Predictive" ratings, or that the "Resume" voter doesn't use those ratings, that's only because I placed the definitions side-by-side.

In fact, the "predictive" ratings certainly use all the data the "retrodictive" ones use. They just emphasize the data that the author feels contribute more to the probability of winning a game - mostly a team's ability to score points compared to its opponents' ability to prevent points from being scored. Likewise, the "pure resume" voter needs some basis for determining the relative "worth" of a win, since teams with the same record have to be compared - usually by strength of schedule, but how to define that?

The Resume() function

The general idea behind resume voting is to compare teams based upon their best wins and worst losses. How to define quality of wins and losses objectively, though, is left open. A "pure resume" approach would define win quality based upon the quality of defeated opponents' wins, and right away you're into recursion that can be hard for a human voter to comprehend, much less calculate. Computers to the rescue!

One of my favorite "tricks" involves a function that maps any rating into a retrodictive form. Use the rankings from an arbitrary rating system to assign the values of wins and losses:

Win worth: (#ranked teams+1) − opponent's rank
Loss worth: − opponent's rank

For example, if 120 teams are ranked, a win over #1 is worth (120+1)-1 = 120 points. A loss to the #1 team is worth -1 points, and a loss to the last-ranked team is worth -120 points. Just add up the points for wins and losses and divide by the number of games played to get a "resume score" for each team.
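The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the win-worth and loss-worth rules as stated here, not the code behind the reports; the function name `resume_score`, the `(opponent, won)` input shape, and the four-team ranking are all made up for the example.

```python
def resume_score(results, ranks):
    """Average win/loss worth for one team.

    results: list of (opponent, won) pairs for the team's games (assumed shape).
    ranks:   dict mapping every ranked team to its rank, 1 = best.
    """
    n_ranked = len(ranks)
    total = 0
    for opponent, won in results:
        opp_rank = ranks[opponent]
        if won:
            total += (n_ranked + 1) - opp_rank  # win worth
        else:
            total -= opp_rank                   # loss worth
    # Divide by games played; a team with no games gets 0.0 by convention here.
    return total / len(results) if results else 0.0


# Hypothetical four-team ranking and schedule, for illustration only.
ranks = {"A": 1, "B": 2, "C": 3, "D": 4}
games_for_c = [("A", True), ("B", False)]
print(resume_score(games_for_c, ranks))  # (4 - 2) / 2 = 1.0
```

Because only the opponents' ranks enter the sum, any rating system that produces an ordering of the teams can be passed in as `ranks`.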

This "resume function" is an example of another class of computer ratings:

Meta ratings
Meta ratings use the output of one or more computer ratings to define a new rating.

As implied by the definition, there are two flavors of meta-ratings. Some, like the Resume function, take the output of a single rating and create a new one using it and some external factor (games won and lost in the case of the Resume function). Others, like what I call the Bucklin Majority, combine multiple ratings into a new summary rating.

For example, we can take the rankings associated with Jeff Sagarin's "Predictor" system to get resSAGP. The column definitions are:

ix
The index/rank for the "resume score."
Srt
The "resume score" for the team based upon the input rank.
Team
Team name.
SAG(P)
The name of the ranking used to form the resume report. This is the rank the team was assigned by the rating that is the input to Resume().
Conf, W, and L
Informational columns listing the team's conference affiliation and its wins and losses to date.
Avg_W Ornk
The average opponents' rank for the team's wins.
Avg_L Ornk
The average opponents' rank for the team's losses.
SOS
The average opponents' rating by this system for all the team's games.
ix (second occurrence)
The ordinal rank of the SOS value.
BW
The best rank of all the opponents defeated by the team
WL
The worst rank of all the opponents to whom the team lost
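As a rough guide to how the rank-based columns relate to a team's schedule, here is a minimal Python sketch. It covers only W, L, Avg_W Ornk, Avg_L Ornk, BW and WL; the function name `resume_columns` and the `(opponent, won)` input shape are assumptions for illustration, and the actual reports may be computed differently.

```python
def resume_columns(results, ranks):
    """Summarize one team's schedule against an input ranking.

    results: list of (opponent, won) pairs (assumed shape).
    ranks:   dict mapping team name to rank, 1 = best.
    """
    win_ranks = [ranks[opp] for opp, won in results if won]
    loss_ranks = [ranks[opp] for opp, won in results if not won]

    def average(values):
        # None marks a column that is undefined (no wins or no losses yet).
        return sum(values) / len(values) if values else None

    return {
        "W": len(win_ranks),
        "L": len(loss_ranks),
        "Avg_W Ornk": average(win_ranks),
        "Avg_L Ornk": average(loss_ranks),
        # The best opponent beaten has the numerically smallest rank.
        "BW": min(win_ranks) if win_ranks else None,
        # The worst opponent lost to has the numerically largest rank.
        "WL": max(loss_ranks) if loss_ranks else None,
    }


# Hypothetical ranking and results, for illustration only.
example_ranks = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
example_results = [("A", True), ("D", True), ("B", False), ("E", False)]
print(resume_columns(example_results, example_ranks))
```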

Usability

An interesting property of the Resume function is that if you take any two ratings R_α and R_β, the rankings for Resume(R_α) and Resume(R_β) are more nearly alike than the rankings for R_α and R_β themselves.

τ	Ranking 1	Ranking 2	τ-Distance	N
0.9070	resSAGP	resISOV	2260	†221
0.9058	resISR	resISOV	2290	221
0.8775	ISR	resISR	2978	221
0.8760	resSAGP	resISR	3014	†221
0.7551	SAGP	resSAGP	7142	242
0.7535	ISOV	resISOV	5992	221
0.7525	ISR	ISOV	6016	221
0.7424	SE	SAGP	7512	242

The τ-Distance is the number of pairs whose order is reversed in the two rankings, and τ is the fraction of pairs on which the two rankings agree (1 − τ-Distance divided by the total number of pairs). With 221 teams, there are 24,310 total pairs. The ISR is like Sagarin's Elo-Chess in that it only takes into account who won and where the games were played, and the ISOV is like Jeff's Predictor in that it also takes into account the strength of the victories. Notice that this difference changes the order of nearly a quarter of the team pairs (6016 of 24,310 for ISR versus ISOV), but the Resume versions of the two agree on over 90 percent.
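For readers who want to reproduce this kind of comparison, the pair counting can be sketched in Python as follows. The names `tau_distance` and `agreement` are made up for this example, and comparing only the teams present in both rankings is an assumption, not necessarily how the table was produced.

```python
from itertools import combinations


def tau_distance(rank_a, rank_b):
    """Count team pairs ordered one way in rank_a and the other way in rank_b.

    rank_a, rank_b: dicts mapping team name to rank, 1 = best.
    Returns (reversed_pairs, total_pairs) over the teams found in both.
    """
    teams = sorted(set(rank_a) & set(rank_b))  # assumption: common teams only
    reversed_pairs = 0
    for x, y in combinations(teams, 2):
        # Opposite signs mean the two rankings disagree on this pair.
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            reversed_pairs += 1
    total_pairs = len(teams) * (len(teams) - 1) // 2
    return reversed_pairs, total_pairs


def agreement(rank_a, rank_b):
    """Share of pairs that both rankings put in the same order."""
    reversed_pairs, total_pairs = tau_distance(rank_a, rank_b)
    return 1 - reversed_pairs / total_pairs if total_pairs else 1.0


# Hypothetical three-team rankings, for illustration only.
first = {"A": 1, "B": 2, "C": 3}
second = {"A": 2, "B": 1, "C": 3}
print(tau_distance(first, second))  # (1, 3): only the A-B pair is reversed
```

This brute-force loop looks at every pair, which is perfectly adequate for a couple of hundred teams.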

I will add Resume(ISOV) and Resume(ISR) reports as indexes to the Team Resume pages.