The Ultimate Resume Analysis Tool

© Copyright 2007, Paul Kislanko

10 September, 2007
Last year I used a tool that processed the directed games graph to summarize all "A beat B beat C ... beat ..." chains and quantify the "value" of teams' wins in several different ways. I used the basketball and baseball seasons to refine and modify it, and this year's version is sufficiently different that I won't even reference last year's description.

This is not a "rating system" but by the end of the season it is a reasonable model for systems that don't depend upon margin of victory. One of its reports was the only computer-based system I know of that ranked Florida over Ohio State prior to the BCS Championship Game.

(Editor's note: Don't read too much into that - there's no reason to believe that was anything other than a coincidence. Quite a few other computer rankings had Florida's SOS much higher than Ohio State's, and this tool is probably better characterized as an SOS measurement than a rating system. In general, the fact that a team has a higher SOS does not mean it's a better team. See Oklahoma vs Boise State for a counterexample.)

The inspiration was the College Football Victory Chain Linker. Usually a team with at least one win over a team with at least one win can be connected to every other team that has at least one loss through an "A beat B beat C beat..." chain by the end of the season. (There's no guarantee this will be true, but in practice it usually is.) I use the symbol → to indicate a win:
A→B means team A won over team B.

To analyze the directed games graph for each team A, we look at all the A→x chains for A's wins, and for each x all of the x→y chains to get all the A→x→y chains (A's opponents' wins over A's opponents' opponents). Then for each y we take all of the y→z chains to get all the A→...→z chains, and so on. The number of different chains from A to a specific z can be quite large, since A can have several xs that each win over a specific y, and several ys can each win over a specific z.
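The enumeration described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the `wins` dictionary layout, the function name `chains_from`, and the teams A, x1, x2, y and z are all made up for the example, and a chain is assumed never to visit the same team twice.

```python
def chains_from(wins, start, max_len):
    """Yield every win chain of 1..max_len wins that begins at `start`.

    `wins` maps a team to the set of teams it beat (assumed layout).
    Each chain is a tuple of team names; no team appears twice in a chain.
    """
    stack = [(start,)]
    while stack:
        chain = stack.pop()
        if len(chain) > 1:
            yield chain
        # Extend the chain only while it has fewer than max_len wins.
        if len(chain) - 1 < max_len:
            for beaten in sorted(wins.get(chain[-1], ())):
                if beaten not in chain:
                    stack.append(chain + (beaten,))


# Hypothetical results: A beat x1 and x2, both of which beat y, and y beat z.
wins = {"A": {"x1", "x2"}, "x1": {"y"}, "x2": {"y"}, "y": {"z"}}

chains_to_z = [c for c in chains_from(wins, "A", 3) if c[-1] == "z"]
for chain in chains_to_z:
    print(" beat ".join(chain))
```

Even in this tiny example A reaches z by two different chains, one through each x. With a full season of results the count grows quickly, which is why the reports summarize the chains rather than list them all.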

For every pair of teams (A,B) in D-1A (the Football Bowl Subdivision) we can find the path length, PL(A,B), of the shortest A→...→B chain. For instance, from 2006:
Wake Forest→Florida St→UCLA→Southern California→Arkansas
Arkansas→South Carolina→Clemson→Wake Forest
So PL(Wake Forest,Arkansas) = 4 and PL(Arkansas,Wake Forest) = 3, and the latter is stronger because it is shorter. When the path lengths are equal, the team with more paths to the other has the stronger win chain.
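As a rough sketch of how the path length and the number of shortest paths might be computed, a breadth-first search over the games graph gives both at once. The function name `shortest_chains` and the `wins` layout are assumptions for illustration. The data is limited to the seven games named in the two chains above, so the path counts it reports are not the counts for the full 2006 season.

```python
from collections import deque


def shortest_chains(wins, start):
    """Return {team: (path_length, number_of_shortest_paths)} from `start`.

    `wins` maps a team to the set of teams it beat (assumed layout).
    """
    dist = {start: 0}
    count = {start: 1}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for beaten in wins.get(team, ()):
            if beaten not in dist:
                # First time reached: this is the shortest distance.
                dist[beaten] = dist[team] + 1
                count[beaten] = count[team]
                queue.append(beaten)
            elif dist[beaten] == dist[team] + 1:
                # Another equally short route to the same team.
                count[beaten] += count[team]
    return {t: (dist[t], count[t]) for t in dist if t != start}


# Only the seven 2006 games that appear in the two chains quoted above.
wins = {
    "Wake Forest": {"Florida St"},
    "Florida St": {"UCLA"},
    "UCLA": {"Southern California"},
    "Southern California": {"Arkansas"},
    "Arkansas": {"South Carolina"},
    "South Carolina": {"Clemson"},
    "Clemson": {"Wake Forest"},
}

print(shortest_chains(wins, "Wake Forest")["Arkansas"])
print(shortest_chains(wins, "Arkansas")["Wake Forest"])
```

The first element of each pair is PL and the second is the number of paths of that length.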

We can calculate a second order winning percentage for team A using these concepts as follows:

  1. Find PL(A,x) for all x for which there is a win chain (PL is the length of the shortest such chain)
  2. Find #P(A,x): the number of paths of length PL(A,x)
  3. Classify each x as a win, tie, loss or unknown for A by comparing the two chains as above. SOWP is just
    (Wins + Ties/2) / (Wins + Ties + Losses + Unknown)
The denominator in the SOWP is just one less than the number of teams in the field.
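The steps above might be sketched as follows. The text does not spell out exactly how each opponent is scored, so this sketch makes the following assumptions: a "win" is a stronger chain (shorter, or equally short with more paths), a "tie" is equal length and equal path count, and "unknown" means neither chain exists. The names `compare`, `sowp` and the `chains` layout are hypothetical, and the teams and numbers are invented.

```python
def compare(chains, a, x):
    """Return 1 if a's chain to x is stronger, -1 if weaker, 0 if tied,
    or None if neither chain exists (the "unknown" case).

    `chains[(a, x)]` holds (PL(a,x), #P(a,x)) when a chain exists.
    """
    fwd = chains.get((a, x))
    back = chains.get((x, a))
    if fwd is None and back is None:
        return None
    if back is None:
        return 1
    if fwd is None:
        return -1
    # Shorter path is stronger; at equal length, more paths is stronger.
    key_fwd = (-fwd[0], fwd[1])
    key_back = (-back[0], back[1])
    return (key_fwd > key_back) - (key_fwd < key_back)


def sowp(chains, teams, a):
    """Second-order winning percentage of team `a` over the field `teams`."""
    wins = ties = 0
    for x in teams:
        if x == a:
            continue
        result = compare(chains, a, x)
        if result == 1:
            wins += 1
        elif result == 0:
            ties += 1
    # Wins + ties + losses + unknowns is one less than the field size.
    return (wins + ties / 2) / (len(teams) - 1)


# Invented four-team field: A is stronger than B, tied with C, unknown vs D.
teams = ["A", "B", "C", "D"]
chains = {
    ("A", "B"): (1, 1), ("B", "A"): (2, 1),
    ("A", "C"): (2, 2), ("C", "A"): (2, 2),
}
print(sowp(chains, teams, "A"))
```

Here A has one win, one tie and one unknown, so its SOWP is (1 + 1/2) / 3.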

One problem with this second-order winning percentage is introduced by "upsets": in pure logic p→q and q→r means p→r, but that's not so in sports.

Taking another example from 2006, we get
PL(Auburn,Florida) = 1; #P(Auburn,Florida) = 1
PL(Florida,Auburn) = 2; #P(Florida,Auburn) = 2
It seems that Auburn's win over Florida shouldn't count as much as, say, their win over LSU. To account for these, we can define a "weighted win chain" metric. For each team A:
WtdW(A) = Σ { #P(A,x)/PL(A,x)² − #P(x,A)/PL(x,A)² }

summed over all x for which PL(A,x) or PL(x,A) exists. If one exists but the other doesn't, the term corresponding to the nonexistent path is treated as zero.

So, in the example above, the contribution by Florida to Auburn's weighted wins is 1/1² − 2/2² = +1/2, and the Auburn contribution to Florida's is 2/2² − 1/1² = −1/2.
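The arithmetic in this example can be checked with a short sketch. It reads each term as the number of shortest paths divided by the square of the path length, which is the reading that reproduces the +1/2 and −1/2 above. The function names and the `chains` layout are assumptions made for illustration.

```python
from fractions import Fraction


def term(chain):
    """#P / PL**2 for an existing chain, or zero when there is no chain."""
    if chain is None:
        return Fraction(0)
    path_length, num_paths = chain
    return Fraction(num_paths, path_length * path_length)


def contribution(chains, a, x):
    """What opponent x adds to (or subtracts from) WtdW(a)."""
    return term(chains.get((a, x))) - term(chains.get((x, a)))


def wtd_w(chains, teams, a):
    """Sum the contributions of every other team in the field."""
    return sum((contribution(chains, a, x) for x in teams if x != a),
               Fraction(0))


# (PL, #P) values quoted in the text for the 2006 Auburn/Florida pair.
chains = {("Auburn", "Florida"): (1, 1), ("Florida", "Auburn"): (2, 2)}

print(contribution(chains, "Auburn", "Florida"))
print(contribution(chains, "Florida", "Auburn"))
```

Exact fractions are used so the results match the hand calculation without rounding.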

Neither my SOWP nor my WtdW report constitutes a "rating system" but they can be useful for analyzing the behavior of "advanced" systems because those depend upon the directed games graph. Here I've used A→B to mean only that A won over B, but those systems can be modeled by adjusting the length and/or thickness of the arrow based upon, say, game location and margin of victory.

Unfolding the directed games graph

One way to describe the directed games graph is to use a matrix. Let W[i,j] = 1 if team i won over team j, and 0 if the teams haven't played or team i lost. Then W^λ gives the number of unique win chains from each team to every other team that have a path length of λ. This is interesting only to analysts though, because it requires looking at each power of W for each team pair to get anything interesting from the matrix. (See note.)
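To make the matrix view concrete, here is a small sketch in plain Python. For two different teams, the smallest power of W with a nonzero (i,j) entry is PL(i,j), and that entry is #P(i,j). The helper names are hypothetical, and teams X and Y are placeholders chosen so that the numbers match the Auburn/Florida figures quoted earlier. They are not taken from real results.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pl_and_count(w, i, j):
    """Return (PL, #P) from team i to a different team j, or None.

    Scans W, W^2, ... and stops at the first power whose (i, j) entry is
    nonzero. A shortest chain between two teams has at most n - 1 wins.
    """
    power = w
    for length in range(1, len(w)):
        if power[i][j] > 0:
            return length, power[i][j]
        power = mat_mul(power, w)
    return None


# Index 0 = Auburn, 1 = Florida, 2 = X, 3 = Y (X and Y are placeholders).
# Auburn beat Florida; Florida beat X and Y; X and Y each beat Auburn.
W = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

print(pl_and_count(W, 0, 1))
print(pl_and_count(W, 1, 0))
```

This makes the drawback mentioned above visible: every team pair may need a different power of W before its entry becomes nonzero.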

A more useful (and fun) way to look at the graph is to "center" it on each team. The SOWP and WtdW reports have links to a report for each team that displays all win chains that begin with the team. If team A has a win path to team Z (however long) there is a link to the page that centers the graph on team Z. Thus, you can jump around and view the graph from any team's perspective.

One interesting aspect of the powers of the matrix W has to do with its diagonal. (W^n)[i,i] counts the number of times team i "beats itself." A→B→C→A is not an uncommon relationship, and of course these are the things that give rating systems a challenge. (Not to mention us poor humans.)
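A three-team cycle shows the diagonal effect in the smallest possible case. The sketch below uses plain Python lists, and the helper names `mat_mul`, `mat_pow` and `diagonal` are made up for this illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(w, n):
    """Return w raised to the n-th power, for n >= 1."""
    result = w
    for _ in range(n - 1):
        result = mat_mul(result, w)
    return result


def diagonal(m):
    """Return the diagonal entries of a square matrix."""
    return [m[i][i] for i in range(len(m))]


# Index 0 = A, 1 = B, 2 = C, with A beat B, B beat C, and C beat A.
W = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

for n in (1, 2, 3):
    print(n, diagonal(mat_pow(W, n)))
```

The diagonal stays at zero for the first and second powers and becomes all ones at the third, since each team has exactly one three-game chain that leads back to itself.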