Metrics on the Directed Games Graph

October 12, 2014

This essay could be subtitled why the worst computer ranking is probably better than the best human poll. Besides the obvious objectivity of computers and inevitable subjectivity of human rankings, the task is just too difficult for a human to do properly without computer help. There are 128 teams - 8128 pairs - to rank using only 336 games between them. To assign relative ranks to teams who haven't played each other you have to consider records versus common opponents, and if they don't have any of those, the human is generally left to fall back on subjective somethings. Undaunted, computer algorithms trudge on until they find a way to compare every team against every other team.

Computer ratings work because even though the 8128 comparsions can't be made directly from the 336 games played so far there are 562,494 "connections" that provide useful data.(There are more than that, this is just the count of connections A↔B↔C↔D↔E. Every team can be an A with every other an E when there are 5 or fewer "↔"s.) The diameter of the games graph is 5 since every team can be connected to every other team with 5 or fewer "↔"s.

The human voter might base their vote on the few games they've seen or read about, or even on all games played. It is difficult (without computer help) to go as far as records vs common opponents to form a subjective opinion. Advanced computer ratings use the Directed Games Graph (usually implicitly) to turn the 336 games into 24,334,465 comparisons for the 8,128 team pairs, or nearly 3000 tests per team-pair.

A↔B↔C↔D↔E describes the Games graph and the FBS field is connected and the games graph has a diameter of 5. The Directed Games Graph is what you get when you turn the connections A↔B (team A played team B) into the relationship A→B (team A won a game against team B).

In general you cannot find an A→B→... chain from every team to every other team, since winless teams cannot be the head of any such chain and undefeated teams cannot be the tail. The DGG can be said to connected when every team with at least one win is connected to every team with at least one loss, but that may never happen (at this writing we're 2428 pairs short.) If every pair that can be connected is connected using no more than ρ "→"s I call ρ the DGG's radius and after week 7 ρ is 20.

Advanced computer ratings can be thought of as functions that measure each team's position in the DGG using different rulers. They have a lot more to work with than humans can process: There are 24,254,465 paths that connect teams with wins and teams with losses that are 20 or fewer "→"s long, and many if not most use more information than who won to define the value of each of the 336 "→"s that combine in that many ways.

As far as I know no rating actually finds every possible A⇒B path and then applies its particular ruler, but their clever algorithms use every bit of the DGG it (no pun intended) and more to come up with their team ratings. To make the "skeleton" upon which these ratings are built visible, I do find all of the shortest A⇒B paths and count all A⇒B paths up to the DGG radius.

Touring the Directed Games Graph

My measurements are all very simple, being composed primarily of counting stats. The Winpath Summary just lists for each team how many teams it has beaten (are one → removed), how many other teams those have beaten (and are →.→ distant from the team) and so on up to the radius of the DGG. I sort these two different ways. A "second order winning percentage" results from considering a shorter path from A to B than from B to A a "win" by A (in the case of equal path lengths, more paths A⇒B than B⇒A counts as a win; where length and number are equal a tie is counted.) "Weighted Wins" is a value based upon the number of other teams at each distance.

The main entertainment value from these reports is that the team names link to lists of the actual (shortest) paths that connect the teams to every other team for whcih such paths exist. The "tail" of each path links to the same page for the other team positioned to the team at the head. In this way if you have the inclination (and too much time on your hands) you can jump from vertex to vertex in the DGG and display any of the millions of paths.

The Winpath Summary stops "looking for" connections once it has found the shortest ⇒ chain from A to B, but rating systems do not. If there are n paths at distance λ there can be many times n (by orders of magnitude) at what I call the graph's radius. Making all of the relationships visible is something of a challenge, but (sticking with counting stats) I use the "Win Path Matrices."

A Win Path Matrix is the number of win-paths from team i to team j for a specific number of "→"s. Presenting them in a comprehensible manner is a challenge, because the graph is asymmetrical - it looks different from every vertex. Here's how I organize the reports.

If ρ is the radius of the DGG, and wpλ(A,B) the number of team A⇒ team B paths with λ "→"s, define

F(A,B) =  ρ
{ ( wpλ(A,B) − wpλ(B,A) ) × ω(λ) }
 λ=1
• The ω(λ) factor keeps the large number of long-length paths from counting more than a win. Currently ω(λ) = 21-λ. The largest number of 20-"→" paths counts about one 60th win with this definition.
• From the definition it is clear that F(A,A) = 0 and F(B,A) = −F(A,B).
• All calculations use at least 20 decimal digits (I increase this as necessary to ensure all counts can be represented as integers.)

For each A the teams can be ordered first through nth by increasing F() values, giving n different "rankings" of the field. Teams { t | F(A,t) < 0 } are ranked better than A and teams { t | F(A,t) > 0 } are ranked worse than A from team A's perspective. Team A's ranking for team A will be a tie with all teams for which there is no win path from or to A (or an equal number of weighted paths.)

For each team A I produce a report that lists by rank each team that ranks team A at that position in the DGG, followed by the ranking by team A of all teams. In the first part, teams that team A has played are in bold, and for all the other teams there's a link to the report for those teams. The rankings by team A section lists the values by which the ranking was formed (ΔPaths), F(team,x) and F(x,team)) and the number of win-paths from team A to x for each pathlength used in the calculation. In the second part of the report, teams that have been played are listed with a link to their page.

A method is a trick used twice

How do you order a set of rankings into one composite ranking? If you answer anything other than "the same way you consolidate polls or computer ratings" send me an email - there might be a Nobel prize in Econimics in your future.

Just as I do with Computer Rankings I calculate the Borda rank and Bucklin Majority rank for each team. I do not calculate a Condorcet "pairwise comparison" because it was true pairwise-comparisons that generated the ranked ballots in the first place. I do report a ranking that has no voting method analog: Self is the rank a team assigns to itself based upon the number of teams for which it has a negative F() value.

The index to the team pages is sorted by Borda count, and lists the Self-rank, best rank, Bucklin (65th-best), worst rank, and average in addition to the Borda rank. It also includes the count of teams that ranked each team at ranks 1-128.

It cannot be emphasized enough that none of these rankings are ratings of team quality. They are convenient measurements of team position within the Directed Games Graph at a specific point in time. As the season progresses there will be more common opponents, opponents' opponents and so on, and the graph itself changes fairly dramatically from week to week. By the end of the season some of the rankings (especially the "self" ranking) become fairly consistent with results as measured by retrodictive ranking violations, but even then the rating systems that use the DGG as a starting point for their calculations do better.

As mentioned above advanced ratings do not generally explicitly calculate the winpath matrices and apply their formula to them. However, the 128 teams (nodes) and #games-played (edges) are what the start with (my simple model of a retrodictive rating implicitly uses over 192 quintillion paths of length 76 or less), so a human who wants to rank teams might find it useful to check out some neighborhoods of the graph.

© Copyright 2014, Paul Kislanko