Applying the Directed Games Graph To Study Ratings

© Copyright 2007, Paul Kislanko

19 September, 2007

Properties of Functions of the Directed Games Graph

In Using the Directed Games Graph I wrote:
Neither my SOWP nor my WtdW report constitute a "rating system" but they can be useful for analyzing the behavior of "advanced" systems because those depend upon the directed games graph. Here I've used A→B to mean only that A won over B, but those systems can be modeled by adjusting the length and/or thickness of the arrow based upon, say, game location and margin of victory.
but one could use the graph itself as a "skeleton" for a rating system. The "weighted wins" function and "second order winning percentage" function each have some desirable properties that anything based upon either inherits.
  1. The SOWP is very "responsive" to new results. Team A is said to have a "second order win" over team C if team A has won against a team B that has a win over team C. But unlike a direct win over team C, this "second order win" can disappear. If team C subsequently beats a team D which has a win over team A, the A→B→C gets balanced by the C→D→A, and the result is a second order tie. And of course, if C beats A in a later week, the second order win turns into a first order loss, and any second-order wins A had based upon C's wins disappear.
  2. There's an intuitive "strength of schedule" built into the SOWP calculation. Suppose team A is 4-0 and team B is 3-1. If there are 20 teams for which team A has a second-order win, but 30 for team B, it's clear that on average team B's three wins have been over better teams than team A's four wins.
  3. The WtdW is approximately normally distributed. This is not a coincidence: because of the way it's defined, the mean will be zero and the standard deviation is a function of the factors in the summed differences. This makes it very useful as a foundation for a rating system that "superimposes" some function of relative team strength (such as margin of victory).

The SOWP is based upon simple counting statistics. SOWP is just:
$$\mathrm{SOWP} = \frac{WW + \tfrac{1}{2}\,TT}{WW + LL + TT + UU}$$
WW = #wins + #second-order wins
LL = #losses + #second-order losses
TT = #second-order ties (plus ordinary ties in sports that have them)
UU = number of teams for which there is no relationship
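
The counting above can be sketched in a few lines of Python. This is only an illustration under my reading of the definitions: the function names, the (winner, loser) input format, and the made-up teams A through F are assumptions, each opponent is counted once with a direct result taking precedence over any second-order result, and a split of direct games is treated as a tie.

```python
from collections import defaultdict

def sowp_counts(games, team):
    """Classify every other team relative to `team`.

    `games` is an iterable of (winner, loser) pairs (hypothetical format).
    Returns the WW, LL, TT and UU counts used in the SOWP formula.
    """
    wins_over = defaultdict(set)          # wins_over[t] = teams t has beaten
    teams = set()
    for winner, loser in games:
        wins_over[winner].add(loser)
        teams.update((winner, loser))

    counts = {"WW": 0, "LL": 0, "TT": 0, "UU": 0}
    for other in teams - {team}:
        win = other in wins_over[team]
        loss = team in wins_over[other]
        if not (win or loss):
            # No direct result, so look for two-step paths in each direction.
            win = any(other in wins_over[mid] for mid in wins_over[team])
            loss = any(team in wins_over[mid] for mid in wins_over[other])
        if win and loss:
            counts["TT"] += 1             # balanced: a tie
        elif win:
            counts["WW"] += 1
        elif loss:
            counts["LL"] += 1
        else:
            counts["UU"] += 1             # no relationship yet
    return counts

def sowp(counts):
    """SOWP = (WW + TT/2) / (WW + LL + TT + UU)."""
    total = sum(counts.values())
    return (counts["WW"] + 0.5 * counts["TT"]) / total if total else 0.0

# A beat B, B beat C, C beat D, D beat A; E and F have only played each other.
games = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("E", "F")]
counts_a = sowp_counts(games, "A")
print(counts_a, sowp(counts_a))
```

For team A this reproduces the situation described in property 1: the A→B→C path is balanced by C→D→A, so C is counted as a second-order tie rather than a second-order win.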

One must be careful using property 2: until all teams with at least one win are connected to all teams with at least one loss, there's a great deal of imprecision in the SOWP's implied SOS, and therefore in the SOWP rank. The UU term in the calculation provides a handy measurement of that. Just define the precision of the SOWP as:
$$\text{precision} = 1 - \frac{UU}{\#\text{teams} - 1}$$

Early in the season, the precision will be less than a tenth, indicating a 90% uncertainty in most teams' rankings.
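
As a quick arithmetic check, here is the precision calculation as a small Python function. The function name and the sample numbers (120 teams, 110 of them unconnected to the team in question) are made up for illustration and are not taken from any actual week's data.

```python
def sowp_precision(uu, n_teams):
    """Precision of a team's SOWP: 1 - UU / (#teams - 1)."""
    if n_teams < 2:
        raise ValueError("need at least two teams")
    return 1 - uu / (n_teams - 1)

# Hypothetical early-season case: 120 teams, no relationship yet to 110 of them.
early = sowp_precision(110, 120)
# Hypothetical late-season case: every other team is connected.
late = sowp_precision(0, 120)
print(round(early, 3), late)
```

With those sample numbers the early-season precision comes out under a tenth, which is the situation the paragraph above describes.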

Using the Directed Games Graph to Analyze A Rating System

As noted in property 3 above, the basic "weighted wins" formula:
$$\mathrm{WtdW}(A) = \sum_{x} \left\{ \#P(A,x) - \#P(x,A) \right\}$$

is approximately normally distributed. But until pretty much the end of the season, the "implied SOS" definitely isn't. After three weeks in 2007, Alabama has 13 "second order wins", but all against winless teams (or teams whose only wins are against the other 12). What we need to do is define the worth of a second-order win (and cost of a loss) in terms of the factors in the WtdW sum and a specific rating system.

Let R_T be the ordinal ranking of team T (1 being best, N being worst, where N is the number of ranked teams). If we replace the terms in the WtdW sum above with F(A,B) and F(B,A) for simplicity, the WtdWRating is:
$$\sum_{x \in \text{wins}} F(A,x)\,(N + 1 - R_x) \;-\; \sum_{x \in \text{losses}} F(x,A)\,R_x$$
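
To make the rating formula concrete, here is a rough Python sketch of the simplified version, without the footnoted adjustment. The function name, the input format, the opponents X, Y and Z, their ranks, and the factor values of 1.0 are all hypothetical; the factors would really come from the WtdW sum.

```python
def wtdw_rating(wins, losses, rank, n_ranked):
    """Simplified WtdWRating for one team.

    wins     -- list of (opponent, F(A, opponent)) for teams A has wins over
    losses   -- list of (opponent, F(opponent, A)) for teams A has losses to
    rank     -- dict mapping team -> ordinal rank (1 is best)
    n_ranked -- N, the number of ranked teams
    """
    # A win is worth more the better (lower-numbered) the opponent's rank.
    win_part = sum(f * (n_ranked + 1 - rank[x]) for x, f in wins)
    # A loss costs more the worse (higher-numbered) the opponent's rank.
    loss_part = sum(f * rank[x] for x, f in losses)
    return win_part - loss_part

# Made-up example: wins over the #10 and #100 teams, a loss to the #5 team.
rank = {"X": 10, "Y": 100, "Z": 5}
rating = wtdw_rating([("X", 1.0), ("Y", 1.0)], [("Z", 1.0)], rank, 120)
print(rating)  # (121 - 10) + (121 - 100) - 5 = 127.0
```

Note how the win over the #10 team contributes 111 while the win over the #100 team contributes only 21, and a loss to a highly ranked team costs very little.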

Let the ranking be the one defined in the Maj column of the Majority Consensus Summary (for the games through 15 Sep.)

There's a bit of a problem with the ranking. We have
Rank  Team
62    New Mexico
86    New Mexico St

New Mexico State→UTEP→New Mexico
UTEP→New Mexico→New Mexico St
There's no way to avoid at least one of the ranking violations, since no matter how you rank the teams there's at least one game that doesn't work. The question is which game is the true "upset" that can be safely ignored?
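
The claim that some game must violate any ranking of a three-team cycle is easy to check by brute force. The sketch below is my own illustration, not part of any rating calculation: it tries all six orderings of the three teams and counts the games in which the winner is ranked below the loser.

```python
from itertools import permutations

# The three results forming the cycle, as (winner, loser).
games = [
    ("New Mexico St", "UTEP"),
    ("UTEP", "New Mexico"),
    ("New Mexico", "New Mexico St"),
]

def violations(order, games):
    """Games whose winner is ranked below the loser in `order` (best first)."""
    position = {team: i for i, team in enumerate(order)}
    return [(w, l) for w, l in games if position[w] > position[l]]

teams = sorted({team for game in games for team in game})
results = {order: violations(order, games) for order in permutations(teams)}
fewest = min(len(v) for v in results.values())

for order, bad in results.items():
    print(" > ".join(order), "| violated:", bad)
print("fewest violations possible:", fewest)
```

Every ordering violates either one or two of the three games, so the best any ranking can do is choose which single game to treat as the upset.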

When we calculate WtdWBMaj we get a new ranking:
Rank  Team
56    New Mexico
67    New Mexico St

So, according to the way the teams' wins are embedded in the directed games graph, we can conclude that the win that violates the ranking is UTEP's over New Mexico.

And to resolve the question about Alabama's rating, we notice:

† The formula is a little more complicated than shown. When both F(A,x) and F(x,A) are defined, the multiplier for wins is
F(A,x) − F(x,A)
and for losses
F(x,A) − F(A,x)
This accounts for cases where an intermediate win is by a team that itself has first- or second-order losses to other teams in the chain.

This is necessary in sports (such as baseball) where it is common for teams to play more than once in a season.