Analyzing the Games Graph

© Copyright 2006, Paul Kislanko

The Distance Matrix report I publish is just one way to look at the graph that connects all teams by games played. This is usually the view that is used by "formula" type systems that work on averages of opponents' characteristics, opponents' opponents' characteristics, and so on.

More advanced systems use the game graph itself (sometimes indirectly) by using the connections directly instead of beginning with team summaries. For these it is important to understand how the teams that are being compared are connected and how a particular rating system uses the connections between teams.

The connectivity of the field can be described by the average path-length between teams, as I mentioned in Calculating the Distance Matrix. But there's more information to be gleaned from the powers of the incidence matrix.

In what I call the Distance Matrix the only thing that matters is the shortest path linking a pair of teams. Equally important (and not shown in that report) is the number of paths by which the teams are connected.

Unless there are rematches, teams that play each other are connected by one path. Teams that don't play but have a common opponent are connected by one path of length two, or by two paths of length two if there are two common opponents, three paths of length two if there are three common opponents, and so on.

As before, define the Incidence Matrix IM by
IM i , j = { 1 if team i and team j have played}
0 otherwise
and note that [ IMd ] i , j is the number of unique paths between team i and team j of length d. As before, let
P( i, j) = the smallest value of d for which [ IMd ] i , j ≠ 0.

and define
E(t) = [ IMP(t, i) ] t, i
i ≠ t

E(t) is just a count of the number of paths required to connect team t to all other teams. A team i that is an opponent's opponent's opponent may be (usually is) more than one opponent's opponent's opponent, so we count all paths between the teams that are shortest paths.

Like Average Path Length, E(t) is a measure of how connected a team is to the rest of the field. Teams with smaller counts have schedules that contribute more to the connectivity of the field.

Now, if the graph's diameter is higher than P(i, j) then systems that process the entire graph will encounter the same endpoints multiple times - an opponent can't be an opponent's opponent, but there will be paths of length 3, 4, etc. in addition to the path(s) of length 1. Define
XP(t) = Diam
[ IMd ] t , i
i ≠ t d=P(t , i) + 1

XP is a count of the number of "excess" paths that are encounbtered when the whole field is processed. In the expanded distance matrix these characteristics are shown for each team. The columns are:

Team name
APL
The Average Path Length for the connections from the team to every other team
PPT
The Paths Per Team - total number of paths required to connect the team to every other team divided by the number of other teams.
∑P
is E(t) as defied above
∑XP
is XP(t)
Ldiam
is the "local diameter with respect to team t" - the largest P(t,i) - teams with Ldiam < Diam accumulate a much larger XP than less-well connected teams.
N1
the number of teams at distance 1 from team t (number of opponents)
P
the number of paths to opponents (number of games played)
XP
the number of "excess paths" to opponents
The last three are repeated for opponents' opponents, and so on.

Compare the report for division 1-A to the less-connected All Division 1 graph. (Division 1-AA is so weakly connected as a subfield that this graph won't be as connected as 1-A until after the playoff games - which is why 1-AA needs a playoff.)