I was asked recently to explain some of the terms I was using with regard to the "Distance Matrix" and "connectivity." The terms are borrowed from Graph Theory, and it is quite possible (likely?) I've been misusing them whilst confusing visitors to the reports. So I wrote this to provide precise definitions for the terms as I use them. Folks who are intimidated by mathematical-looking expressions can skip to the plain-language explanations.

The **I**ncidence **M**atrix **IM** is defined by

im_{ i , j} | = { | 1 if team i and team j have played | } |

0 otherwise |

**IM** is a very sparse matrix. Even at the end of the season only about 10 percent of all possible team pairs will have played each other. However, its properties make it very easy to calculate how closely related two teams are by opponents' opponent chains.

If we let **A** = **IM**^{2}, then **A**_{ i , j} is nonzero if teams i and j are opponents' opponents, and the nonzero value is the number of opponents they have in common. (**A**_{ i, i} is the number of teams that have played team i.) In general, for **A** = **IM**^{d}, **A**_{ i , j} gives the count of number of unique paths of length *d* that connect team i and team j.

Define *P*( i , j) to be the smallest value of *d* { *d* = 1, 2, 3, ...} such that **IM**^{d}_{ i , j} ≠ 0 *if there is one.* If *P* is defined for all team pairs (i , j) the field is said to *connected*. When the field is connected, the **largest** value of *P*(i,j) for all teams i and j is said to be the *diameter (D)* of the schedule graph.

• For all teams t,D(t) = MAX(P(t, i) ) for all teams i ≠ t ≤D. One measure of how "connected" a team is to the field is given by this "diameter with respect to team t."

• If there is a subset of teams such that all teams in the subset only play each other then the field is never connected. We can always redefine thefieldto be the complement of this subset, and consider the subset a different schedule graph.

• If the field will eventually be connected by scheduled but not-yet-played games, we can measure the progress towards connectivity by counting the number of teams for whichD(t) is defined compared to the count of teams for which it is not.

dm_{ i , j} | = | P(i , j) |

Define *N*_{d}(t) to be the *number* of teams i ≠ t for which **[** *dm*_{ t, i} **]** = *d*.
By definition, if team t is connected to the field, and there are N teams in the field,

(N - 1) = | D(t) |

∑ | N_{d}(t) |

d=1 |

APL(t) = ( 1 / (N - 1) ) × | D(t) |

∑ | d × N_{d}(t) |

d = 1 |

Once all teams are connected to the field, we can find the overall APL by averaging all the team APLs. But we can find *N*_{d} for the field for all *d* ≤ D by summing over teams *N*_{d}(t) and dividing by 2 (since if team A is team B's opponent, team B is team A's so that pair is counted for each team) and use the same formula as above replacing ( 1 / ( N - 1 ) ) with ( 1 / ( N × ( N - 1 ) / 2 ).

- N1
- is just the number of teams the given team has played (number of opponents).
- N2
- is the number of teams one or more of the team's opponents have played and have
**not**played this team (number of opponents' opponents). - N3... and so on
- The number of teams that are opponents of teams that are opponents of teams counted in the previous column but are not themselves included in any prior column (Opponents' opponents' opponents'...)
- APL
- is the average path length from the team to each of all the other teams.

There are extra lines at the bottom of the report summarize the data for all teams.

- #Pairs
- is the number of
*pairs*of teams that are related to each other by the same number of links in the "opponents' ... opponent" chain, for N1 (number of pairs that are opponents), N2 (number of pairs that are opponents' opponents) and so on. The**APL**value for this line in the report is a measure of how well connected the field is overall, and can be compared to the team APLs to distinguish the teams that are more or less -connected. - %Total pairs
- is just #pairs expressed as a percentage of all team pairs at each O-O "level."

The reports I produce (Division 1A only All of D1) summaries of the *schedule graph*, where the only thing that matters is whether a pair of teams have played. I also only bother with division 1, although usually there is enough inter-divisional play to eventually connect all college football teams. I also don't attempt to report the list of pairs that connect two teams due to the large amount of data involved.