Normalizing Scoring Stats Using Scoring-based SOS

© Copyright 2006,2007 Paul Kislanko

Deciding whether gaudy offensive numbers are due to good offense or weak opponents' defense (or both!) is an interesting and complex problem. The short answer is that you have to adjust the statistics based upon the strength of the opponent: lower offensive production is expected against a top defense, and worse defensive stats are expected against a top offense. The complexity lies in deciding whether a "good defense" looks good only because it played "bad offenses", and vice versa.

The simple answer is "adjust offensive and defensive stats by SOS", but that raises the question "which SOS?"

The general SOS

What most fans think of as "SOS" is a specific definition based upon Opponents' Winning Percentage (OWP) or, for more sophisticated calculators thereof, some combination of that and Opponents' Opponents' Winning Percentage. This is reasonable, but it is not very useful for normalizing any statistic that isn't perfectly correlated with winning percentage. If you want to normalize points scored, for instance, you would prefer an "SOS" based upon opponents' Points Allowed, not opponents' winning percentage.

A general definition of SOS is equivalent to the first derivative of a function that characterizes a team. (Mathephobes may want to skim past the next bit.)

What we need is a numerical rating that relates to the statistic that we want to normalize. If we have a rating system that combines the metric with other statistics, we would like to be able to separate that out.

If we have a rating R(t) = ƒ(..., PSₜ, PAₜ, ...), then we can use ∂ƒ/∂PA to adjust a team's Points Scored, and ∂ƒ/∂PS to adjust its Points Allowed. The mentally-tough part is that ƒ′(t) is a function of team t's opponents' R, and in general, if we don't know how ƒ uses the PS and PA variables, we can't find the derivative with respect to PA or PS.

If all we cared about was average MOV, we could do something like
  1. find average PS and PA for all teams
  2. adjust PS for each team by PA-by-all-opponents vs all other teams the opponents have played
  3. adjust PA for each team by PS-by-all-opponents vs all other teams the opponents have played
  4. repeat steps 2-3 until the largest difference in values is less than some pre-determined positive value η
  5. report the adjusted (PS-PA) values for all teams
Now, that would be a reasonable approximation, but there is a problem in step 1: the starting values are averages over teams, which means the data that relate individual teams to their specific opponents have already been lost.
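
For concreteness, here is a minimal Python sketch of steps 1-5. The data structure (a per-team list of games with points scored and allowed) and the particular way steps 2-3 scale by opponents' averages are assumptions for illustration, not the author's actual implementation.

  # Hypothetical sketch: games[t] is a list of (opponent, points_scored, points_allowed)
  def adjusted_margins(games, eta=0.001, max_iters=500):
      teams = list(games)
      # Step 1: per-team averages (detail about individual opponents is lost here)
      avg_ps = {t: sum(ps for _, ps, _ in games[t]) / len(games[t]) for t in teams}
      avg_pa = {t: sum(pa for _, _, pa in games[t]) / len(games[t]) for t in teams}
      league_ps = sum(avg_ps.values()) / len(teams)
      league_pa = sum(avg_pa.values()) / len(teams)
      adj_ps, adj_pa = dict(avg_ps), dict(avg_pa)
      for _ in range(max_iters):
          new_ps, new_pa = {}, {}
          for t in teams:
              opps = [opp for opp, _, _ in games[t]]
              # Steps 2-3: one reading of "adjust by opponents' PA/PS against everyone else":
              # scale by how the opponents' current adjusted averages compare to the league
              opp_pa = sum(adj_pa[o] for o in opps) / len(opps)
              opp_ps = sum(adj_ps[o] for o in opps) / len(opps)
              new_ps[t] = avg_ps[t] * league_pa / opp_pa
              new_pa[t] = avg_pa[t] * league_ps / opp_ps
          # Step 4: stop when the largest change falls below eta
          delta = max(abs(new_ps[t] - adj_ps[t]) + abs(new_pa[t] - adj_pa[t]) for t in teams)
          adj_ps, adj_pa = new_ps, new_pa
          if delta < eta:
              break
      # Step 5: report adjusted margins
      return {t: adj_ps[t] - adj_pa[t] for t in teams}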

Suppose we take averages over games instead of averages over teams. In this case we would find R(t) by:
  1. set ƒ₀(t) = 100 for all teams t
  2. set ƒₙ₊₁(t) = the average, over all games g played by team t, of the opponent's ƒₙ ± G(g), e.g.:
    • ƒₙ₊₁(winner) includes ƒₙ(loser) + G(g) in the average
    • ƒₙ₊₁(loser) includes ƒₙ(winner) − G(g) in the average
  3. repeat step 2 until the maximum | ƒₙ₊₁(t) − ƒₙ(t) | < η (where η is any pre-specified positive number)
  4. report R(t) as the value of ƒₙ₊₁(t)

Note that if G(g) = constant for all games for all teams, then the result is a pure combination of winning percentage, opponents' winning percentage, opponents' opponents' winning percentage, and so on. This algorithm is Boyd Nation's original Iterative Strength Rating.
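
As a minimal sketch (assuming a simple game-list representation that is not specified in the original), the iteration in steps 1-4 might look like this in Python, with G supplied per game; pass the same constant G value for every game and it reduces to the Iterative Strength Rating just described.

  # Hypothetical sketch: games[t] is a list of (opponent, won, g_value), where
  # g_value is G(g) for that game and won is True if team t won it
  def iterative_rating(games, eta=0.001, max_iters=1000):
      f = {t: 100.0 for t in games}                        # step 1: f0(t) = 100
      for _ in range(max_iters):
          f_next = {}
          for t, game_list in games.items():
              # step 2: average of opponents' ratings, plus G for wins, minus G for losses
              terms = [f[opp] + (g if won else -g) for opp, won, g in game_list]
              f_next[t] = sum(terms) / len(terms)
          # step 3: stop once the largest change is below eta
          if max(abs(f_next[t] - f[t]) for t in games) < eta:
              return f_next                                 # step 4: report R(t)
          f = f_next
      return f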

The question is what G should be if what we're concerned about is scoring. What we'd like is something that combines the ability to score points and to prevent points from being scored, and that has a different value for each game. Typically this is Margin of Victory, but that can be misleading: not all 7-point wins are equal. A team that wins 42-35 was in much more danger of losing than one that won 10-3. So I chose a measurement called Strength of Victory (which I think is commonly used to analyze the NFL).
SOVgame = (winning score − losing score) / (winning score + losing score)

This is a number between 0 (a tie) and 1 (a shutout).
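Using the example above: the 42-35 win has SOVgame = 7/77 ≈ 0.09, while the 10-3 win has SOVgame = 7/13 ≈ 0.54, so the closer game contributes far less strength of victory even though the margins are identical.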

When we set

G(g) = SOVgame × (average points per game) / (average SOV for all games)

we have what I call the Iterative Strength of Victory.
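
A short sketch of that computation (the variable names, and the reading of "average points per game" as combined points by both teams, are assumptions) produces per-game G values that could be fed straight into an iteration like the one sketched above:

  # Hypothetical sketch: scores is a list of (winning_score, losing_score), one per game
  def sov(w, l):
      return (w - l) / (w + l)             # 0 for a tie, 1 for a shutout

  def g_values(scores):
      avg_points = sum(w + l for w, l in scores) / len(scores)   # average points per game
      avg_sov = sum(sov(w, l) for w, l in scores) / len(scores)  # average SOV for all games
      return [sov(w, l) * avg_points / avg_sov for w, l in scores]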

It turns out that the ISOV is not a particularly good "power rating" by itself. But because of the way it is constructed, the SOSISOV (defined as the average of opponents' ISOV values) is essentially the derivative with respect to "the ability to score and prevent scores", and

ISOV(t) / SOSISOV(t)

is a ratio we can use to find out how well the statistics represent the team. Just divide the average Points Scored by it and multiply the average Points Allowed by it. (The ratio will be > 1 if the team has played mostly teams worse than it at scoring and preventing scores, and ≤ 1 if it has mostly played teams as good or better than it.)
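
In code, that normalization step is a one-liner per team (the dict names below are placeholders, not the author's):

  # Hypothetical sketch: isov and sos_isov are dicts keyed by team, as are the
  # per-team scoring averages avg_ps and avg_pa
  def normalized_scoring(team, avg_ps, avg_pa, isov, sos_isov):
      ratio = isov[team] / sos_isov[team]      # > 1 against mostly weaker opposition
      return avg_ps[team] / ratio, avg_pa[team] * ratio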

Not Quite That Simple

A complication that I left out of the description has to do with game location. A home field advantage adjustment is made in the calculation of G, and this isn't visible in the ISOV or its SOS numbers (both of which are essentially averages of averages).

The ISOV (and its SOS) are normally distributed because of the way they are calculated, but in all sports there's a bit of asymmetry: there are more bad teams than good ones. So we can improve the approximation some by comparing each team's SOS to every other team's ISOV. In the process, we can add back in the home field advantage factor for each team. So for each of the 7140 pairs of D-1A teams, we consider three possible game locations to construct Normalized Scoring Statistics. This amounts to "imagining" how a team's scoring offense and defense would rank if every team played every other team home-and-home and on a neutral field.
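
The comparison function itself isn't spelled out above, but the bookkeeping for the pairwise step is simple enumeration; in the sketch below, compare_at is a placeholder for whatever per-location comparison (including the home field factor) is actually used.

  from itertools import combinations

  # Hypothetical sketch: for each of the C(120, 2) = 7140 pairs of teams, evaluate
  # the matchup at A's home, at B's home, and at a neutral site
  def pairwise_comparisons(teams, compare_at):
      results = {}
      for a, b in combinations(teams, 2):
          for location in ("home_a", "home_b", "neutral"):
              results[(a, b, location)] = compare_at(a, b, location)
      return results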

Don't Bet On It!

This methodology is fairly precise, and it's certainly better than making no adjustment at all, but there's no way to verify it empirically. So we just define this as a method to grade offenses and defenses "on the curve."