Normalizing Scoring Stats Using Scoring-based SOS

© Copyright 2006, Paul Kislanko

The ever-insightful Sunday Morning Quarterback justified his "resume" method for filling out his BlogPoll ballot and described some of the components of teams' resumes. I would use pretty much the same process, but would suggest an improvement to one of the components.

SMQ noted that one of the things he looks at is average Margin of Victory. One thing stood out to me - #1 on that list had played 7 total games, and four of its opponents were ranked #116, #117, #118, and #119 in scoring defense. Not only did Clemson run up the score on them, but everybody else did, too. In the case of Temple, the 52-point loss to Clemson was only third on the list of bad losses (to teams not as good as Clemson, I might add).

Deciding whether gaudy offensive numbers are due to good offense or bad defense (or both!) is an interesting and complex problem. The short answer is that you have to adjust the statistics based upon the strength of the opponent (lower offensive production is expected against a top defense, worse defensive stats are expected against a top offense.) The complexity is trying to decide whether the "good defense" looks good only because it played "bad offenses", and vice versa.

The simple answer is "adjust offensive and defensive stats by SOS" but that begs the question "which SOS?"

The general SOS

What most fans think of as "SOS" is a specific definition based upon Opponents' Winning Percentage (OWP), or, for more sophisticated calculators thereof, some combination of that and Opponents' Opponents' Winning Percentage. This is reasonable, but it is not very useful for normalizing any statistic that isn't perfectly correlated with winning percentage. If you want to normalize points scored, for instance, you would prefer an "SOS" based upon opponents' Points Allowed, not opponents' winning percentage.

A general definition of SOS is equivalent to the first derivative of a function that characterizes a team.

What we need is a numerical rating that relates to the statistic that we want to normalize. If we have a rating system that combines the metric with other statistics, we would like to be able to separate that out.

If we have a rating R(t) = ƒ(..., PS_t, PA_t, ...), then we can use ∂ƒ/∂PA to adjust a team's Points Scored, and ∂ƒ/∂PS to adjust its Points Allowed. The mentally tough part is that ƒ′(t) is a function of team t's opponents' R, and in general, if we don't know how ƒ uses the PS and PA variables, we can't find the derivative with respect to PA or PS.

If all we cared about was average MOV, we could do something like
  1. find average PS and PA for all teams
  2. adjust PS for each team by PA-by-all-opponents vs all other teams the opponents have played
  3. adjust PA for each team by PS-by-all-opponents vs all other teams the opponents have played
  4. repeat steps 2-3 until the largest difference in values is less than some pre-determined positive value η
  5. report the adjusted (PS-PA) values for all teams
Now, that would be a reasonable approximation, but there are problems in step 1. The inputs are averages, which means the data relating individual teams have already been lost, and the NCAA reports include scoring from games against teams that aren't on these lists. (Georgia's scoring offense report includes its 48-12 win over Western Kentucky, which doesn't appear on the same scoring offense or scoring defense lists that Georgia is on.)
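As a sketch, here is one way those five steps might look in Python. The scaling rule in steps 2-3 - adjusting a team's averages by the ratio of the league average to its opponents' adjusted averages - is my assumption, since the steps above leave the exact adjustment open:

```python
def adjusted_mov(ps, pa, opponents, eta=0.01, max_iter=1000):
    """ps, pa: {team: raw per-game scoring average}; opponents: {team: [opponents]}.
    The scaling rule below is one plausible reading of steps 2-3, not a definitive one."""
    league_ps = sum(ps.values()) / len(ps)              # step 1: league averages
    league_pa = sum(pa.values()) / len(pa)
    adj_ps, adj_pa = dict(ps), dict(pa)
    for _ in range(max_iter):
        new_ps, new_pa = {}, {}
        for t in ps:
            opp_pa = sum(adj_pa[o] for o in opponents[t]) / len(opponents[t])
            opp_ps = sum(adj_ps[o] for o in opponents[t]) / len(opponents[t])
            new_ps[t] = ps[t] * league_pa / opp_pa      # step 2: adjust PS by opponents' PA
            new_pa[t] = pa[t] * league_ps / opp_ps      # step 3: adjust PA by opponents' PS
        delta = max(max(abs(new_ps[t] - adj_ps[t]),
                        abs(new_pa[t] - adj_pa[t])) for t in ps)
        adj_ps, adj_pa = new_ps, new_pa
        if delta < eta:                                 # step 4: converged
            break
    return {t: adj_ps[t] - adj_pa[t] for t in ps}       # step 5: adjusted margins
```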

Suppose we take averages over games instead of averages over teams. In this case we would find R(t) by:
  1. set ƒ_0(t) = 100 for all teams t
  2. set ƒ_{n+1}(t) = the average of (opponent's ƒ_n ± G(g)) over all games g played by team t, e.g.:
    • ƒ_{n+1}(winner) includes ƒ_n(loser) + G(g) in the average
    • ƒ_{n+1}(loser) includes ƒ_n(winner) − G(g) in the average
  3. repeat step 2 until the maximum | ƒ_{n+1}(t) − ƒ_n(t) | < η (where η is any pre-specified positive number)
  4. report R(t) as the value of ƒ_{n+1}(t)

Note that if G(g) = constant for all games for all teams, then the result is a pure combination of winning percentage, opponents' winning percentage, opponents' opponents' winning percentage, and so on. This algorithm is Boyd Nation's original Iterative Strength Rating.
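Here is a minimal Python sketch of that iteration, with G passed in as a function of the game so that the constant-G case above reproduces an ISR-style won-lost rating. The data layout - a list of (winner, loser, winner_score, loser_score) tuples - and the constant 25.0 in the usage example are my own choices for illustration:

```python
def iterate_ratings(games, teams, G, eta=0.001, start=100.0, max_iter=1000):
    """games: list of (winner, loser, winner_score, loser_score) tuples; every
    opponent is assumed to appear in teams and every team to have played a game.
    G: a function of one game returning that game's bonus/penalty."""
    f = {t: start for t in teams}                        # step 1: f_0(t) = 100
    for _ in range(max_iter):
        new_f = {}
        for t in teams:
            terms = []
            for g in games:
                winner, loser = g[0], g[1]
                if t == winner:
                    terms.append(f[loser] + G(g))        # winner: opponent's rating + G
                elif t == loser:
                    terms.append(f[winner] - G(g))       # loser: opponent's rating - G
            new_f[t] = sum(terms) / len(terms)           # step 2: average over t's games
        done = max(abs(new_f[t] - f[t]) for t in teams) < eta
        f = new_f
        if done:                                         # step 3: change below eta
            break
    return f                                             # step 4: report R(t)

# With a constant G (value chosen arbitrarily here) this is essentially a
# pure won-lost, ISR-style rating:
# isr = iterate_ratings(games, teams, lambda g: 25.0)
```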

The question is what should G be if what we're concerned about is scoring? What we'd like is something that combines the ability to score points and prevent points from being scored and has a different value for each game. Typically this is Margin of Victory, but that can be misleading - not all 10-point wins are equal. A team that wins 45-35 was much more in danger of losing than one that won 10-0. So I chose a measurement called Strength of Victory (which I think is commonly used to analyze the NFL.)
SOV_game = (winning score − losing score) / (winning score + losing score)
This is a number between 0 (a tie) and 1 (a shutout).
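For example, the 45-35 and 10-0 wins mentioned above are both 10-point margins, but the first has SOV = 10/80 = 0.125 while the second has SOV = 10/10 = 1.0.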

When we set
G(g) = SOV_game × (average points per game per team) / (average SOV for all games)
we have what I call the Iterative Strength of Victory.
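Continuing the sketch from above, the ISOV's G could be built roughly like this (reusing the assumed game-tuple layout and the iterate_ratings helper from the earlier sketch; note that this leaves out the home-field adjustment described in the next section):

```python
def make_isov_G(games):
    """Build the ISOV's G(g) from league-wide averages over all games."""
    avg_pts_per_team = sum(ws + ls for _, _, ws, ls in games) / (2 * len(games))
    avg_sov = sum((ws - ls) / (ws + ls) for _, _, ws, ls in games) / len(games)
    def G(g):
        _, _, ws, ls = g
        sov = (ws - ls) / (ws + ls)                  # Strength of Victory for this game
        return sov * avg_pts_per_team / avg_sov      # rescale SOV to "points" units
    return G

# isov_ratings = iterate_ratings(games, teams, make_isov_G(games))
```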

It turns out that the ISOV is not a particularly good "power rating" by itself. But because of the way it is constructed, the SOS_ISOV (defined as the average of opponents' ISOV values) is essentially the derivative with respect to "the ability to score and prevent scores", and

ISOV(t) / SOS_ISOV(t)

is a ratio we can use to find out how well the raw statistics represent the team: just divide the average Points Scored by it and multiply the average Points Allowed by it. (The ratio will be > 1 if the team has played mostly teams worse than it at scoring and preventing scores, and ≤ 1 if it has mostly played teams as good or better than it.)
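As a sketch of that last step, with made-up numbers (the ISOV of 125 against an SOS of 100, and the 42.0/14.0 scoring averages, are purely illustrative):

```python
def normalize_scoring(avg_ps, avg_pa, isov, sos_isov):
    """Adjust raw per-game scoring averages by the ISOV-based SOS ratio."""
    ratio = isov / sos_isov          # > 1 against weaker schedules, <= 1 otherwise
    return avg_ps / ratio, avg_pa * ratio

# Hypothetical team averaging 42.0 scored / 14.0 allowed against a weak slate:
print(normalize_scoring(42.0, 14.0, 125.0, 100.0))   # -> (33.6, 17.5)
```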

Not Quite That Simple

A complication that I left out of the description has to do with game location. A home field advantage adjustment is made in the calculation of G and this isn't visible in the ISOV or its SOS numbers (both of which are essentially averages of averages).

The ISOV (and its SOS) are normally distributed because of the way they are calculated, but in all sports there's a bit of asymmetry - there are more bad teams than good ones. So we can improve the approximation some by comparing each team's SOS to every other team's ISOV. In the process, we can add back in the home field advantage factor for each team. So for each of the 7021 pairs of 1-A teams, we consider three possible game locations to construct the Normalized Scoring Statistics. This amounts to "imagining" how a team's scoring offense and defense would rank if every team played every other team home-and-home and on a neutral field.

I introduced this last year as part of the general discussion about whether the Pac 10 offenses were that good or the Pac 10 defenses that bad (yes, and yes) vis-à-vis whether the SEC defenses were that good or the SEC offenses just that bad (yes and yes). It has subsequently turned out to be useful for any sport for which the ISOV itself is useful.

Don't Bet On It!

This methodology is fairly precise, and it's certainly better than nothing, but there's no way to verify it empirically - especially in football, where halfway through the season the "next game" is worth 14 percent of a team's average. But (to get back to the original point) 33.4-17.7 sure looks like it characterizes Clemson better than 43.9-13.3!