Strength of Schedule
Basic Wins and Losses

© Copyright 2004, Paul Kislanko

For every sport where not every team plays every other team at least twice (home-and-home), the only way to compare teams that don't play each other is to compare the quality of their wins. But what's a quality win other than a win over a team that has quality wins of its own? How can you tell they had quality wins? You look at their strength of schedule, which in turn becomes a component of your own SOS.

Yes, that definition is circular, but it is not faulty. If there is some function of games played and the opponents involved in those games that can be used to order all teams, it is necessarily a recursive function. That doesn't mean the problem is intractable; it just means it isn't easy to solve with a formula.

The most basic form of SOS uses only opponents' winning percentage (OWP). Some go a step further and try to use opponents' opponents' winning percentage (OOWP). Most of these get the first part right, but many calculate the OOWP in such a way that records are duplicated when an opponent is also an opponent's opponent, or when the team for which the OOWP is being calculated is its own opponents' opponent. That the NCAA's Ratings Percentage Index includes this flaw turns out to be one of the reasons that formula isn't as valid for baseball as it is for basketball.
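To make the duplication concrete, here is a minimal sketch that lists every opponent of every opponent with no filtering and counts how often each team turns up. The schedule and the team names are made up for illustration; they are not data from any real season.

```python
from collections import Counter

# Hypothetical schedule: each team maps to the opponents it played.
SCHEDULE = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B"],
    "E": ["C"],
}

def naive_opponents_opponents(team, schedule):
    """Count every opponent of every opponent, with no filtering at all."""
    return Counter(oo for opp in schedule[team] for oo in schedule[opp])

counts = naive_opponents_opponents("A", SCHEDULE)

# Teams that are neither A itself nor already one of A's opponents.
new_teams = set(counts) - set(SCHEDULE["A"]) - {"A"}

for name in sorted(counts):
    print(name, counts[name])
print("genuinely new:", sorted(new_teams))
```

In this toy schedule team A shows up twice as its own opponents' opponent, and B and C (already counted as opponents) show up again. Only D and E bring records that have not been counted before.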

If we were going to define a rating system that involves only wins and losses combined with a "strength of schedule" based only upon wins and losses using some linear combination of WP and SOS, we'd want WP and SOS to be in the same units, and we'd want the independent variables to be truly independent. There's an easy way to do this. Suppose we want SOS to include OWP and OOWP for team A. Let's define:

OWP
All opponents' wins other than those against team A, divided by all opponents' games against teams other than team A.

Most systems get this part right, but the NCAA's RPI formula doesn't use this definition. Instead it uses the average of the opponents' individual WPs with team A's games removed. There is no good reason for that mistake, but that's how they do it.
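The difference between the two calculations shows up whenever opponents have played different numbers of games. A minimal sketch, using two invented opponent records with team A's games already removed:

```python
from fractions import Fraction

# Hypothetical records with team A's games removed: (wins, games played).
OPPONENTS = {"X": (9, 10), "Y": (1, 2)}

def pooled_owp(records):
    """All opponents' wins divided by all opponents' games (the definition above)."""
    wins = sum(w for w, _ in records.values())
    games = sum(g for _, g in records.values())
    return Fraction(wins, games)

def averaged_owp(records):
    """Average of each opponent's own winning percentage (the RPI-style calculation)."""
    return sum(Fraction(w, g) for w, g in records.values()) / len(records)

pooled = pooled_owp(OPPONENTS)
averaged = averaged_owp(OPPONENTS)
print(pooled, averaged)  # 5/6 7/10
```

Averaging gives Y's two games as much say as X's ten, while pooling counts every game once.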

OOWP
All opponents' opponents' wins other than those against team A or one of team A's opponents, divided by the number of games they played other than against team A or one of team A's opponents.

With these definitions a game can contribute to WP, OWP, OOWP or to none of those, but cannot contribute to two of those categories. Once we've divided the games so that none can contribute to both OWP and OOWP, we can combine them in a way that relates them to the rest of the games played.
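The two definitions and the one-category-per-game property can be sketched in a few lines. Everything here is an assumption made for illustration: the game list is invented, each distinct opponent is counted once however many times it was played, and a game between two teams in the same group counts as one win in two team-games.

```python
from fractions import Fraction

# Hypothetical season, one (winner, loser) pair per game.
GAMES = [
    ("A", "B"), ("C", "A"),              # games involving team A
    ("B", "C"), ("B", "D"), ("C", "E"),  # involve an opponent of A, but not A
    ("D", "E"), ("D", "F"), ("F", "E"),  # involve only an opponent's opponent
    ("F", "G"), ("G", "H"),              # no side is A, an opponent, or an OO
]

def opponents_of(team, games):
    """Distinct teams that played `team`."""
    return {w if l == team else l for w, l in games if team in (w, l)}

def pooled_wp(teams, excluded, games):
    """Pooled wins / games for `teams`, skipping games against `excluded`."""
    wins = played = 0
    for winner, loser in games:
        for side, other in ((winner, loser), (loser, winner)):
            if side in teams and other not in excluded:
                played += 1
                wins += side == winner
    return Fraction(wins, played)  # raises ZeroDivisionError if no games qualify

def category(game, team, opps, opp_opps):
    """The single bucket a game falls into for `team`."""
    sides = set(game)
    if team in sides:
        return "WP"
    if sides & opps:
        return "OWP"
    if sides & opp_opps:
        return "OOWP"
    return "none"

opps = opponents_of("A", GAMES)
opp_opps = set().union(*(opponents_of(o, GAMES) for o in opps)) - opps - {"A"}
owp = pooled_wp(opps, {"A"}, GAMES)
oowp = pooled_wp(opp_opps, opps | {"A"}, GAMES)
buckets = [category(g, "A", opps, opp_opps) for g in GAMES]

print(sorted(opps), sorted(opp_opps))  # ['B', 'C'] ['D', 'E']
print(owp, oowp)                       # 3/4 1/2
print(buckets)
```

For team A the ten games fall into WP (2), OWP (3), OOWP (3) and none (2), and no game lands in more than one bucket. The sizes of `opps` and `opp_opps` are the #O and #OO counts for team A.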

If we let #O be the number of opponents and #OO be the number of their opponents who are not also opponents, then we can conclude that for a given team #O+#OO is a measure of how "connected" the team is to the rest of the field (the closer that sum comes to the size of the whole field, the more accurate the "SOS" measurement is). But more important from the standpoint of defining a linear combination of OWP and OOWP as "SOS", these values suggest the weights that should be assigned to OWP and OOWP.

I'd suggest something like this:
SOS = ( #OO × OWP + #O × OOWP ) / ( #O + #OO )
It seems to me that if we're going to use any linear combination of OWP and OOWP it would be better to use one that does not have arbitrary weights for those but instead has weights that mean something. Anything.

In this case the weights make some philosophical sense. #OO times OWP gives OWP a weight proportional to how much of the total field that OWP represents, and #O times OOWP does the same for however many or few opponents the OOWP represents. The denominator ( #O + #OO ) as a separate value is an indication of how "connected" the team is to the field: when it is greater than some value, this definition of SOS becomes useful.
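As a quick worked example of the weighted combination suggested above, here is a minimal sketch. The function name `sos` and the input numbers are invented for illustration.

```python
from fractions import Fraction

def sos(num_o, num_oo, owp, oowp):
    """SOS = (#OO * OWP + #O * OOWP) / (#O + #OO)."""
    return (num_oo * owp + num_o * oowp) / (num_o + num_oo)

# Hypothetical team: 10 opponents, 40 opponents' opponents,
# OWP of .600 and OOWP of .500.
example = sos(10, 40, Fraction(3, 5), Fraction(1, 2))
print(example, float(example))  # 29/50 0.58
```

With these inputs the result sits between the OOWP of .500 and the OWP of .600, and #O + #OO = 50 is the connectedness figure for this team.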

Note that this analysis only applies if Opponents' wins plus Opponents' losses represent games that are not included in OO wins plus OO losses. When this metric is combined with winning percentage, a team's wins and losses are also not included in the components that make up OOWP.

This definition of SOS would be appropriate for a formula-based system that depends only upon wins and losses, such as the Ratings Percentage Index. More sophisticated systems (those that use something other than wins and losses) require a different definition of "strength of schedule" that is in some way dependent upon the ranking itself, but those are the topic of another essay.