Scoring Margins

Is there a Home Field Advantage?
January 8, 2019

When I set out to model rating systems by calculating simple retrodictive and predictive systems I ran into a couple of problems. It turned out that my model for a "predictive" system wasn't by itself all that "predictive." In order to predict game scores, or at least margin of victory, I had to come up with another algorithm that used the points-based algorithm as input. As I designed it I included a home field adjustment variable, for no better reason than that every other predictive algorithm had one.

Now, figuring out how a home field advantage is manifested in points is a really hard problem. This is especially true for college football because by the time you have enough data points to study you've got different team compositions in the mix. There are nowhere near enough neutral-site games to serve as a control. Throw in that there's no way to tell whether a home team that lost would've lost by more had it been a road game, and one wonders if it's worth the effort.

The best study I've read about was done by Boyd Nation using D1 baseball conference games over multiple seasons. Baseball provides enough games, and in two- or four-season periods conference games between team pairs come in nearly equal home-and-home increments. Choosing conference games minimizes the noise introduced by the tendency of "power" teams to play non-power teams exclusively at home. Boyd's results were framed in terms of winning percentage: home field made a difference in the outcome about 10 per cent of the time.

There is some evidence that all D1 team sports are about the same in this regard. When they introduced the home court adjustment to the RPI I repeated the study using just two years of basketball conference game results, and various studies I've done on D1 football give approximately equal outcomes. Football is harder for the reasons stated above, but when teams of "the same class" are used, the home team winning percentage gets closer to .55 as the definition of "class" is refined.

Where to start?

I began with the notion that if there was a home field advantage and it could be expressed in points, it should show up as the difference between the average victory margin in games the home team wins and the average margin in games the visiting team wins. Since I was rating only D1 teams, for all games between D1 opponents I used
ΔMargin = (average margin when home team wins) − (average margin when road team wins)
which for 2018 after the regular season looks like this:
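
| | #HW | HW Mar | #RW | RW Mar | #N | N Mar | HW% | ΔMar |
|---|---|---|---|---|---|---|---|---|
| D1 | 869 | 20.52 | 549 | 16.11 | 44 | 15.50 | 0.613 | 4.41 |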
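
A minimal sketch of the ΔMargin calculation, assuming each game is recorded as a (home score, road score) pair with neutral-site games already filtered out. The function name `delta_margin` and the sample scores are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean

def delta_margin(games):
    """Return (home winning pct, ΔMargin) for (home_score, road_score) pairs.

    Assumes non-neutral games with no ties (hypothetical input format).
    """
    home_win_margins = [h - r for h, r in games if h > r]
    road_win_margins = [r - h for h, r in games if r > h]
    decided = len(home_win_margins) + len(road_win_margins)
    hw_pct = len(home_win_margins) / decided
    # ΔMargin = (average margin when home team wins) - (average margin when road team wins)
    return hw_pct, mean(home_win_margins) - mean(road_win_margins)

# Made-up scores, not real results.
sample = [(31, 17), (24, 21), (10, 27), (35, 7), (14, 20)]
hw_pct, d_margin = delta_margin(sample)
print(hw_pct, d_margin)
```

Applied to the D1 row, the same subtraction is 20.52 − 16.11 = 4.41, and 869 / (869 + 549) gives the 0.613 home winning percentage.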
 
ΔMargin actually starts out much larger early in the year, and it is always larger than the home field advantage that people who are better at this than I am came up with, so I used ½ΔMargin to align my HFA with the ones published along with other ratings. That seemed justifiable on the idea that about half the time the venue didn't matter: the better team would've won had the game been played away instead of at home. That was just a naive ad-hoc adjustment, and I'm a bit embarrassed that I didn't immediately identify my mistake.

The real reason that ΔMargin came out too high is that the average is dominated by all the 1AA @ 1A games, where the home team would win by a landslide even were it a visitor.

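| | #HW | HW Mar | #RW | RW Mar | #N | N Mar | HW% | ΔMar |
|---|---|---|---|---|---|---|---|---|
| 1A v 1A | 422 | 18.25 | 286 | 16.22 | 32 | 16.56 | 0.596 | 2.03 |
| 1AA v 1AA | 343 | 19.31 | 255 | 16.14 | 12 | 12.67 | 0.574 | 3.17 |
| 1AA @ 1A | 104 | 33.69 | 7 | 7.86 | | | 0.937 | 25.83 |
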
If we use only 1A vs 1A and 1AA vs 1AA games we get a more reasonable 0.586 home winning percentage and ΔMargin 2.54.
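
The pooled figures can be checked from the two same-division rows by weighting each average margin by its number of games. This is only a sketch of that arithmetic; the helper name `pool` is hypothetical, and the inputs are the rounded values from the table.

```python
# (home wins, avg home-win margin, road wins, avg road-win margin), copied from the table
rows = {
    "1A v 1A": (422, 18.25, 286, 16.22),
    "1AA v 1AA": (343, 19.31, 255, 16.14),
}

def pool(summary_rows):
    """Combine per-group summaries into one home win pct and one ΔMargin."""
    summary_rows = list(summary_rows)
    home_wins = sum(r[0] for r in summary_rows)
    road_wins = sum(r[2] for r in summary_rows)
    # Weighted averages: each group's mean margin counts once per game in it.
    home_margin = sum(r[0] * r[1] for r in summary_rows) / home_wins
    road_margin = sum(r[2] * r[3] for r in summary_rows) / road_wins
    return home_wins / (home_wins + road_wins), home_margin - road_margin

hw_pct, delta = pool(rows.values())
print(round(hw_pct, 3), round(delta, 2))
```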

But there's a similar distinction within 1A. If we divide it into the "power 5" and "group of 5" conferences we see the same phenomenon.

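| | #HW | HW Mar | #RW | RW Mar | HW% | ΔMar |
|---|---|---|---|---|---|---|
| P5 @ P5 | 169 | 16.13 | 129 | 16.75 | 0.567 | -0.62 |
| G5 @ P5 | 73 | 26.96 | 10 | 12.90 | 0.880 | 14.06 |
| P5 @ G5 | 8 | 15.13 | 12 | 17.42 | 0.400 | -2.29 |
| G5 @ G5 | 172 | 16.77 | 135 | 15.84 | 0.560 | 0.93 |
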
It seems that as difference in team quality is accounted for the magnitude of a universal home field advantage decreases. In fact when we consider just conference games over two seasons it isn't obvious that one exists at all.
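2015 - 2016

| Conf | #Hwin | Hwin | #Rwin | Rwin | #N | Neut | %HWin | Δmargin |
|---|---|---|---|---|---|---|---|---|
| ACC | 56 | 17.59 | 55 | 12.04 | 3 | 6.00 | 0.505 | 5.55 |
| B10 | 62 | 19.32 | 56 | 16.32 | 3 | 6.67 | 0.525 | 3.00 |
| B12 | 47 | 19.43 | 39 | 17.95 | 4 | 14.75 | 0.547 | 1.48 |
| P12 | 61 | 17.93 | 47 | 15.68 | 2 | 25.00 | 0.565 | 2.25 |
| SEC | 61 | 14.75 | 48 | 12.71 | 5 | 22.20 | 0.560 | 2.05 |
| AAC | 56 | 18.20 | 42 | 19.69 | | | 0.571 | -1.49 |
| CUSA | 61 | 19.80 | 45 | 17.29 | | | 0.575 | 2.51 |
| MAC | 43 | 19.79 | 48 | 14.19 | 3 | 11.00 | 0.473 | 5.60 |
| MW | 56 | 15.09 | 41 | 17.24 | 2 | 5.00 | 0.577 | -2.15 |
| SBC | 33 | 14.58 | 25 | 21.40 | | | 0.569 | -6.82 |
| BSky | 56 | 17.09 | 36 | 13.89 | | | 0.609 | 3.20 |
| BigS | 12 | 15.42 | 10 | 16.20 | | | 0.545 | -0.78 |
| CAAF | 55 | 16.82 | 43 | 15.30 | | | 0.561 | 1.52 |
| Ivy | 27 | 13.67 | 26 | 16.88 | | | 0.509 | -3.22 |
| MEAC | 39 | 18.26 | 31 | 15.65 | 3 | 16.33 | 0.557 | 2.61 |
| MVC | 48 | 17.44 | 35 | 14.51 | | | 0.578 | 2.92 |
| NE | 27 | 14.37 | 15 | 15.93 | | | 0.643 | -1.56 |
| OVC | 37 | 18.22 | 33 | 14.30 | | | 0.529 | 3.91 |
| Pat | 23 | 15.91 | 18 | 14.39 | 1 | 40.00 | 0.561 | 1.52 |
| Pio | 38 | 17.71 | 34 | 15.76 | | | 0.528 | 1.95 |
| SoCon | 36 | 15.89 | 30 | 12.93 | | | 0.545 | 2.96 |
| SLC | 52 | 18.12 | 45 | 16.96 | 2 | 20.50 | 0.536 | 1.16 |
| SWAC | 36 | 23.11 | 45 | 20.42 | 11 | 18.18 | 0.444 | 2.69 |
| All | 1022 | 17.54 | 847 | 15.91 | 39 | 16.18 | 0.547 | 1.63 |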
            
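2016 - 2017

| Conf | #Hwin | Hwin | #Rwin | Rwin | #N | Neut | %HWin | Δmargin |
|---|---|---|---|---|---|---|---|---|
| ACC | 57 | 18.12 | 54 | 13.28 | 3 | 15.00 | 0.514 | 4.85 |
| B10 | 73 | 19.04 | 53 | 20.40 | 2 | 6.50 | 0.579 | -1.36 |
| B12 | 40 | 19.25 | 46 | 14.63 | 5 | 13.40 | 0.465 | 4.62 |
| P12 | 71 | 17.89 | 37 | 15.59 | 2 | 17.00 | 0.657 | 2.29 |
| SEC | 60 | 19.00 | 48 | 14.52 | 7 | 19.86 | 0.556 | 4.48 |
| AAC | 56 | 16.27 | 42 | 17.14 | | | 0.571 | -0.87 |
| CUSA | 61 | 17.48 | 49 | 13.53 | | | 0.555 | 3.94 |
| MAC | 48 | 16.85 | 47 | 14.62 | 3 | 10.00 | 0.505 | 2.24 |
| MW | 53 | 15.81 | 44 | 16.11 | 1 | 5.00 | 0.546 | -0.30 |
| SBC | 33 | 16.82 | 29 | 15.72 | | | 0.532 | 1.09 |
| BSky | 59 | 16.44 | 35 | 15.29 | | | 0.628 | 1.15 |
| BigS | 13 | 20.77 | 9 | 14.56 | | | 0.591 | 6.21 |
| CAAF | 52 | 16.54 | 45 | 13.51 | 1 | 14.00 | 0.536 | 3.03 |
| Ivy | 28 | 13.04 | 27 | 14.56 | 1 | 23.00 | 0.509 | -1.52 |
| MEAC | 39 | 14.49 | 31 | 14.00 | 2 | 12.50 | 0.557 | 0.49 |
| MVC | 46 | 18.13 | 36 | 16.67 | | | 0.561 | 1.46 |
| NE | 24 | 15.79 | 18 | 15.50 | | | 0.571 | 0.29 |
| OVC | 39 | 14.18 | 31 | 13.55 | | | 0.557 | 0.63 |
| Pat | 21 | 15.81 | 20 | 16.40 | 1 | 40.00 | 0.512 | -0.59 |
| Pio | 38 | 16.58 | 34 | 18.88 | | | 0.528 | -2.30 |
| SoCon | 41 | 15.66 | 33 | 14.73 | | | 0.554 | 0.93 |
| SLC | 54 | 17.02 | 42 | 19.71 | 2 | 23.00 | 0.563 | -2.70 |
| SWAC | 35 | 21.31 | 38 | 19.08 | 11 | 11.91 | 0.479 | 2.24 |
| All | 1041 | 17.14 | 848 | 15.79 | 41 | 14.93 | 0.551 | 1.36 |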
            
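2017 - 2018

| Conf | #Hwin | Hwin | #Rwin | Rwin | #N | Neut | %HWin | Δmargin |
|---|---|---|---|---|---|---|---|---|
| ACC | 61 | 15.21 | 51 | 16.47 | 2 | 33.50 | 0.545 | -1.26 |
| B10 | 71 | 18.58 | 55 | 18.20 | 2 | 13.50 | 0.563 | 0.38 |
| B12 | 45 | 16.64 | 41 | 12.76 | 6 | 11.50 | 0.523 | 3.89 |
| P12 | 69 | 15.67 | 39 | 14.59 | 2 | 5.00 | 0.639 | 1.08 |
| SEC | 58 | 19.71 | 49 | 17.18 | 8 | 12.88 | 0.542 | 2.52 |
| AAC | 59 | 17.97 | 39 | 15.69 | | | 0.602 | 2.27 |
| CUSA | 62 | 15.31 | 52 | 13.79 | | | 0.544 | 1.52 |
| MAC | 51 | 17.69 | 45 | 14.69 | 2 | 9.00 | 0.531 | 3.00 |
| MW | 50 | 16.52 | 48 | 15.94 | | | 0.510 | 0.58 |
| SBC | 39 | 19.03 | 35 | 14.69 | | | 0.527 | 4.34 |
| BSky | 57 | 18.14 | 41 | 17.34 | 1 | 21.00 | 0.582 | 0.80 |
| BigS | 14 | 26.93 | 12 | 19.00 | | | 0.538 | 7.93 |
| CAAF | 56 | 13.59 | 40 | 15.68 | 1 | 14.00 | 0.583 | -2.09 |
| Ivy | 29 | 17.24 | 26 | 14.50 | 1 | 23.00 | 0.527 | 2.74 |
| MEAC | 36 | 15.19 | 36 | 13.86 | 2 | 9.50 | 0.500 | 1.33 |
| MVC | 47 | 19.09 | 35 | 19.34 | | | 0.573 | -0.26 |
| NE | 19 | 17.58 | 23 | 15.30 | | | 0.452 | 2.27 |
| OVC | 43 | 14.53 | 27 | 15.70 | | | 0.614 | -1.17 |
| Pat | 18 | 19.33 | 24 | 17.04 | | | 0.429 | 2.29 |
| Pio | 47 | 15.00 | 29 | 19.69 | | | 0.618 | -4.69 |
| SoCon | 37 | 16.32 | 36 | 12.61 | | | 0.507 | 3.71 |
| SLC | 56 | 16.61 | 40 | 17.40 | 2 | 22.00 | 0.583 | -0.79 |
| SWAC | 32 | 20.50 | 33 | 16.55 | 11 | 9.64 | 0.492 | 3.95 |
| All | 1056 | 17.06 | 856 | 15.91 | 40 | 13.03 | 0.552 | 1.16 |
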
That is not nearly as comprehensive as Boyd's analysis but it does support the notion that the home team wins about 10 per cent more often than expected. That leaves open the question of how to describe that in terms of points. It even suggests that my initial notion of using ΔMargin is not likely to be an accurate predictor.

Next Step

It still makes sense that a home field advantage should appear in the margin of victory data, even if not reflected in the averages. Upon reflection it is not surprising that averages over all scores don't show it: there just aren't that many games decided by one score. A pattern does show up if we count home and road wins cumulatively by margin, that is, for each margin x, the number of games won by x points or fewer.
[Figure: All non-neutral games 2016-2018]
Notice that the slopes of the graphs stay about the same when the margin is less than a score, then the graphs diverge from parallel fairly rapidly when the margin is larger than that. The averages are dominated by the larger values of X where either team would've won regardless of location, and the home average margin is larger because there are more "optional" games where the weaker team visits the stronger. We're interested in the close games, and more than half (52%) of all games were decided by 14 points or less, so we zoom in on that part of the graph.
[Figure: Non-neutral games decided by 14 points or less 2016-2018]
The slope of the home win graph is greater than that of the road win graph in 1 ≤ X ≤ 4 and then essentially the same through X = 9. So maybe there is a point value somewhere between 1 and 4 that can be called a home field advantage.
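
The cumulative counts behind graphs like these can be sketched as follows. This is an illustration only: the function name `cumulative_wins` and the margin lists are hypothetical, and real input would be the winning margins from non-neutral games.

```python
from collections import Counter
from itertools import accumulate

def cumulative_wins(margins, max_margin):
    """Return a list c where c[x - 1] is the number of wins by x points or fewer."""
    counts = Counter(margins)
    return list(accumulate(counts.get(x, 0) for x in range(1, max_margin + 1)))

# Made-up winning margins, not real results.
home_win_margins = [3, 7, 1, 14, 3, 10, 21]
road_win_margins = [7, 2, 17, 3]

home_cum = cumulative_wins(home_win_margins, 14)
road_cum = cumulative_wins(road_win_margins, 14)

# Compare the two series margin by margin, as in the zoomed graph.
for x, (h, r) in enumerate(zip(home_cum, road_cum), start=1):
    print(x, h, r)
```

Plotting `home_cum` and `road_cum` against x gives the two curves whose slopes are compared above.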

Overthinking?

There's one good reason to ignore a lot of the above. The evidence is that the home team wins about 10 percent more often than would be expected, but in terms of points there's no way to quantify, for instance, whether the home team's losses would have been by a larger margin had it been playing on the road. Here are five years' worth of non-neutral site conference games, showing the margin as average point difference and then the difference in average margins for home and road wins.
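
| Season(s) | #HW | #RW | HW% | HS | RS | HS-RS | GmS(Home W) | GmS(Road W) | MarHW | MarRW | MarHW-MarRW |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014 | 508 | 452 | 0.529 | 28.49 | 26.48 | 2.01 | 35.82-18.78 | 35.12-20.24 | 17.04 | 14.88 | 2.16 |
| 2015 | 535 | 447 | 0.545 | 28.66 | 25.99 | 2.66 | 36.85-18.89 | 34.50-18.86 | 17.96 | 15.64 | 2.32 |
| 2016 | 533 | 442 | 0.547 | 28.87 | 26.92 | 1.95 | 36.18-19.40 | 35.99-20.05 | 16.79 | 15.94 | 0.85 |
| 2017 | 544 | 446 | 0.549 | 27.78 | 25.14 | 2.64 | 35.60-18.24 | 33.55-18.23 | 17.36 | 15.32 | 2.04 |
| 2018 | 536 | 434 | 0.553 | 28.97 | 27.06 | 1.91 | 36.51-19.87 | 35.94-19.66 | 16.64 | 16.28 | 0.36 |
| 2-season moving average | | | | | | | | | | | |
| 2014-2015 | 1043 | 899 | 0.537 | 28.57 | 26.23 | 2.34 | 36.35-18.84 | 34.81-19.56 | 17.51 | 15.26 | 2.25 |
| 2015-2016 | 1068 | 889 | 0.546 | 28.76 | 26.45 | 2.31 | 36.52-19.14 | 35.24-19.45 | 17.37 | 15.79 | 1.58 |
| 2016-2017 | 1077 | 888 | 0.548 | 28.32 | 26.02 | 2.30 | 35.89-18.81 | 34.76-19.13 | 17.08 | 15.63 | 1.45 |
| 2017-2018 | 1080 | 880 | 0.551 | 28.37 | 26.09 | 2.28 | 36.05-19.05 | 34.73-18.93 | 17.00 | 15.80 | 1.21 |
| 3-season moving average | | | | | | | | | | | |
| 2014-2016 | 1576 | 1341 | 0.540 | 28.67 | 26.46 | 2.21 | 36.29-19.03 | 35.20-19.72 | 17.27 | 15.48 | 1.78 |
| 2015-2017 | 1612 | 1335 | 0.547 | 28.43 | 26.01 | 2.42 | 36.21-18.84 | 34.68-19.04 | 17.37 | 15.63 | 1.74 |
| 2016-2018 | 1613 | 1322 | 0.550 | 28.53 | 26.36 | 2.17 | 36.09-19.16 | 35.15-19.31 | 16.93 | 15.84 | 1.09 |
| 4-season moving average | | | | | | | | | | | |
| 2014-2017 | 2120 | 1787 | 0.543 | 28.45 | 26.13 | 2.32 | 36.12-18.82 | 34.79-19.35 | 17.29 | 15.44 | 1.85 |
| 2015-2018 | 2148 | 1769 | 0.548 | 28.56 | 26.27 | 2.29 | 36.28-19.09 | 34.99-19.19 | 17.19 | 15.79 | 1.40 |
| 5-season average | | | | | | | | | | | |
| 2014-2018 | 2656 | 2221 | 0.545 | 28.55 | 26.31 | 2.24 | 36.19-19.04 | 35.01-19.41 | 17.16 | 15.61 | 1.55 |
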
[Figure: home wins and road losses by margin less than or equal to x]
Looking at the graph of home and road wins in non-neutral site conference games with MoV ≤ x for the 2018 season, I decided to find the best-fit logarithmic curve for the two series.
#home wins = 159.49 log(mov) − 105.9, with R² = 0.9648
#road wins = 128.74 log(mov) − 81.873, with R² = 0.9623
Setting #home wins = #road wins and solving for mov we find that when the margin of victory is greater than 2.19 the home team is more likely to have won.
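
The crossover can be reproduced from the two fitted equations. The sketch below assumes "log" is the natural logarithm, which the text does not state. With the rounded coefficients that reading gives about 2.18, close to the 2.19 quoted, while a base-10 reading would give roughly 6. The variable names are illustrative.

```python
import math

# Fitted coefficients quoted above: wins = a * log(mov) + b
home_a, home_b = 159.49, -105.9
road_a, road_b = 128.74, -81.873

# Equate the two curves and solve for log(mov):
#   home_a * L + home_b = road_a * L + road_b
#   L = (road_b - home_b) / (home_a - road_a)
log_mov = (road_b - home_b) / (home_a - road_a)

crossover_ln = math.exp(log_mov)   # assumption: natural log
crossover_log10 = 10 ** log_mov    # alternative reading, for comparison

print(round(crossover_ln, 2), round(crossover_log10, 2))
```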

Wrong!
After all that, I was gently reminded by Dr. Massey that looking for a qualitative difference between home team wins and losses was not the right way to look at things. The HFA if it exists also affects margins in which the home team won by more than "expected" or lost by less than "expected." I was looking for the games the home team won when it wouldn't have been expected to, and while that might account for the home winning percentage being 10 per cent higher than expected, it doesn't translate into points as I was trying to make it do.

Instead of comparing the margin of victory in home wins and losses, it is more illustrative to plot the frequency against (home score − visitor score). The home losses show up as negative margins, and the distribution looks appropriately normal.

[Figure: 5-year home margins]
That is a graph of all non-neutral site conference games from the 2014 through 2018 seasons. The obvious interpretation of home field advantage is the distance the entire normal curve is shifted to the right of the y-axis. In other words, it is just the simple average of (home score − visitor score), which for the 4877 such games since 2014 is 2.24.
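
As a consistency check, the 2.24 figure can be recovered from the per-season HS-RS values in the five-season table by weighting each season by its number of games. This is a sketch using the rounded table values, so it matches only to the rounding shown; the variable names are illustrative.

```python
# season: (home wins, road wins, average home score - road score), from the five-season table
seasons = {
    2014: (508, 452, 2.01),
    2015: (535, 447, 2.66),
    2016: (533, 442, 1.95),
    2017: (544, 446, 2.64),
    2018: (536, 434, 1.91),
}

total_games = sum(hw + rw for hw, rw, _ in seasons.values())

# Game-weighted mean of the per-season average margins.
hfa = sum((hw + rw) * diff for hw, rw, diff in seasons.values()) / total_games

print(total_games, round(hfa, 2))
```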

Early in the season there aren't all that many conference games so for the purposes of calculating an HFA I'll use non-neutral site conference games from the previous season beginning from the equivalent week in that season. Had I done that for the 2018 season, we would've had:

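| Thru date | #g | hs-vs | Cum #g | Cum hs-vs | From | To | #Gms | hs-vs |
|---|---|---|---|---|---|---|---|---|
| 5-Sep-18 | 8 | -4.13 | 8 | -4.13 | 6-Sep-17 | 5-Sep-18 | 994 | 2.62 |
| 12-Sep-18 | 13 | 6.08 | 21 | 2.19 | 13-Sep-17 | 12-Sep-18 | 995 | 2.72 |
| 19-Sep-18 | 21 | -1.95 | 42 | 0.12 | 20-Sep-17 | 19-Sep-18 | 999 | 2.60 |
| 26-Sep-18 | 63 | 4.67 | 105 | 2.85 | 27-Sep-17 | 26-Sep-18 | 991 | 3.02 |
| 3-Oct-18 | 77 | 1.92 | 182 | 2.46 | 4-Oct-17 | 3-Oct-18 | 985 | 3.08 |
| 10-Oct-18 | 95 | 0.24 | 277 | 1.70 | 11-Oct-17 | 10-Oct-18 | 985 | 3.11 |
| 17-Oct-18 | 96 | 1.87 | 373 | 1.74 | 18-Oct-17 | 17-Oct-18 | 976 | 2.67 |
| 24-Oct-18 | 98 | 1.12 | 471 | 1.61 | 25-Oct-17 | 24-Oct-18 | 971 | 2.64 |
| 31-Oct-18 | 106 | 3.25 | 577 | 1.92 | 1-Nov-17 | 31-Oct-18 | 970 | 2.91 |
| 7-Nov-18 | 110 | 2.88 | 687 | 2.07 | 8-Nov-17 | 7-Nov-18 | 970 | 2.96 |
| 14-Nov-18 | 113 | 1.93 | 800 | 2.05 | 15-Nov-17 | 14-Nov-18 | 974 | 2.94 |
| 21-Nov-18 | 104 | 1.18 | 904 | 1.95 | 22-Nov-17 | 21-Nov-18 | 971 | 2.14 |
| 28-Nov-18 | 59 | 0.85 | 963 | 1.88 | 29-Nov-17 | 28-Nov-18 | 976 | 1.98 |
| 5-Dec-18 | 5 | 2.20 | 968 | 1.88 | 6-Dec-17 | 5-Dec-18 | 968 | 1.88 |
| 12-Dec-18 | 1 | 5.00 | 969 | 1.89 | 13-Dec-17 | 12-Dec-18 | 969 | 1.89 |
| 19-Dec-18 | 1 | 23.00 | 970 | 1.91 | 20-Dec-17 | 19-Dec-18 | 970 | 1.91 |
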
I don't use data from prior seasons for any other factor in my ratings, but it appears justified for the HFA calculation.
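
A minimal sketch of the trailing-window calculation, assuming the window runs from 364 days before the "thru" date up to and including it, which matches the From and To dates in the table (6-Sep-17 through 5-Sep-18). The function name `trailing_hfa`, the tuple layout, and the sample games are hypothetical.

```python
from datetime import date, timedelta

def trailing_hfa(games, thru, days=364):
    """Mean (home score - visitor score) over games dated in [thru - days, thru].

    games: iterable of (game_date, home_score, visitor_score) for
    non-neutral site conference games (hypothetical layout).
    Returns (window start, number of games, mean margin or None).
    """
    start = thru - timedelta(days=days)
    margins = [h - v for d, h, v in games if start <= d <= thru]
    if not margins:
        return start, 0, None
    return start, len(margins), sum(margins) / len(margins)

# Made-up games, not real results.
games = [
    (date(2017, 9, 2), 28, 10),   # before the window, excluded
    (date(2017, 9, 9), 21, 24),
    (date(2017, 11, 4), 35, 14),
    (date(2018, 9, 1), 17, 13),
    (date(2018, 9, 8), 30, 3),    # after the thru date, excluded
]

start, n_games, hfa = trailing_hfa(games, date(2018, 9, 5))
print(start, n_games, hfa)
```

Early in a season most of the games in the window come from the previous year, and by the final week the window holds only current-season games, which is why the last rows of the two halves of the table agree.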

© Copyright 2019, Paul Kislanko