Basketball fans love a close game. A nail-biter with fifteen lead changes is much more fun to watch than a blowout. But do the refs like close games as much as the rest of us?

Motivation to read on: they do.

The play-by-play data used in this analysis is curated by BasketballValue.com. You can find their files on this page. Alternatively, you can download the data in the final form that I used – regular-season data from 2006-2012 here, or the playoff data from 2007-2011 here.

Notes about the upcoming analysis

  • It uses the 7,111-game regular-season dataset.
  • All fouls with less than 3:00 remaining in the game were ignored, to minimize the impact of intentional fouls.
  • Only regular fouls were considered; technical fouls were ignored.
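
As a rough sketch of the two filtering rules above, here is how they might look in Python. The event format (a dict with `team`, `foul_type` and `seconds_remaining` keys) is invented for illustration; it is not the actual layout of the BasketballValue files, which may differ.

```python
# Hypothetical event format, invented for this sketch.
CUTOFF_SECONDS = 3 * 60  # ignore fouls with less than 3:00 remaining


def keep_foul(event):
    """Return True for a regular (non-technical) foul outside the final 3:00."""
    is_regular = event["foul_type"] != "technical"
    early_enough = event["seconds_remaining"] >= CUTOFF_SECONDS
    return is_regular and early_enough


def filter_fouls(events):
    """Keep only the fouls that count toward the analysis."""
    return [e for e in events if keep_foul(e)]


sample = [
    {"team": "home", "foul_type": "personal", "seconds_remaining": 1500},
    {"team": "away", "foul_type": "technical", "seconds_remaining": 900},
    {"team": "away", "foul_type": "personal", "seconds_remaining": 45},
]
print(len(filter_fouls(sample)))  # 1
```

Only the first sample foul survives: the second is a technical, and the third falls inside the final three minutes.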

Home vs. Away

The first obvious task when collecting foul data is to examine the difference in the number of fouls recorded by the home and away teams. Here’s a histogram comparing them. Data above 0 means that there were more fouls called on the home team:

It looks pretty normal, with a mean just below 0 \((-0.837)\). Sure, this is interesting, but it doesn’t reveal too much because differences in games with many fouls are weighted the same as differences in games with very few. Imagine a game in which the home team recorded 5 fouls and the visiting team 10. In this graph, it looks the same as a game in which the home team recorded 35 and the visiting team 40, though these games were fundamentally different. In the first, the away team had twice as many fouls; in the second, only 14% more. Thus, it’s more useful to look at a normalized measure.

This measure is the “percentage difference” between the two foul counts \(a\) and \(b\), defined as 100 times the difference divided by the average, or \(200\times\frac{a-b}{a+b}\). For example, the percentage difference between 40 and 35 is \(13.33\%\).
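
For anyone who wants to play with the measure, here is a minimal Python sketch of it. The function name `pct_diff` is mine, and putting the home team first (so negative values mean fewer home fouls) is just the sign convention used in the graphs here.

```python
def pct_diff(a, b):
    """Percentage difference between counts a and b, relative to their average."""
    if a + b == 0:
        raise ValueError("percentage difference is undefined when both counts are zero")
    return 200.0 * (a - b) / (a + b)


# The two hypothetical games from above, home fouls first:
print(round(pct_diff(5, 10), 2))   # -66.67
print(round(pct_diff(35, 40), 2))  # -13.33
```

Both games have an absolute difference of 5 fouls, but the normalized measure separates them clearly.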

Here is the same data graphed to reflect percentage difference rather than absolute difference:

Now that we have a normalized measure, we can find out whether the variation from zero is statistically significant. If we assume that the regular season data is a representative sample from the total pool of all NBA games (it’s not, but such is life), we can determine a confidence interval for the true mean.

The mean percentage difference between the number of fouls recorded by the home team and the away team is \(-3.93\%\). The \(95\%\) confidence interval runs from \(-4.57\%\) to \(-3.30\%\). Since that interval excludes zero, the effect is statistically significant: the home team gets called for fewer fouls. This result is consistent with prior research; see this talk, presented at the Sloan conference at MIT, this paper examining NCAA basketball and coming to the same result, or this paper from the Journal of Economics and Management Strategy. The last paper also shows that the home team’s advantage rises as the number of fans increases, which suggests what Paul Aufiero (Patton Oswalt) was definitely thinking in Big Fan: refs favor the home team (at least in part) because of the crowd.
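
Here is one way such an interval could be computed, as a sketch only. It assumes a normal approximation with \(z = 1.96\), which is reasonable for thousands of games but is my assumption, not necessarily the exact method behind the numbers above. The function name `mean_ci95` is made up for the example.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) for a 95% CI under a normal approximation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Toy input: one percentage difference per game.
mean, lower, upper = mean_ci95([-20.0, 5.0, -10.0, 0.0, -15.0, 8.0])
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

If the interval that comes back does not contain zero, the mean differs from zero at roughly the 5% level.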

Winning vs. Losing

If refs want games to be close, we might expect that they (consciously or unconsciously) call more fouls on the team that’s currently in the lead.

Still ignoring the last three minutes of the game, this graph shows the percentage difference between fouls called on the team that is currently in the lead and the team that is currently losing. A number above 0 means that the leading team is more likely to get called for a foul.

This distribution looks shifted a little further from zero.

Here, the mean percentage difference is \(6.91\%\), with a \(95\%\) confidence interval from \(6.20\%\) to \(7.63\%\).

Interestingly, despite the fact that the home team is less likely to be called for a foul and that the home team won almost \(60\%\) of their games during these seasons, the team in the lead at any given moment is more likely to record a foul. In fact, the percentage difference is almost \(7\%\). This is striking; make of it what you will.

Winning (by a lot) vs. Losing (by a lot)

If we run the same analysis, but only consider fouls that occurred when the point differential was 10 or more, we get the following graph (again, above 0 means the winning team fouled more):

There are peaks at \(\pm 200\%\) because many games only have one or two fouls when one team is winning by at least 10.
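
To see where those peaks come from, plug small counts into the percentage-difference formula. If only one side is called for a foul during the qualifying stretch (say 1 foul versus 0), the measure hits its extreme value:

\[200\times\frac{1-0}{1+0}=200\%,\qquad 200\times\frac{0-1}{0+1}=-200\%\]

The same happens for 2 versus 0, or any \(a\) versus 0, since \(200\times\frac{a-0}{a+0}=200\%\) regardless of \(a\). Games with very few qualifying fouls therefore pile up at the two ends of the graph.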

Now, the mean percentage difference is \(22.30\%\). Twenty-two percent!

This suggests one of two things:

  1. The leading team is more likely to foul.
  2. Referees are more likely to call fouls on the leading team.

This data can’t tell us why the winning team tends to foul more, but it seems unlikely that the winning team is actually more likely to commit a foul. Why would a team ever foul when they were 20 up? It stops the clock and gives away easy points (in the bonus and on shooting fouls). In fact, to me, it looks as though referees are very sympathetic to the losing team.

How Sympathetic?

If we vary the point differential that we examine, it’s clear that the greater the point differential, the more likely that a foul will be called on the leading team.

This graph is insane. It’s practically (within the error bars) increasing linearly. The greater the lead, the more likely that a foul will be called on the leading team.
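
A sweep like this can be sketched in a few lines of Python. Everything about the input format is an assumption: each foul is a hypothetical `(game_id, lead)` pair, where `lead` is the fouling team's score minus the opponent's score at the moment of the foul. I also assume the mean is taken over per-game percentage differences and that games with no qualifying fouls are skipped; the original analysis may have handled these details differently.

```python
import statistics
from collections import defaultdict


def mean_pct_diff_by_min_lead(fouls, min_lead):
    """Mean per-game percentage difference (leader vs. trailer) for fouls
    committed while the score margin was at least min_lead points."""
    # game_id -> [fouls on leading team, fouls on trailing team]
    counts = defaultdict(lambda: [0, 0])
    for game_id, lead in fouls:
        if lead == 0 or abs(lead) < min_lead:
            continue  # tied game, or margin too small to qualify
        counts[game_id][0 if lead > 0 else 1] += 1
    # Every game in counts has at least one qualifying foul, so a + b > 0.
    diffs = [200.0 * (a - b) / (a + b) for a, b in counts.values()]
    return statistics.fmean(diffs) if diffs else None


# Toy data: (game_id, fouling team's lead at the time of the foul)
toy = [("g1", 12), ("g1", 15), ("g1", -11), ("g2", -10), ("g2", 3)]
for threshold in (1, 10, 20):
    print(threshold, mean_pct_diff_by_min_lead(toy, threshold))
```

Running the function over a range of thresholds and plotting the results (with a confidence interval at each point) would give a curve like the one above.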

The same analysis on the playoff data yields the same trend, possibly even amplified. (We have less data, so the error is more severe.)

Conspiracy?

There’s a simple economic motivation to keep games close and exciting, but this data doesn’t offer any insight into whether the referees actually have that motivation in mind. Other studies suggest that teams facing elimination in the playoffs tend to be given an advantage by the refs, forcing each series to seven games as often as possible (and thus increasing revenues). However, I’m not comfortable saying that the NBA is encouraging this behavior by its referees; I imagine word would have gotten out by now. It’s probably just human nature. Nature that the NBA needs to fix!

There’s a lot more information to be teased out of this data. I’d love to hear any more results or insights.