This is the 4th part in my series examining the concept of momentum in NFL games. The first part looked at whether teams that gained possession of the ball by momentum-swinging means went on to score more frequently than teams that gained possession by regular means. The second part looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect. The third part focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.
This article will examine how 'streaky' NFL games tend to be. If momentum is real and it affects game outcomes, it would result in streaks of success and failure that are longer than we would expect by chance. But if consecutive plays are independent of previous success, the streaks of success and failure will tend to be no longer than expected by chance. This method of analysis does not rely on any particular definition of a precipitating momentum-swing, as it looks at entire games to measure whether success begets further success and whether failure leads to more failure.
Momentum does not require completely unbroken strings of successful or unsuccessful plays to have a tangible effect on games. But if success does enhance the chance of subsequent success, then streaks of outcomes will be longer than chance alone would produce.
For this analysis, I applied the Runs Test to the sequence of plays in a game. This produces a statistic indicating how streaky a string of results is compared to what would be expected by chance. For example, consider the following 3 strings of results of flipping a coin 8 times:
HTHTHTHT, HHHHTTTT, HTTTHTHH
The Runs Test works like this:
The average expected number of runs of the same result (the 'mu' symbol) is calculated based on the number of 'successful' trials (N+) and the number of 'unsuccessful' trials (N-). In the 3 examples above, the test says we should expect an average of 5 unbroken 'runs' of one result or the other:
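Assuming the test meant here is the standard Wald-Wolfowitz runs test, the expected number of runs can be written as follows, where N (a symbol introduced here for convenience) is the total number of trials:

```latex
\mu = \frac{2 N_{+} N_{-}}{N} + 1, \qquad N = N_{+} + N_{-}
```

With 4 heads and 4 tails in 8 flips, this gives 2(4)(4)/8 + 1 = 5 expected runs.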
In the first example above, we have 8 separate runs, which is choppier and more alternating than we would expect. In the second example, we see only 2 runs, which is fewer than expected and might suggest some sort of trend is at work. The third example is more in line with what we would commonly expect by chance, producing 5 runs of heads and tails.
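The run counts for the three coin-flip strings can be checked with a few lines of Python. This is only an illustrative sketch: the function names `count_runs` and `expected_runs` are made up for this example, and the expected-runs formula assumes the standard Wald-Wolfowitz form of the test.

```python
from itertools import groupby


def count_runs(results):
    """Count unbroken runs of identical consecutive results."""
    return sum(1 for _ in groupby(results))


def expected_runs(results):
    """Expected number of runs if trials are independent: 2 * N+ * N- / N + 1."""
    n = len(results)
    # Treat whatever symbol comes first as the 'positive' outcome.
    n_pos = sum(1 for r in results if r == results[0])
    n_neg = n - n_pos
    return 2 * n_pos * n_neg / n + 1


for flips in ("HTHTHTHT", "HHHHTTTT", "HTTTHTHH"):
    print(flips, "observed:", count_runs(flips), "expected:", expected_runs(flips))
```

Each string has 4 heads and 4 tails, so all three share the same expected value while their observed run counts differ.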
Instead of the heads and tails of coin flips, I examined the success of football plays. I classified the plays of a game as successful (S) or unsuccessful (U) based on whether the play improved or worsened a team's net scoring potential, as measured by Expected Points. (This is the basis of the ANS Success Rate statistic.) So instead of HTTTHTHH... we would have a string like SUUUSUSS...
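To make the classification concrete, here is a minimal sketch of turning per-play changes in Expected Points into a success string. The function name and the EP values are hypothetical, and treating a change of exactly zero as unsuccessful is an assumption made only for this example.

```python
def classify_plays(ep_changes):
    """Label each play 'S' if it raised Expected Points, otherwise 'U'.

    Assumption for this sketch: a play with zero EP change counts as 'U'.
    """
    return "".join("S" if delta > 0 else "U" for delta in ep_changes)


# Hypothetical EP changes for eight consecutive plays (made-up numbers).
example_ep_changes = [0.4, -0.3, -0.6, -0.1, 1.2, -0.5, 0.2, 0.9]
sequence = classify_plays(example_ep_changes)
print(sequence)
```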
One thing to keep in mind is that unlike coin flips, some teams are better than others, so there isn't a 50/50 chance of success for plays in any particular game. Fortunately, the Runs Test accounts for this kind of imbalance.
For each game we can measure how streaky the plays were. If the plays are significantly more streaky than we would expect from an independent distribution like flipping a coin, then we have evidence of momentum. For every game from the 1999 season through week 8 of the 2013 season, I examined the sequence of play success for all common scrimmage plays from the home team's perspective.
This is a key element of my methodology because it would detect momentum for a team's offense or defense alone, as well as any kind of carry-over from one side of the ball to the other. In other words, this method will detect whether an offense or defense 'gets on a roll' in addition to whether plays on one side of the ball inspire teammates on the other side to play better. No matter what your definition of momentum, this method should be able to detect evidence if it exists.
The bottom line is that we are testing for independence between successive plays. If plays are independent 'trials', then there can be no momentum.
Let's start with an example. The Broncos hosted the Eagles in week 4 this season, winning 52-20. In that game, DEN had a total of 147 plays on both sides of the ball. They had 90 successful plays and 57 unsuccessful plays, resulting in 62 streaks. Was this streaky or not?
We expected about 71 streaks, but there were actually only 62 runs of the same result. So this game was streakier than we might expect. That doesn't yet say whether momentum is at work. Some games are bound to be streakier than others by chance. So how unlikely is an outcome like this? The standard error of the runs test can be calculated from its variance, and this produces a p-value, which is the probability of such an outcome by chance.
In the case of the PHI-DEN game, the p-value was 0.12. That doesn't meet the traditional cut-off of .05 for statistical significance, but it's still fairly unlikely.
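The numbers for this game can be reproduced with a short sketch. It assumes the standard Wald-Wolfowitz variance formula and a two-sided normal approximation for the p-value. Neither choice is spelled out above, but together they give roughly the values quoted. The function name `runs_test` is made up for this example.

```python
import math


def runs_test(n_pos, n_neg, observed_runs):
    """Return (expected runs, standard error, z-score, two-sided p-value)."""
    n = n_pos + n_neg
    mu = 2 * n_pos * n_neg / n + 1
    # Variance of the run count under independence (Wald-Wolfowitz).
    variance = (mu - 1) * (mu - 2) / (n - 1)
    std_error = math.sqrt(variance)
    z = (observed_runs - mu) / std_error
    # Two-sided p-value from the normal approximation (an assumption here).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mu, std_error, z, p_two_sided


# DEN vs. PHI: 90 successful plays, 57 unsuccessful plays, 62 observed runs.
mu, std_error, z, p_value = runs_test(90, 57, 62)
print(f"expected runs: {mu:.1f}, std error: {std_error:.1f}, z: {z:.2f}, p: {p_value:.3f}")
```

A negative z-score means fewer runs than expected, which is the streaky direction.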
One game doesn't tell us much, but 3,875 of them can. I repeated this analysis for every game since the start of the 1999 season, and here are the results:
-The average expected number of runs was 67.0.
-The average observed number of runs was 64.7.
-The average standard error for a single game was +/-8.0 runs.
-49 games (or 1.2%) of the 3,875 games exceeded the p=0.05 threshold for streakiness
The number of observed runs per game was less than the expected number, which indicates there is more streakiness than if all the plays were purely independent. The difference is 2.3 runs per game. However, very few games were more streaky than expected to a statistically significant degree.
I interpret these results to say that yes, there may be some degree of momentum, on average, in an NFL game. But it is imperceptibly small, and we can only point to a handful of games over the past 14 years that are particularly momentum-rich.
I characterize the difference as imperceptible because 2.3 runs is 4% fewer than expected. It's a difference of only about 1 play in a game. In other words, we only need to flip one play from a success to a non-success (or vice versa) to create 2 additional runs. If there were a string of heads and tails like HHHHTHTT with 4 runs, we only need to change one result to create 6 runs: HTHHTHTT.
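The one-flip example is easy to verify. The helper name `count_runs` is made up for this sketch.

```python
from itertools import groupby


def count_runs(results):
    """Count unbroken runs of identical consecutive results."""
    return sum(1 for _ in groupby(results))


before = "HHHHTHTT"
after = "HTHHTHTT"  # only the second flip differs

# Number of positions where the two strings disagree.
changed = sum(a != b for a, b in zip(before, after))
print(count_runs(before), count_runs(after), changed)
```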
Further, the difference between success and failure can be razor thin. A run of 5 yards on 1st down is typically a success, but a run of 3 or 4 is usually not. That's the practical size of the effect at work. I don't think one or possibly two additional consecutive successes or failures in an entire game are what believers in momentum have in mind.
Still, these results do show some evidence of momentum. That's interesting in itself. However, this result may only be due to teams with offenses and defenses at opposite ends of the performance spectrum. For example, when the 2000 Ravens defense was on the field, we were likely to see streaks of successful plays, but when their offense was on the field, we were likely to see streaks of unsuccessful plays. On net they had an average success rate somewhere between their two squads' rates, which increases the number of runs we should expect.
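A toy simulation illustrates how this could happen even when every play is independent. Everything in this sketch is hypothetical: the success rates (0.65 with the defense on the field, 0.35 with the offense), the 24 alternating drives of 6 plays each, and the function names are invented for illustration and are not drawn from the game data.

```python
import random
from itertools import groupby


def count_runs(results):
    """Count unbroken runs of identical consecutive results."""
    return sum(1 for _ in groupby(results))


def expected_runs(results):
    """Expected runs from the pooled success count: 2 * N+ * N- / N + 1."""
    n_pos = sum(results)
    n_neg = len(results) - n_pos
    return 2 * n_pos * n_neg / len(results) + 1


def simulate_game(rng, drives=24, plays_per_drive=6, p_defense=0.65, p_offense=0.35):
    """Independent plays whose success rate depends only on which unit is on the field."""
    plays = []
    for drive in range(drives):
        p_success = p_defense if drive % 2 == 0 else p_offense
        plays.extend(rng.random() < p_success for _ in range(plays_per_drive))
    return plays


rng = random.Random(0)  # fixed seed so the sketch is repeatable
games = [simulate_game(rng) for _ in range(2000)]
mean_gap = sum(count_runs(g) - expected_runs(g) for g in games) / len(games)
print("average observed minus expected runs:", round(mean_gap, 2))
```

In this setup the average gap comes out negative, meaning fewer runs than the pooled expectation, even though no play depends on the one before it.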
In the next part of this study, I'll separate offenses and defenses, and also look at series-level success. As mentioned, the difference between success and failure at the play-level can be very thin, and momentum may manifest itself at a higher level. For example, an offense could string together plays like this: SUSUSUS, and still march down the field to score as long as the successes are of enough magnitude to convert first downs. Other methods of analysis could include autocorrelation plots.
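As a rough idea of what an autocorrelation check might look like, here is a minimal sketch that computes the lag-k sample autocorrelation of a success string. The function name and the choice of estimator are assumptions made for this example, not a description of the method the next part will use.

```python
def autocorrelation(sequence, lag=1, success="S"):
    """Sample autocorrelation of a success/failure string at the given lag."""
    x = [1.0 if play == success else 0.0 for play in sequence]
    mean = sum(x) / len(x)
    deviations = [value - mean for value in x]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("sequence must contain both outcomes")
    numerator = sum(deviations[i] * deviations[i + lag] for i in range(len(x) - lag))
    return numerator / denominator


print(autocorrelation("SUSUSUS"))   # alternating results: negative value
print(autocorrelation("SSSSUUUU"))  # clustered results: positive value
```

A value near zero at every lag would be consistent with independent plays, while persistently positive values at short lags would point toward streakiness.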
If you're curious, here are a few of the most streaky games in the data:
2000 MIA 37 at NYJ 40 (a classic OT comeback) - 84 expected runs, 57 observed
2002 JAX 23 at CAR 24 (another comeback) - 66 expected, 44 observed
2012 SD 27 at NYJ 17 - 61 expected, 42 observed
2013 DAL 21 at SD 30 (a great visual example of streakiness) - 68 expected, 48 observed
And here's a good example of non-streakiness. You can actually see the choppiness in the graph:
2012 TEN 23 at IND 27 - 71 expected runs, 81 observed