Part 1 examined the possibility that momentum exists by measuring whether teams that obtain the ball in momentum-swinging ways go on to score more frequently than teams that obtain the ball by regular means.
Part 2 looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect.
Part 3 focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.
Part 4 applied a different method of examining momentum by using the runs test to see the degree to which team performance is streakier than random, independent trials.
In this part, I'll apply the runs test at the series level, to see if teams convert first downs (or fail to convert them) more consecutively than random independence would suggest. But first, I'll tie up some loose ends left hanging from part 4. Specifically, I'll redo the play-level runs test to eliminate potential confusion caused by a team with disparate performance from their offensive and defensive squads.
Recall that the runs test indicates how streaky a string of results is compared to what would be expected by chance. (By the way, just in case anyone is confused, "runs" refers to "streaks" of an outcome, not to running vs. passing.)
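For reference, the textbook Wald-Wolfowitz runs test gives the expected number of runs and its variance as shown here. This is the standard form, offered on the assumption that it matches the equation used in this series; N is simply shorthand for the total number of trials.

```latex
\mu = \frac{2\,N_{+}\,N_{-}}{N} + 1, \qquad
\sigma^{2} = \frac{(\mu - 1)(\mu - 2)}{N - 1}, \qquad
N = N_{+} + N_{-}
```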
The average expected number of runs of the same result ('mu' in the equation above) is calculated based on the number of 'successful' trials (N+) and the number of 'unsuccessful' trials (N-). For our purposes, a trial could be a football play, a series of downs, a drive in a game, or a game in a season. One of the best features of the runs test is that it accounts for the proportion of success. In football terms that means it accounts for differences in team strength and the fact that better teams are more likely to have runs of success.
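The calculation can be sketched in a few lines of Python. This is a minimal illustration using the standard runs-test formulas, not the code behind the numbers in this article; the function names and the 'S'/'U' encoding are assumptions made for the example.

```python
import math

def count_runs(outcomes):
    """Count maximal streaks of identical consecutive outcomes."""
    if not outcomes:
        return 0
    # A new run starts at every position where the outcome changes.
    return 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)

def expected_runs(n_pos, n_neg):
    """Expected number of runs (mu) for independent trials."""
    return 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0

def runs_sd(n_pos, n_neg):
    """Standard deviation of the number of runs under independence."""
    mu = expected_runs(n_pos, n_neg)
    return math.sqrt((mu - 1.0) * (mu - 2.0) / (n_pos + n_neg - 1))

def runs_z(outcomes, success="S"):
    """z-score of observed runs; negative means streakier than chance.

    Assumes both outcomes occur at least once, otherwise the
    standard deviation is zero and the z-score is undefined.
    """
    n_pos = sum(1 for o in outcomes if o == success)
    n_neg = len(outcomes) - n_pos
    return (count_runs(outcomes) - expected_runs(n_pos, n_neg)) / runs_sd(n_pos, n_neg)
```

Because the expected value depends only on N+ and N-, a strong team and a weak team each get their own baseline, which is the property described above.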
Part 4 of this series looked at the play level, classifying football plays as successful (S) or unsuccessful (U) based on whether the play improved or worsened a team's net scoring potential, as measured by Expected Points. (This is the basis of the ANS version of the Success Rate statistic.) If games tended to produce fewer runs than the runs test tells us to expect, meaning longer strings of consecutive successes or failures, that suggests there may be an element of momentum in the game.
The results of the previous analysis showed that we should expect 67.0 runs in a game. But we actually observed 64.7 runs, which indicates there is slightly more streakiness than if all the plays were purely independent. The problem with my original analysis was that it treated all plays by both a team's offense and defense together, which could overstate the amount of streakiness when a team's offense is significantly better than its defense or vice versa.
For example, imagine a team with a perfectly good offense and a perfectly bad defense. Its total success rate would be 50%, suggesting a relatively un-streaky performance. But in actuality, we would see very long runs of success and non-success, and a new run would be created only when there was a change in possession. Even without any momentum present, we would only observe about 20 runs when we should expect about 80 in this extreme example.
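The extreme example can be reproduced with a toy game. The 20 possessions and 8 plays per possession are assumed round numbers chosen only to match the rough figures in the text, not values from the data set.

```python
def count_runs(outcomes):
    """Count maximal streaks of identical consecutive outcomes."""
    if not outcomes:
        return 0
    return 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)

def expected_runs(n_pos, n_neg):
    """Expected number of runs for independent trials."""
    return 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0

POSSESSIONS = 20           # assumed: possessions alternate between units
PLAYS_PER_POSSESSION = 8   # assumed: identical length for simplicity

plays = []
for i in range(POSSESSIONS):
    # Offense on the field: every play succeeds. Defense: every play fails.
    outcome = "S" if i % 2 == 0 else "U"
    plays.extend([outcome] * PLAYS_PER_POSSESSION)

observed = count_runs(plays)
expected = expected_runs(plays.count("S"), plays.count("U"))
print(observed, expected)  # 20 observed runs against 81.0 expected
```

Every play here is fully determined by which unit is on the field, so there is no momentum at all, yet pooling the two units makes the game look far streakier than chance.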
So I reran the analysis, separating offensive and defensive performance. At the play-level:
-Home offenses should expect an average of 32.4 runs, and we observed 31.4.
-Home defenses should expect an average of 33.0 runs, and we observed 31.9.
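Separating the units before counting runs might look like the sketch below. The (unit, outcome) tuple layout and the `runs_by_unit` helper are hypothetical, and the sketch assumes each unit's plays are strung together across possessions within a game, which may differ from how the figures above were produced.

```python
from collections import defaultdict

def count_runs(outcomes):
    """Count maximal streaks of identical consecutive outcomes."""
    if not outcomes:
        return 0
    return 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)

def expected_runs(n_pos, n_neg):
    """Expected number of runs for independent trials."""
    return 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0

def runs_by_unit(plays):
    """Return {unit: (observed_runs, expected_runs)} for one game.

    `plays` is a chronological list of (unit, outcome) pairs, where
    outcome is 'S' or 'U' from that unit's point of view.
    """
    by_unit = defaultdict(list)
    for unit, outcome in plays:
        by_unit[unit].append(outcome)
    result = {}
    for unit, outcomes in by_unit.items():
        n_pos = outcomes.count("S")
        n_neg = len(outcomes) - n_pos
        result[unit] = (count_runs(outcomes), expected_runs(n_pos, n_neg))
    return result

# Toy game, invented for illustration only.
toy_game = [("off", "S"), ("off", "S"), ("off", "U"),
            ("def", "U"), ("def", "U"), ("def", "S"),
            ("off", "S"), ("off", "U")]
result = runs_by_unit(toy_game)
```

Each unit is now compared against a baseline built from its own success rate, so a lopsided team no longer inflates the apparent streakiness.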
This is consistent with the results of the initial analysis. We see a very slight momentum effect to the tune of about 2 fewer total runs in a game. But it often only takes 1 flip from success to non-success to create 2 additional runs. (SSS = 1 run, SUS = 3 runs.)
A shortcoming of any play-level analysis is that it ignores the format of the game. A team can alternate successful and unsuccessful plays all the way to the end zone if its successes gain enough yardage each time. So a series-level analysis might be more enlightening. I think the series level is the better way to look at things, and it's probably how people who perceive momentum experience it--in terms of moving the chains. At the series-level:
-Home offenses should expect an average of 13.7 runs, and we observe 12.5.
-Home defenses should expect an average of 13.7 runs, and we observe 12.7.
We can detect a slightly stronger momentum effect in relative terms at the series level as compared to the play level for an entire game. There appears to be about 1 to 2 series worth of streakiness beyond what would be expected if the game were composed purely of independent trials. (Remember that it only takes one "flip" from a success to a non-success, or vice versa, to create two additional runs: HHH=1 run, HTH=3 runs)
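The "in relative terms" comparison is simple arithmetic on the four reported pairs of averages. The sketch below uses only the figures quoted in this article; the `relative_shortfall` helper is a name made up for the example.

```python
# (expected runs, observed runs) as reported in this article.
reported = {
    ("play", "home offense"): (32.4, 31.4),
    ("play", "home defense"): (33.0, 31.9),
    ("series", "home offense"): (13.7, 12.5),
    ("series", "home defense"): (13.7, 12.7),
}

def relative_shortfall(expected, observed):
    """Fraction by which observed runs fall short of expected runs."""
    return (expected - observed) / expected

shortfalls = {key: relative_shortfall(*pair) for key, pair in reported.items()}

for (level, unit), value in shortfalls.items():
    print(f"{level:6s} {unit}: {value:.1%}")
```

The shortfall works out to roughly 3 percent at the play level and roughly 7 to 9 percent at the series level.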
There's some streakiness there in the data, but it's a far cry from what the believers in momentum have in mind. I believe this degree of streakiness is imperceptibly small to even the most experienced observer. If two gamblers in the old west bet on a flipped coin and they saw HHTTHTHHHT (6 runs) instead of THTHHTTTHT (7 runs), one wouldn't suddenly draw his six-shooter accusing the other of cheating with a trick coin.
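The coin example can be checked directly. The sketch below applies the standard runs-test formulas to the two ten-flip sequences; the `summarize` helper is a hypothetical name, and the normal approximation behind the z-score is rough for a sample this small.

```python
import math

def count_runs(outcomes):
    """Count maximal streaks of identical consecutive outcomes."""
    if not outcomes:
        return 0
    return 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)

def summarize(flips, success="H"):
    """Return (observed runs, expected runs, z-score) for a flip string."""
    n_pos = flips.count(success)
    n_neg = len(flips) - n_pos
    n = n_pos + n_neg
    mu = 2.0 * n_pos * n_neg / n + 1.0
    sd = math.sqrt((mu - 1.0) * (mu - 2.0) / (n - 1))
    runs = count_runs(flips)
    return runs, mu, (runs - mu) / sd

for flips in ("HHTTHTHHHT", "THTHHTTTHT"):
    runs, mu, z = summarize(flips)
    print(f"{flips}: {runs} runs, {mu:.1f} expected, z = {z:.2f}")
```

Both sequences land within one standard deviation of the expected number of runs, so a single run of difference gives neither gambler grounds for suspicion.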
The momentum effect we observed with the runs test might be explained by natural phenomena unrelated to the common notions of momentum. Key mid-game injuries could cause an otherwise unexpected run of non-success. A fourth-down conversion failure is counted the same as a punt or FG attempt, but it also gives a team an additional bite at the success apple, prolonging a run further than expected. "Trash time" might be the biggest factor, where there is an unexpected run of outcomes inconsistent with the rhythm of the rest of the game. In retrospect, I think it would be quite shocking to find that football plays were purely independent trials.
This series of articles examined momentum in a number of ways--using different definitions of momentum and different methods of analysis. It looked at momentum at the game level, the drive level, the series level, and the play level. Although it can't be ruled out that there is some grain of truth to the role of momentum, the effect sizes we observed are probably too small to be noticed by a fan or even by a player or coach.
Notes: The data set includes all games from 1999 through the 2013 conference championships. A series success is defined as any conversion for a first down or a touchdown.