The Passing Premium

In previous posts, I've referred to a concept called the passing premium. Specifically, I pointed to Benjamin Alamar's paper, which found that the expected yards per play is higher for passing than for running, even after accounting for incompletions and the risk of interception.

I've discovered a major flaw in the author's analysis, however. He does not appear to account for sacks.

His analysis finds that across all passing plays in the 2005 NFL season, the expected gain is 5.8 yards per attempt. Interceptions are factored in by assigning each one a value of -45 yards. (Forty to 50 yards is a commonly accepted equivalent for an interception. It also makes intuitive sense, because an interception differs from an incompletion by precluding the possibility of a punt, which usually nets about 40 yards.) Touchdown passes also get an adjustment because the goal line truncates the pass: an extra 10 yards is added for every touchdown pass.

The true expected gain from a pass play should be:

(Pass Yds Gained - Sack Yds - Int Adjustment + TD Adjustment) / (Pass Att + Sacks)
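The formula above can be written as a small function. This is a sketch only. The parameter names are hypothetical, and the 45-yard interception charge and 10-yard touchdown bonus are the values described in the text. The totals in the usage line are made-up round numbers, not real season data.

```python
def pass_yards_per_play(pass_yds, sack_yds, attempts, sacks, ints, pass_tds,
                        int_penalty=45.0, td_bonus=10.0):
    """Expected yards per pass play, counting sacks as pass plays.

    Numerator: passing yards, minus yards lost on sacks, minus a fixed
    charge per interception, plus a fixed bonus per touchdown pass.
    Denominator: pass attempts plus sacks.
    """
    numerator = pass_yds - sack_yds - int_penalty * ints + td_bonus * pass_tds
    return numerator / (attempts + sacks)


# Made-up totals for illustration only (not 2005 NFL data).
print(round(pass_yards_per_play(pass_yds=3500, sack_yds=200, attempts=500,
                                sacks=30, ints=15, pass_tds=20), 2))  # 5.33
```

Dropping `sack_yds` and `sacks` (setting both to zero) reproduces the sack-free version of the calculation. This makes it easy to see how much the omission inflates the result.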

The author leaves sacks out of both the numerator and the denominator, which makes a large difference.

In Alamar's analysis, the average run yields 4.1 yards. The 1.7-yard difference is an unexplained premium for passing, suggesting that play selection is not rationally balanced in the NFL.

Even allowing that football is more complex than a binary run-or-pass decision, and that averages are not always the truest measure of performance in every situation, the difference of 1.7 yards per play remains considerable. So despite those limitations, perhaps coaches should be calling more passes and fewer runs.

I performed my own analysis, repeating Alamar's methodology for the 2005 data and then expanding it to the 2002-2006 seasons. By adding 10 yards per touchdown pass and subtracting 45 yards per interception, I also calculated 5.8 yards per attempt. But when I accounted for sacks, subtracting sack yards and counting each sack as a pass play, the expected yield of a pass play dropped to 5.0 yards.

The passing premium now becomes 5.0 - 4.1 = 0.9 yards per play, much smaller than the author found.

I also calculated running yards per attempt when the same touchdown adjustment is applied. Aren't many running touchdowns truncated by the end zone too? It does seem generous to add 10 yards because some touchdown runs are goal line dives, but many are not. Some touchdown passes would not automatically yield an extra 10 yards either. Adding the 10 yard touchdown bonus makes the expected gain for running 4.5 yards per attempt.

The passing premium would now become only 5.0 - 4.5 = 0.5 yards per play.
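The adjustments in this comparison can be summarized in a short sketch. The function names are hypothetical. The per-play averages are the ones quoted above, and the run-side touchdown bonus mirrors the 10-yard pass bonus.

```python
def run_yards_per_play(rush_yds, attempts, rush_tds, td_bonus=10.0):
    """Yards per run with the same touchdown bonus that passes receive."""
    return (rush_yds + td_bonus * rush_tds) / attempts


def passing_premium(pass_ypp, run_ypp):
    """Extra expected yards per play from passing rather than running."""
    return pass_ypp - run_ypp


# Per-play averages quoted in the text.
for label, pass_ypp, run_ypp in [
    ("Alamar, no sacks", 5.8, 4.1),
    ("sacks included", 5.0, 4.1),
    ("sacks + run TD bonus", 5.0, 4.5),
]:
    print(f"{label}: {passing_premium(pass_ypp, run_ypp):.1f} yards per play")
```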

Running in some situations, however, has value in addition to yards gained. Towards the end of a game, the leading team can use more clock time by running, denying the trailing team additional opportunities to score. In short yardage situations, running for a short gain can be more beneficial than the chance of a longer gain with a pass. Goal line runs can sometimes take 2 or 3 attempts to gain the single yard that yields the touchdown, but that single yard is worth the risk of no gain on the previous plays. (In a way, the generous 10-yard bonus for a touchdown run seems more appropriate considering how often runs are stuffed on the goal line, where running is highly predictable.)

All things considered, perhaps the run and pass are nearly balanced in the NFL. Balance is important because it suggests maximization. If a team runs too much, a defense can concentrate its efforts on stopping the run, reducing the expected gain for a run but increasing the expected gain for a pass. Every team would have its own optimum balance, but across the league as a whole the optimum run-pass mix would yield about the same expected gain from a run as from a pass.

Underdogs, Reducing Possessions, and Super Bowl XLII

In the last article, I made the point that underdogs have a better chance of winning a game when each opponent has fewer possessions. Specifically I wrote, "The more possessions each team has, the more likely it is the better team is going to eventually come out on top. With fewer total possessions, the underdog has a better chance to win because randomness plays a bigger role relative to team ability in a game’s outcome." This article will estimate just how much an underdog can benefit by reducing the total number of possessions in a game.

I built a crude simulation of a football game in the PHP programming language. By specifying the number of possessions for both teams and the scoring rate of each team, a simulated score and winner can be determined. By running the simulation many times, we can get a good estimate of the win probability for various numbers of total possessions.

But first, we need to pick a couple of teams as guinea pigs. One needs to be a big underdog against the other. Hmmm, let's see. How about the Giants and Patriots?

The Patriots scored touchdowns in 42% of their possessions in 2007, and 13% of their possessions resulted in field goals. They gave up TDs in 17% of their opponents' possessions and allowed FGs in 7%. In contrast, the Giants scored TDs in 21% of their possessions and kicked a FG in 12%. They allowed TDs in 19% of their opponents' possessions and gave up FGs in 11%. The table below summarizes each team's respective drive stats.

Scoring Rate    NE (%)   NYG (%)
Own TDs         42       21
Own FGs         13       12
Opp TDs         17       19
Opp FGs         7        11

If we assume that each team will score at the midpoint between its offense's scoring rate and the opposing defense's allowed rate, we can construct a fairly solid model. For example, given the Patriots' 42% offensive TD rate and the Giants' 19% opponent TD rate, we could estimate the Patriots would score a TD on (42 + 19) / 2 = 30.5% of their possessions. (I realize this is pretty rough, but it suits the simulation's purpose.)
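The midpoint rule can be applied to all four rates in the table with a few lines of code. The dictionary keys and the function name are hypothetical labels for the table's rows.

```python
# Drive outcome rates (%) from the 2007 table above.
NE = {"own_td": 42, "own_fg": 13, "opp_td": 17, "opp_fg": 7}
NYG = {"own_td": 21, "own_fg": 12, "opp_td": 19, "opp_fg": 11}


def matchup_rates(offense, defense):
    """Midpoint of the offense's scoring rates and the defense's allowed rates."""
    td = (offense["own_td"] + defense["opp_td"]) / 2
    fg = (offense["own_fg"] + defense["opp_fg"]) / 2
    return td, fg


print("NE vs NYG defense:", matchup_rates(NE, NYG))   # (30.5, 12.0)
print("NYG vs NE defense:", matchup_rates(NYG, NE))   # (19.0, 9.5)
```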

Unfortunately for New York Giants fans, the Patriots won the first simulation 38-7. That was for 12 possessions for each team, the most common number of team possessions in the NFL. But one simulation by itself is pretty pointless. After 10,000 of them, however, New England won 75.6% of the games and the Giants won 20.5%, with 3.8% of the games going into overtime.

But what if each team had only 10 possessions? How do the underdogs fare? The Patriots win 72.7% of the games and the Giants win 22.4%. Reducing the number of possessions does boost the underdog's chances, but only slightly.
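The author's simulation was written in PHP, and its code is not shown. The Python sketch below is one way such a simulation could work. It assumes each possession independently ends in a touchdown (7 points, extra point always good), a field goal, or no score, and it uses the midpoint rates described earlier. Ties are simply counted as overtime games and not resolved. The function names and the fixed random seed are illustrative choices, and the results will differ somewhat from the figures in the text.

```python
import random


def simulate_game(n_poss, a_td, a_fg, b_td, b_fg, rng):
    """Play one game in which each team gets n_poss independent possessions.

    A possession is a touchdown (7 points, PAT assumed good) with
    probability *_td, a field goal (3 points) with probability *_fg,
    and otherwise scores nothing.
    """
    def points(td, fg):
        total = 0
        for _ in range(n_poss):
            u = rng.random()
            if u < td:
                total += 7
            elif u < td + fg:
                total += 3
        return total

    return points(a_td, a_fg), points(b_td, b_fg)


def win_probabilities(n_poss, a_td, a_fg, b_td, b_fg, n_games=10_000, seed=1):
    """Return (A wins, B wins, ties) as fractions of n_games simulated games."""
    rng = random.Random(seed)
    a_wins = b_wins = ties = 0
    for _ in range(n_games):
        a, b = simulate_game(n_poss, a_td, a_fg, b_td, b_fg, rng)
        if a > b:
            a_wins += 1
        elif b > a:
            b_wins += 1
        else:
            ties += 1  # would go to overtime; not resolved here
    return a_wins / n_games, b_wins / n_games, ties / n_games


# Midpoint rates: NE 30.5% TD / 12% FG, NYG 19% TD / 9.5% FG.
for n in (10, 12):
    ne, nyg, ot = win_probabilities(n, 0.305, 0.12, 0.19, 0.095)
    print(f"{n} possessions: NE {ne:.1%}, NYG {nyg:.1%}, OT {ot:.1%}")
```

Treating possessions as independent draws ignores field position and game situation. That simplification is in keeping with the deliberately crude model described above.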

The table below lists typical numbers of possessions for each team in NFL games along with the simulated probabilities of winning for each team.

[Table: simulated win probabilities by number of possessions per team. Columns: Possessions, NE Wins, NYG Wins, Overtime]

The first thing to note is that the Giants face an improbable challenge, no matter how few possessions they can limit their opponents to. This method appears to confirm my standard logistic regression efficiency model, which gives the Giants about a 1 in 4 shot at the title. But that's nothing to sneeze at. How much sleep would you get if you knew you had a 25% chance of your life savings being wiped out by morning? It's not a guaranteed victory for New England by any stretch.

The fewer the number of possessions, the greater the chance of an upset. Letting the clock run is in the Giants' interest. Did you hear that, Plaxico? Take the hit and stay in bounds (as long as the score is close). You'll be helping your team more than you could ever understand.

Keeping an Offense "Off the Field"

I frequently hear this nugget of wisdom from football analysts: “Team X needs to run the ball to keep Team Y’s high-scoring offense off the field.” At first it makes sense. An offense can’t score if it’s not on the field, no matter how good it is. But at second glance, maybe it’s not so logical.

Number of Possessions

Unlike a sport such as hockey, football is mainly a game based on turns. Football has no face-offs or tip-offs. After one team has possession of the ball, regardless of how that possession ends, the other team will have a turn with possession of the ball. In a pure turn-based game, both teams are going to have an equal number of possessions. But football is complicated by the fact that time expires at the end of each half. In each half it is possible for the team that started the half with the ball to have one more possession than the other. Therefore, in almost all cases the game will end with the number of possessions being equal or within one possession.

I say almost all cases because there are rare exceptions. An interception or punt returned for a touchdown technically does not count as a possession. In those cases, it is possible for a team to have two consecutive possessions. But you don't have to be a statistician to know that's not a good way to get additional possessions. The analysis below accounts for such cases.

I’m not saying that clock management is not important. Once the game clock winds down towards 3 or 4 minutes remaining, it may become somewhat clear how many possessions are likely left for each team. At this point it makes sense to try to manage the clock by play selection and time-outs. And toward the end of a close game it’s absolutely critical.

But until the final minutes of a half, the way the possessions will shake out is completely unpredictable. Until that point, it’s simply silly to try to keep an offense “off the field” by running the ball. Chewing up time by running early in a half does not guarantee your team will be the one with the extra possession. Ironically, a coach may be keeping his own offense off the field for all he knows.

Slowing Down the Game

On the other hand, a strategy that slows down the game could benefit an underdog by causing fewer total possessions by both teams. The more possessions each team has, the more likely it is the better team is going to eventually come out on top. With fewer total possessions, the underdog has a better chance to win because randomness plays a bigger role relative to team ability in a game’s outcome. A 17-10 lead is far more vulnerable than a 34-20 lead to a single kick return or interception return for a TD.

Can Play Selection Really Reduce Possessions?

Can play selection really limit the number of opponent possessions, and if so by how much? Using data from the 2005-2006 seasons, I analyzed how team possessions and time of possession were affected by play selection.

Play selection can be defined multiple ways: run/pass ratio, runs as a percent of total plays, or a simple difference between runs and passes. I chose the simple difference (runs minus passes) because choosing a run excludes a pass. Every choice precludes the other option. A team can’t increase the number of running plays without declining to pass. (It also had the strongest correlation with time of possession (TOP).)

I adjusted the number of offensive possessions by the number of defensive touchdowns allowed. This correction is necessary due to the effect noted above in which a team can actually have two consecutive possessions due to a return for a touchdown. Likewise, opponent possessions were corrected for defensive touchdowns scored.

The correlations between play selection and the other variables are listed below. There are small but significant effects on TOP and team possessions due to play selection.

Run/Pass Selection Correlation with:
Time of Possession          0.23
Adj Own Possessions        -0.16
Adj Opponent Possessions   -0.15

The correlation is just as strong for a team’s own number of possessions as for opponent possessions. No surprise here: as expected, the total number of possessions is reduced, but the team choosing to run gains no advantage in possessions.

But how strong is the effect? Is it worthwhile to sacrifice optimum play selection to reduce the number of possessions? Based on the standard deviations and correlations of each variable, we can estimate that for every 3 plays called as a run instead of a pass, a team can expect an average of 1:28 additional TOP.

The correlation between TOP and opponent possessions is 0.08. On average, then, a team with an extra 1:28 in TOP can reduce an opponent’s number of possessions by 0.64 drives. To reduce an opponent’s possessions by a full drive, therefore, a team would need to swap about 5 passes for runs.

Teams average 28 run attempts and 35 pass attempts per game. That means a team would need to run about 33 times and pass about 30. Actually, it would probably be closer to 31 runs and 28 passes because of the reduction in remaining game time available.
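The arithmetic in these paragraphs rests on a standard result: the least-squares slope of y on x equals r × (SD of y) / (SD of x). The standard deviations are not reported here, so the sketch below shows only the general formula plus the scaling from the quoted figures. It assumes the effect scales linearly with the number of swapped plays. The function name is hypothetical.

```python
def regression_slope(r, sd_x, sd_y):
    """Least-squares slope of y on x implied by their correlation and SDs."""
    return r * sd_y / sd_x


# From the text: swapping 3 passes for runs is associated with 0.64 fewer
# opponent drives on average. Assuming the effect scales linearly:
drives_per_swap = 0.64 / 3
swaps_for_one_drive = 1 / drives_per_swap
print(f"{swaps_for_one_drive:.1f} swaps per opponent drive")  # 4.7

# Average game: 28 runs and 35 passes; swap 5 passes for runs.
runs, passes = 28 + 5, 35 - 5
print(runs, passes)  # 33 30
```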

A team would either need to have a very large advantage in the running game or already have a comfortably large lead to run so much. For even the worst passing teams, passes normally have a much higher expected return than runs. Accounting for the possibility of interceptions, passing still has a significantly bigger expected payoff. Except in specific circumstances such as short yardage situations, for every run play chosen over a pass play, a team is sacrificing some amount of total effectiveness (as long as a team runs often enough to prevent the defense from only defending the pass).

I think a much better strategy for reducing the number of opponent (and total) possessions would be to just tell your receivers and backs to take the hit and stay in bounds a few times. Reducing penalties would have a similar effect by keeping the clock running when it would otherwise stop. That way an underdog can keep the game relatively close by reducing total possessions and still have the freedom to optimize its play calling based on match-ups and game situations.

A coach should call plays to maximize his chances of keeping the ball, getting first downs, and scoring. After all, it’s only incomplete passes that stop the clock. If a coach is calling plays expecting a disproportionate number of incomplete passes, he’s probably going to lose anyway.

Is "Red Zone" Performance Real?

Team performance in the red zone is obviously very critical. Teams can make up for a lot of deficiencies if they can move the ball inside the opposition's 20 yard line. But is the red zone real? Certainly, it's tougher to move the ball efficiently inside the 20 because the field is compressed and the defense has less territory to protect. As a consequence, outside the red zone the 2007 league average pass completion rate was 63%, but inside the red zone it was only 56%. So in that sense, the red zone is real.

In this article, I'll compare 2007 quarterback performances inside and outside the red zone. After testing the differences statistically, we'll see if some QBs really have a knack for success inside the 20, or if we're just witnessing the randomness of a small subsample of passes.

Football experts saw a correlation between red zone performance and scoring, and quickly focused a lot of attention on how well a team does inside the 20. It does make sense, to a degree. Teams that are otherwise efficient and can move the ball, but can't put it in the end zone, won't score and won't win. A quarterback whose stats are particularly good in the red zone is considered a clutch performer and is believed to have some special ability to perform when it counts most. Likewise, a QB who under-performs in the red zone is seen as lacking what it takes to succeed in the NFL.

But what if red zone performance were really just a random subset of overall performance? If we arbitrarily divided the field into any other 20-yard segment and analyzed performance, would we find that some QBs have a special talent between the 40s? After all, pass attempts inside the red zone make up only 13% of all passes.

My theory is that the very same abilities that lead to success in the other 80 yards of the football field also lead to success inside the 20. A QB who is accurate, aware, can read defenses, and has a strong arm will likely do well in any part of the field. Someone would have to convince me that there is some special talent that becomes more important in the red zone.

Some might say that abilities or flaws are magnified in the red zone because of the compressed field--the density of pass defenders is much higher. If that's true, we should see quarterbacks with good stats do especially well, and below-average quarterbacks should do especially poorly in the red zone.

But that's not what we find. The #1 QB in red zone completion percentage last year was Sage Rosenfels. Outside the red zone, however, he ranked 20th. Trent Edwards was 29th outside the red zone, but 3rd inside. Overall, the rankings looked essentially random.

Although completion percentage is generally not the best measure of QB performance, it might be particularly relevant in the red zone for several reasons. Yardage measures are problematic because passes thrown into the end zone are truncated at the goal line. Efficiency measures may not make sense because a 2 yd pass completion from the 2 on 3rd and goal is more meaningful than a 4 yd completion from the 18 on 3rd and 8. Over the course of an entire season and across a full 100-yard field, yards per attempt might tell us a lot more than within a select set of passes near the goal line. In general, a completion in the red zone is always better than an incomplete pass or an interception. Close to the goal line, there isn't often a better or deeper option--any completion is good.

To test whether red zone completion percentage is a special talent or simply a random subsample of overall completion percentage, I used a statistical test known as a t-test. Those familiar with statistics and regression know this is the test that indicates if a variable is significant or not. But it can be used on its own without a regression to test if observed differences are due to a systematic effect or just due to random variation or sample error.

The t-test produces a probability (the p-value) of seeing differences at least as large as those observed if they were really just due to chance. Generally, a p-value below 0.05 means a variable is considered significant: if there were no systematic connection, a difference that large would turn up less than 5% of the time. For example, if Matt Cassel completed 4 out of 5 passes in one series of relief for Tom Brady, does that make him more accurate than Brady, at 398 for 578? The t-test says no. The sample size is too small, and the difference is not big enough, to conclude that Cassel is more accurate than Brady.

So I performed a t-test on each quarterback's completion percentage inside and outside the red zone. But there was a wrinkle. We already know it's tougher on every quarterback in the red zone because of field compression. We're really interested in whether some QBs significantly over- or under-perform their overall completion percentage. To compare apples to apples, I used a special version of the test that accounts for known expected differences in the means of each group. The NFL average completion percentage is 7.2 points lower for the red zone than outside the red zone, so I used 7.2% as the difference in means.
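The author's exact procedure is not described beyond a t-test with a known difference in means. As a hedged stand-in, the sketch below uses a closely related large-sample two-proportion z-test. Its null hypothesis is that the two completion rates differ by a fixed amount, such as the 7.2-point red zone gap. The function name is hypothetical. For tiny samples like Cassel's five attempts, the normal approximation is rough, so treat the numbers as illustrative.

```python
import math


def diff_test(succ_a, n_a, succ_b, n_b, expected_diff=0.0):
    """Two-sided test that p_a - p_b equals expected_diff.

    Normal approximation with unpooled standard errors; returns (z, p_value).
    For a red zone check, pass outside-the-20 numbers as group A, red zone
    numbers as group B, and expected_diff=0.072.
    """
    p_a, p_b = succ_a / n_a, succ_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = ((p_a - p_b) - expected_diff) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Cassel's 4 of 5 vs Brady's 398 of 578, no expected difference.
z, p = diff_test(4, 5, 398, 578)
print(f"z = {z:.2f}, p = {p:.2f}")  # p is well above 0.05: not significant
```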

The table below lists the leading QBs in completion percentage. Their completion percentage in the non-red zone (NRZ) and inside the red zone (RZ) is listed. Next is listed a "value over average" (VOA) number that indicates how a QB over- or under-performed his overall percentage when inside the red zone, accounting for the general increase in difficulty inside the 20. A high positive number means a QB did especially well in the red zone compared to outside it. The final column is the result of the t-test. A number below 0.05 is generally considered significant.

[Table: 2007 QB completion percentage outside the red zone (NRZ) and inside it (RZ), RZ value over average (VOA), and t-test p-value]

Note that there are in fact three QBs with statistically significant differences in completion percentage inside the red zone. Brees, Edwards, and Rosenfels all had significantly better than expected performance in the red zone. No QBs were significantly worse than expected. So this is proof that those three QBs have a special ability to perform in the compressed field inside the 20, right?

Not exactly. The table above lists 30 of the league's leading passers, and at the 0.05 level we would expect one or two QBs (30 × 0.05 = 1.5) to appear significant just by chance (this is known as a "Type I" statistical error). Further, the p-values are spread evenly from 0.93 down to 0.04, which is what chance alone would produce. There is no bunching of values below the significance level of 0.05.
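The Type I error point can be made concrete with the binomial distribution. Suppose none of the 30 quarterbacks truly differed in the red zone, and the tests were independent. Independence is an assumption here, since all QBs share league-wide conditions. Each test would then have a 5% chance of a false positive. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(at least k of n independent null tests come out 'significant')."""
    return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
               for i in range(k, n + 1))


n_qbs, alpha = 30, 0.05
print("expected false positives:", n_qbs * alpha)
print(f"P(3 or more): {prob_at_least(3, n_qbs, alpha):.2f}")  # about 0.19
```

In other words, three "significant" QBs out of 30 would happen by luck alone roughly one season in five.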

From this, it appears that despite all analysis to the contrary, there is nothing special about any particular quarterback's ability inside the red zone as compared to outside the 20. It's simply a random subsample of overall performance. If a QB is a good passer, he'll probably be good inside the 20.

If this conclusion is correct, it should adjust our view of some of the QBs in the table above. (The red zone value over average (RZ VOA) column identifies the most over- and under-performing QBs.) Take Trent Edwards, for example. His over-performance inside the red zone probably makes him appear to be a more productive passer than he truly is. He's not likely to over-perform inside the 20 in 2008 the way he did in 2007. Chances are Buffalo fans will be disappointed next year.

The opposite could be said of QBs such as Kellen Clemens. He under-performed in the red zone in 2007. Although his overall numbers aren't terribly great, chances are he will not under-perform inside the 20 as severely as he did last season. It's not a guarantee Clemens will improve, but he would likely trend toward his overall performance level and not repeat his dismal 39% completion rate in the red zone.

This result also means that prediction models and handicappers that overweight red zone performance are reducing their predictive power. They are chasing the past randomness of a small subset of performance. The same is probably also true of the myriad stat "splits" such as "under the lights" or "road division games," or any other silly and arbitrary classification.

Cold Weather Effect on Scoring

In a recent post I theorized that the sudden importance of run defense in the playoffs might be due to cold weather. This post will continue that line of analysis and look at the effect of cold weather on scoring.

The past weekend's conference championship games were played in frigid weather. It seemed that expert after expert remarked that cold weather would keep the scoring down. Certainly it makes sense to anyone who's played sports in extremely cold weather. It definitely makes it harder to throw, catch, and even kick. But it's just as cold for defenses as for offenses. So does cold weather really keep NFL scores lower?

Here are the average home and visitor scores for various circumstances. The first column is for all regular season games in the 2002-2006 seasons (n=1280), and the second column is for those games played in cold climates (n=114), as defined in a previous post. Since many playoff games are played in cold weather, the third column is for all playoff games (n=50+5). (Super Bowl scores are not included because there is no home advantage.)

[Table: average home and visitor scores. Columns: Reg. Season, Cold, Playoff]

The second table looks at scores from the same sets of games differently. The average scores of the winning and losing teams are listed. (Super Bowl scores are included as playoff games here.)

[Table: average winning and losing team scores. Columns: Reg. Season, Cold, Playoff]

Cold weather doesn't appear to have a large effect on scoring. It seems to slightly enhance the spread between winner and loser by depressing the score of the loser. This is likely due to the "dome at cold" effect discussed in previous posts.

Playoff scores are generally higher, both in terms of winner and loser, and for the home and visiting teams.

It doesn't appear that cold weather reduces scoring.

I can understand where the perception might come from. Because dome teams are at a disadvantage playing outdoors in cold weather, it follows that they would score less. Many competitive dome teams in recent years have had fast-scoring offenses. The Vikings, Rams, and Colts of recent memory all featured very strong offenses. When these teams were competitive late in the season (and when people were paying attention to them), they would be expected to score less when playing outdoors. But this effect on dome teams would be limited to these specific circumstances and would not affect teams in general.

When the Giants played in Green Bay or the Chargers played in Foxboro yesterday, we should not have expected low scores due to the cold temps. Although the frigid sub-zero temperatures yesterday were extreme, even for Green Bay standards, the point is that the weather affects both offense and defense.

Playoff Predictions - Conference Championships

Game probabilities for the NFL conference championships are listed below. Probabilities consider only the last 10 weeks of football. The first half of the season is excluded and recent playoff performance is included, accounting for regular season and playoff strength of schedule.

The probabilities are based on an efficiency win model explained in previous posts. The model considers offensive and defensive efficiency stats, including running, passing, sacks, turnover rates, and penalty rates. Team stats are adjusted for previous opponent strength.

Visitor Prob   Game        Home Prob
0.30           SD at NE    0.70
0.28           NYG at GB   0.72

Does Cold Weather Change the Game?

Recently I looked at the importance of various phases of the game in winning playoff games. I analyzed and compared regular season games featuring only playoff-caliber opponents and playoff games themselves. This analysis began with the question of whether defense really does win championships, but since has focused on broader comparisons as well.

In the last post, I found that over the past five seasons, teams with the better run defense won slightly less than half of games between playoff-caliber opponents. But in the playoffs, the team with the better run defense won 67% of the time. With 114 regular season match-ups between playoff-caliber teams and 55 playoff games during the period studied, the difference may only be due to chance. However, in the comparison of regular season games and playoff games, pass offense, pass defense, and run offense did not show any differences nearly as large as run defense did.

[Edit: Based on a 2-sample unpaired t-test, the difference in the winning percentage of the team with the better run defense (67% vs 48%) is indeed statistically significant at the p=0.02 level. In other words, the sample sizes are large enough to say it is unlikely that the difference is due to chance alone.]
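The significance claim in the edit can be reproduced approximately. The sketch below uses a pooled two-proportion z-test, a large-sample counterpart of the unpaired two-sample t-test mentioned above; the author's exact procedure is not shown. The win counts are back-calculated from the table percentages: 67.3% of 55 playoff games is 37 wins, and 48.2% of 114 regular season games is 55 wins. The function name is hypothetical.

```python
import math


def two_proportion_test(succ_a, n_a, succ_b, n_b):
    """Pooled two-sample z-test for equal proportions; returns (z, two-sided p)."""
    p_a, p_b = succ_a / n_a, succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Better run defense: 37 of 55 playoff wins vs 55 of 114 good-vs-good wins.
z, p = two_proportion_test(37, 55, 55, 114)
print(f"z = {z:.2f}, p = {p:.3f}")  # p close to 0.02
```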

Here is the table from the previous post. The winning percentage of the team with the superior stat is indicated. For example, the team with the better offensive passing efficiency won 52.6% of the regular season "good vs. good" match-ups and won 63.6% of playoff games.

Stat          Good vs Good   Playoffs
O Run         45.6           45.5
D Run         48.2           67.3
O Pass        52.6           63.6
D Pass        51.8           56.3
O Int Rate    50.9           58.1
D Int Rate    55.3           58.1
O Fum Rate    55.3           40.0
D FFum Rate   54.4           54.5
Pen Rate      47.3           52.7

One possible explanation for the difference in the importance of run defense could be the weather. The playoffs are played in January when the weather is cold and often windy in most NFL cities. Teams might bias their play selection toward the run because of the perceived increase in passing difficulty.

To test if weather is the reason for the observed difference in the importance of run defense in the playoffs, I analyzed regular season games played between any-caliber opponents in cold weather. Without direct temperature and wind data for each game, I defined 'cold weather' as being played outdoors in December in a city that averages below 40 degrees wind chill. There were 118 such games between 2002 and 2006.

(I also looked at just the games between playoff-caliber teams played in the cold. However, there were only 12 such games, so the results are not very meaningful. I also expanded the definition of playoff-caliber to 9+ win teams, for which there were 26 games. I'll list both results anyway in case anyone is curious.)

Below is the table of results. Again, the percentage of games won by the team with the superior stat in each category is listed. The first column (Reg. Season) is for all regular season games and all opponent types (n=1280). The second column (In Cold) is for all games played in cold climates (n=118). The third column (9+ Wins) is for games played in cold climates between teams that ultimately finished with 9 or more wins (n=26). The last column (10+ Wins) is for games played in cold climates between teams that finished with at least 10 wins (n=12).

Stat          Reg. Season   In Cold   9+ Wins   10+ Wins
O Run         55.0          55.4      45.8      50.0
D Run         50.0          48.8      58.3      50.0
O Pass        63.8          65.6      66.7      66.7
D Pass        59.8          60.4      45.8      58.3
O Int Rate    59.5          61.0      54.2      33.3
D Int Rate    59.4          59.4      33.3      33.3
O Fum Rate    60.8          65.0      54.2      33.3
D FFum Rate   58.0          61.3      54.2      50.0
Pen Rate      54.

I'll address the results from the first and second columns. The importance of each stat appears about the same in cold weather as in all games. Other than home field advantage, fumble rates are the only stats that indicate any significant difference in cold weather.

Focusing on run defense, we see that the team with the superior run stopping ability only won 48.8% of the 118 games played in cold weather. This rate is very close to the overall rate of 50.0% for all regular season games. This suggests that it is not the weather but some other factor in the playoffs that may enhance the importance of run defense.

The increased importance of home field advantage in cold weather is probably due to the 'dome at cold' effect, in which dome teams tend to have very little success playing outdoors in cold weather.

Although nothing was conclusively proven with this analysis, there are indications that playoff football is different from regular season football. Although the sample sizes weren't large enough to support solid conclusions regarding most variables, there is enough evidence to suggest that the playoffs comprise a special set of circumstances that may change the dynamics of the game. The level of competition, the weather, the prospect of elimination, or other factors may influence strategies and performances.

At the very least, we can see how fans may perceive defense as being more important in the playoffs. Whether there is truly a systematic link between run defense and playoff success, or it is only the randomness of a small sample, may not be relevant. We have witnessed teams with stronger run defenses win more playoff games. In appearance, if not in reality, run defense is the most important part of the game come January.

Run Defense Dominates in the Playoffs

In the last post I indirectly analyzed playoff games by looking at regular season games that only featured opponents who both went on to win at least 10 games. We saw that teams with the better running games actually won less than 50% of the time.

In this post, I'll look at actual playoff games directly. Compared with the 114 "good vs. good" regular season games I looked at yesterday, there were 50 playoff games, plus 5 Super Bowls, in the 2002 through 2006 seasons.

The table below lists the winning percentage of the team with the superior season-long performance in each stat. In other words, the team with the better [stat] won [x] percent of the time. The regular season good vs. good match-ups are also listed for comparison. (The home win percentage excludes the five Super Bowls during the period.)

Stat          Good vs Good   Playoffs
O Run         45.6           45.5
D Run**       48.2           67.3
O Pass*       52.6           63.6
D Pass        51.8           56.3
O Int Rate    50.9           58.1
D Int Rate    55.3           58.1
O Fum Rate**  55.3           40.0
D FFum Rate   54.4           54.5
Pen Rate      47.3           52.7

** = good vs. good / playoff difference is significant at the p=0.05 level
* =
good vs. good / playoff difference is significant at the p=0.10 level
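The post doesn't say which significance test produced the asterisks. One common way to compare two win percentages is a pooled two-proportion z-test, and the sketch below assumes that test. The win counts are reconstructed from the rounded percentages (48.2% of 114 good vs. good games is about 55 wins; 67.3% of 55 playoff games is 37 wins). A one-sided test, or a different test altogether, could flag different rows, so treat this as an illustration of the idea rather than a reproduction of the table.

```python
import math


def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z-test.

    Returns (z, two_sided_p) for the difference between the win rate in
    sample b and the win rate in sample a.
    """
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)  # common win rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    two_sided_p = math.erfc(abs(z) / math.sqrt(2))
    return z, two_sided_p


# D Run: team with the better run defense, good vs. good (a) vs. playoffs (b)
z, p = two_proportion_z_test(wins_a=55, n_a=114, wins_b=37, n_b=55)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```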

The home team won more often in the playoffs than in good vs good regular season match-ups, which is expected because higher seeded teams host the playoff games.

The team with the better run efficiency won only 45.5% of the 55 playoff games during the '02-'06 period. Keep in mind that the small sample size could make these results misleading: 45.5% is only 2.5 games below 50%. But the result echoes the one for the regular season good vs. good match-ups.
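To put "only 2.5 games below 50%" in context, here is a quick check under a simplifying assumption that is not part of the post's method: treat each playoff game as a 50/50 coin flip with respect to run efficiency, and ask how often the better-running team would win 25 or fewer of 55 games by luck alone. The binomial distribution puts that at roughly 0.29, so a result this far below 50% is unremarkable on its own.

```python
from math import comb


def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


games = 55
wins = round(0.455 * games)         # 25 wins for the team with the better run offense
shortfall = games / 2 - wins        # games below an even 50% split
chance = prob_at_most(wins, games)  # chance of a result this low if the stat didn't matter
print(wins, shortfall, round(chance, 2))
```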

Defensive run efficiency is a different story. Although the team with superior run-stopping ability won only 48.2% of the regular season good vs. good match-ups, it won 67.3% of playoff games. This is a striking difference, to say the least, especially considering how unimportant run defense appears in regression models of regular season games.

The passing game stands out as well. Both offensive and defensive passing stats appear to be more important in the playoffs. Interception rates also appear very important.

Another striking result is that of offensive fumble rates. The team with the lower fumble rate won only 40% of playoff games. Because fumble rate is one of the more random stats, it's not too surprising to see a spurious result, but 40% is fairly low, even for such a small sample.

I suspect that coaches may become too conservative in the playoffs, relying on the run. This might explain why having a good running game doesn't help teams win and why the ability to stop the run suddenly becomes very important in the playoffs.

My own gut feeling is that coaches don't coach to win. They coach to avoid a loss. It sounds inane, but there is a difference. Maybe the play-calling in the playoffs becomes even more timid. But then again, perhaps the January weather has something to do with it. Previous research has established the importance of climate, especially when dome teams play outdoors. Weather may explain the 67% win percentage of teams with the better run defense. Defense may win championships after all, particularly run defense.

Maybe Defense Does Win Championships...

...but the running game probably doesn't help at all.

I previously thought that the "defense wins championships" theory was conventional wisdom bunk. But after doing a new analysis, I think it might be true. It's the importance of the running game, however, that really surprised me.

In a recent post, I illustrated the distribution of offenses and defenses in terms of total efficiency (yards per play). The distribution for offensive efficiency was wider than for defensive efficiency. This indicated that "good" offenses were better than the equivalent "good" defense. In other words, the best offenses in the league tend to get more yards per play above average than the best defenses in the league give up below average.

However, that analysis was for regular season games. Postseason games comprise a smaller sample, usually too small for very meaningful analysis. They're also biased in certain ways. For example, the home team is usually the better team, so it would be hard to separate the advantage in team strength from home field advantage. But the biggest difference between the regular season and the postseason is the level of competition.

As an indirect way to infer tendencies about postseason competition, I analyzed regular season games that featured only opponents that would go on to win at least 10 games. I think this criterion best reflects the level of competition usually found in the playoffs. Although 9-win or even 8-win teams occasionally make the playoffs, many do not. Plus, 9 wins is only 1 win above .500, and a 9-win team has never won a championship.

First, I looked at how important various team stats were in determining the winner of match-ups between 10+ win teams. I looked at offensive and defensive running and passing efficiencies, turnover efficiencies, penalty rates, and home field advantage. The data are from the 2002-2006 seasons, and there were 114 such games between "good" teams. (The stats used here are season-long stats, not stats from that particular game.)

Instead of an advanced regression analysis, I started by looking at how often a team with an advantage in each particular stat won. The table below lists various team efficiency stats along with the win percentage of the team superior in that stat. For example, the team with home field advantage won 59.6% of the match-ups between 10+ win teams. And the team with the better offensive pass efficiency won 52.6% of the match-ups. The winning percentage for all regular season games is included for comparison. Significant differences in winning percentages between good vs. good games and all games are noted.

Stat           Good vs. Good (win %)   All Reg Season (win %)
O Run**        45.6                    55.0
D Run          48.2                    50.0
O Pass**       52.6                    63.8
D Pass**       51.8                    59.8
O Int Rate**   50.9                    59.5
D Int Rate     55.3                    59.4
O Fum Rate     55.3                    60.8
D FFum Rate    54.4                    58.0
Pen Rate*      47.3                    54.1

** = difference is significant at the p=0.05 level
* = difference is significant at the p=0.10 level
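The comparison behind these tables (how often the team with the better season-long stat won) can be sketched as below. The `Game` record and its field names are hypothetical, and skipping games where both teams have an identical stat value is an assumption, since the post doesn't say how ties were handled. For stats where lower is better, such as offensive interception rate or penalty rate, pass `higher_is_better=False`.

```python
from dataclasses import dataclass


@dataclass
class Game:
    home_stat: float  # home team's season-long value of the stat
    away_stat: float  # away team's season-long value of the stat
    home_won: bool


def pct_won_by_better(games, higher_is_better=True):
    """Percent of games won by the team with the better season-long stat."""
    decided = better_won = 0
    for g in games:
        if g.home_stat == g.away_stat:
            continue  # neither team has the advantage (assumed handling)
        home_is_better = (g.home_stat > g.away_stat) == higher_is_better
        decided += 1
        if home_is_better == g.home_won:
            better_won += 1
    return 100.0 * better_won / decided if decided else float("nan")


games = [
    Game(6.5, 5.9, True),   # hypothetical stat values
    Game(6.1, 7.0, True),
    Game(5.5, 5.8, False),
    Game(6.0, 6.0, True),   # tie: skipped
]
print(round(pct_won_by_better(games), 1))                          # e.g. pass efficiency
print(round(pct_won_by_better(games, higher_is_better=False), 1))  # e.g. interception rate
```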

What immediately strikes me is that being good in the running game, both on offense and defense, appears to be no help in beating other good teams. Teams with the better offensive running efficiency won only 45.6% of the games, and teams with the better defensive running efficiency won only 48.2% of the games.

Teams with superior passing efficiency, fumble rates, and defensive interception rates win only slightly more than half of the games, between about 52% and 55%. I'm surprised passing efficiencies don't appear to be more important. The stats that tend to be more random, such as fumbles and interceptions, appear to make the biggest difference. This result may be due to the fact that when good teams play each other, stats like passing efficiency and offensive interception rates are very good for both teams, and the difference lies in the more random stats.

Home field advantage also appears more important than is typical in the NFL. Home teams won 57.4% of all regular season games in the period studied, compared with 59.6% in the good team vs. good team match-ups. Again, the closer the teams are in ability, the more important other factors become.

But having a good running game is not only unimportant, it actually seems counter-productive. How can this be? (First, I should note this is not a regression tested for significance, but with 114 observations, and the fact that both offensive and defensive running abilities appear unhelpful, the results are likely somewhat meaningful.) If true, my theory is that winning teams that count on the running game to win might overuse the run against better opponents. Leaning on the running game wouldn't help, and may actually hurt.

Running too frequently would do harm because the pass does have a higher expected return per attempt, even accounting for the possibility of an interception. Every run attempt precludes a pass attempt, reducing ultimate effectiveness.

To get a better context for the results in the table above, I also calculated the winning percentage of teams with superior stats for other types of match-ups. I analyzed "bad vs. bad" match-ups, in which both opponents ultimately earned 9 or fewer regular season wins. I also analyzed "good vs. bad" match-ups, which featured a 10+ win team against a team with 9 or fewer wins. (I realize 9 wins is not "bad," but it's a lot shorter than "other than good.")
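The match-up categories can be expressed as a small classifier on each team's final regular season win total, using the 10-win threshold described above. The function name is hypothetical, and the schedule in the usage example is invented for illustration; the win totals are the 2007 records from the luck table at the end of this section, not data from the 2002-2006 study.

```python
from collections import Counter


def matchup_type(wins_a, wins_b, threshold=10):
    """Classify a game by both teams' final regular season win totals."""
    good_a = wins_a >= threshold
    good_b = wins_b >= threshold
    if good_a and good_b:
        return "good vs good"
    if not good_a and not good_b:
        return "bad vs bad"  # both teams finished with 9 or fewer wins
    return "good vs bad"


final_wins = {"NE": 16, "IND": 13, "CLE": 10, "MIA": 1}
schedule = [("NE", "IND"), ("CLE", "MIA"), ("NE", "MIA")]  # hypothetical pairings
counts = Counter(matchup_type(final_wins[a], final_wins[b]) for a, b in schedule)
print(counts)
```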

The winning percentages of teams with the better stat are listed for each type of match-up in the table below.

Stat          Good vs Good   Bad vs Bad   Good vs Bad   (win %)
O Run         45.6           53.7         55.8
D Run         48.2           51.4         67.9
O Pass        52.6           63.6         65.4
D Pass        51.8           59.8         63.7
O Int Rate    50.9           59.5         53.8
D Int Rate    55.3           59.4         57.3
O Fum Rate    55.3           60.5         63.7
D FFum Rate   54.4           58.0         62.3
Pen Rate      47.3           53.8         71.9

The results for the other types of match-ups seem to make sense. Being superior in any of the stats never appears to hurt, as running did in the good vs. good match-ups. The bigger the difference in team record, the larger we would expect the difference in each team stat to be. Accordingly, the winning percentages are higher for the bad vs. bad and good vs. bad match-up types than for the good vs. good match-up type.

I could go on and on with observations. Penalty rates appear critical in good vs bad match-ups, offensive passing efficiency appears most important in the bad vs. bad match-ups, etc. I'll leave it to others to draw their own inferences.

Ultimately, the closer two teams are in ability, the more important other factors such as randomness and home field advantage become. Playoff teams are by definition relatively similar in ability, so home field and randomness become critically important. Turnovers are the most random of the stats, especially defensive turnover efficiency. Perhaps, then, it is randomness that wins championships. And because defensive performance trends are more random than offensive trends, perhaps that's why we see defense as more important come January.

Playoff Predictions - Divisional Round

Game probabilities for the NFL divisional playoff round are listed below. The probabilities are based on an efficiency win model explained in previous posts. The model considers offensive and defensive efficiency stats including running, passing, sacks, turnover rates, and penalty rates. Team stats are adjusted for previous opponent strength.

Visitor Win Prob   Game        Home Win Prob
0.26               JAX at NE   0.74
0.20               SD at IND   0.80
0.45               SEA at GB   0.55
0.19               NYG at DAL  0.81

Is 3rd Down Conversion Percentage a Good Stat?

I've previously commented that using 3rd down percentage in an analysis of team strength or a game prediction model is not a good practice. I realize this is counterintuitive. 3rd down percentage is highly correlated with winning, and unlike total rushing yards, the direction of causation is clear. Converting the always-critical 3rd down leads to winning. So why wouldn't it be a good stat?

3rd down percentage is a function of a team's passing ability, running ability, an opponent's ability to stop them, and often random luck (you guess pass, I call a draw). 3rd down percentage is an intermediate result between running/passing and the final result of interest--team wins. Injecting an intermediate result into a regression model may be useful in analyzing why teams won or lost past games, but it does not help evaluate how good a team is, or will be.

Bill Parcells once barked, "You are what your record says you are." I prefer the saying that "you're only as good as your next game." A team's record may be what matters when deciding who goes to the playoffs, but a team can't really change its record--except by winning or losing its next game. So when we rate how good a team is, we're better off knowing how likely it is to win future games than dissecting past games into molecular detail. Many of those details are unique to the circumstances of the past. Models built on such details are known as "over-fit."

Stats such as 3rd down percentage tell us more about what has happened to a team in the past than how well it will do in the future. In a recent article, I tested how well various stats endure through the season. If a team stat from the first half of the season does well predicting itself in the second half of the season, we have a good idea that it is an enduring and repeatable skill, and not primarily the result of randomness and non-repeating circumstances. The table below lists how well each team stat correlates with itself between the first and second half of a season.

Stat            Self-correlation (r)
O 3D Rate       0.43
D Int Rate      0.08
D Pass          0.29
D Run           0.44
D Sack Rate     0.24
O Fumble Rate   0.48
O Int Rate      0.27
O Pass          0.58
O Run           0.56
O Sack Rate     0.26
Penalty Rate    0.58
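The self-correlations above are Pearson correlation coefficients between each team's weeks 1-8 value of a stat and its weeks 9-17 value. A minimal sketch of that calculation follows; the five-team pass efficiency numbers are hypothetical. The same function covers cross-stat checks too: pass one stat's first-half values as `xs` and a different stat's second-half values as `ys`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical offensive pass efficiency (yards per attempt) for five teams
first_half = [7.1, 6.2, 5.8, 6.6, 5.1]   # weeks 1-8
second_half = [6.8, 6.4, 5.5, 6.0, 5.6]  # weeks 9-17
print(round(pearson_r(first_half, second_half), 2))
```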

Offensive 3rd down rate endures fairly well within a season, with a correlation coefficient of 0.43. But what if I could predict a team's 3rd down percentage with a completely different stat better than past 3rd down percentage itself? What does that tell us about 3rd down percentage as a stat?

The table below lists other offensive efficiency stats as predictors of 3rd down percentage. In other words, these are the correlations between a team's other stats from the first half of a season and the team's 3rd down percentage from the second half of the same season.

Weeks 1-8 stat   Correlation with weeks 9-17 O 3D Pct
O 3D Pct         0.43
O Pass           0.56
O Run            0.08
O Sack Rate
O Int Rate

We can actually predict a team's 3rd down percentage better with offensive pass efficiency, or with sack rate, than with its to-date 3rd down percentage. And with the correlation with run efficiency at a very small 0.08, we see that the passing game has almost everything to do with 3rd down conversions. (Teams tend to pass on anything longer than 3rd and 1 these days.)

So why include 3rd down percentage in a rating of team strength or a win prediction model when passing stats are already included? It would only serve to add random noise. Instead of telling us how good a team is or will be, it would tell us more about the unique circumstances and random luck the team experienced in the past.

If we still want to use 3rd down percentage as a stat to predict how good a team will be, we can. After all, 3rd down success is critical in sustaining drives and scoring points. It correlates with team wins at about 0.49 and with points scored at 0.65, both very high relative to most team stats. Instead of actually using to-date 3rd down percentage, we should estimate what the 3rd down percentage will be based on the stats we know to be predictive.

The table below is a regression model using passing stats to estimate future 3rd down percentage.

Variable            Coefficient   p-value
O Sack Rate         -11.6         0.00
O Pass Efficiency
O Int Rate          -1.53         0.00

The actual model coefficients aren't as important as the fact that the r-squared is 0.94. That means that we can predict a team's future 3rd down percentage with almost crystal ball-like accuracy using passing efficiency stats. And ironically, if we add previous 3rd down percentage itself to the model, it is the only non-significant variable (p=0.13) and r-squared is (strangely) reduced.

An r-squared of 0.94 is the equivalent of a correlation coefficient (r) of 0.97. Remember, this compares to the self-correlation of previous 3rd down percentage of only 0.43.
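The post doesn't list the full model (the pass efficiency coefficient is missing from the table), so the sketch below only illustrates how such a fit and its r-squared are computed: ordinary least squares with an intercept, solved through the normal equations in plain Python. The team data are synthetic, generated from made-up coefficients so the fit can be checked; nothing here reproduces the real estimates. The last line checks the r-squared to r conversion quoted above.

```python
import math


def ols_fit(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X is a list of predictor rows and y the response values.
    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    preds = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Synthetic rows: (sack rate, pass yards per attempt, interception rate). Not real NFL data.
X = [(0.06, 6.0, 0.03), (0.08, 5.5, 0.02), (0.05, 7.0, 0.04),
     (0.07, 6.2, 0.025), (0.09, 5.0, 0.035), (0.04, 6.8, 0.045)]
true_coefs = (0.20, -1.0, 0.05, -1.5)  # made-up intercept and slopes
y = [true_coefs[0] + sum(b * v for b, v in zip(true_coefs[1:], x)) for x in X]

coefs, r2 = ols_fit(X, y)
print([round(c, 3) for c in coefs], round(r2, 3))
print(round(math.sqrt(0.94), 2))  # r implied by an r-squared of 0.94
```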

So if we want to know a team's ability to convert 3rd downs, we're far better off looking at passing stats than previous 3rd down conversion rates. And a prediction model is far better off using those passing stats (pass efficiency, interception rate, sack rate) and excluding to-date 3rd down percentage.

Explanation vs. Prediction

I'm always interested in improving my model for predicting game outcomes. My logistic regression model is based on straightforward variables: offensive and defensive passing and running efficiencies, turnover rates, and penalty rates. In this post, I'll question some of my own assumptions and begin to look at which variables really belong in a prediction model.

Explanatory vs. Predictive Models

As I updated the data for the predictions each week over the recent regular season, I noticed that some of the variables were more consistent than others. Turnover rates were particularly erratic. A team with a very good interception rate in the first half of the season would very often have a below average interception rate in the second half of the season.

Any basic regression model that attempts prediction is based on an assumption that the variables used as predictors, the 'to-date' variables, are indicative of what the same variables will be in the future. In football terms, when I include each team's interception rates from weeks 1-8 in the model to predict outcomes for week 9, I'm assuming that previous team interception rates are representative of future team interception rates.

But what if interceptions were completely random? Weeks 1-8 would not be predictive of week 9. Even though interceptions would still explain a large part of previous outcomes, past interceptions would not predict future outcomes at all. Just like mutual funds, past turnover performance does not guarantee future returns.

What if interception rates were just 'mostly' random? Should they still be included in a prediction model? Perhaps variables with lots of random noise, such as interceptions or fumbles, should not be included in predictive models even though they explain a large part of past outcomes. The question becomes 'what is the critical signal-to-noise ratio that makes a variable appropriate for inclusion as a predictor?'

This question underscores an important part of model construction. There are two kinds of models. One kind explains past outcomes, and the other predicts future outcomes. The 'explanatory' kind can contain all kinds of random variables, but the 'predictive' kind should limit the amount of random noise as much as possible. Ideally, it should be all signal and no noise.
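One way to see how signal-to-noise ratio drives predictive value is a small simulation under an assumed "skill plus noise" model (an assumption for illustration, not something the post tests). Each simulated team-season has a fixed, repeatable skill level, and each half-season observation adds independent random noise. Under that model the expected first-half/second-half correlation is skill variance divided by total variance, so a stat whose noise standard deviation is three times its skill standard deviation self-correlates at only about 0.1. All parameter values are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_self_correlation(skill_sd, noise_sd, n_samples=20_000, seed=1):
    """First-half vs. second-half correlation of a simulated 'skill + noise' stat.

    Expected value under this model: skill_sd**2 / (skill_sd**2 + noise_sd**2).
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_samples):
        skill = rng.gauss(0.0, skill_sd)                 # repeatable team ability
        first.append(skill + rng.gauss(0.0, noise_sd))   # weeks 1-8
        second.append(skill + rng.gauss(0.0, noise_sd))  # weeks 9-17
    return pearson_r(first, second)


for noise_sd in (0.5, 1.0, 3.0):
    theory = 1.0 / (1.0 + noise_sd**2)  # with skill_sd = 1
    print(noise_sd, round(theory, 2), round(simulated_self_correlation(1.0, noise_sd), 2))
```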

There is so much statistical data available for football teams that it is tempting to dump them all into regression software. Doing that would produce a very high r-squared, but would include so much noise, so many non-repeating circumstantial conditions, that it would not be an effective prediction model. I believe this is why so many other models out there do so poorly. A system like DVOA may be very good at quantifying how well teams have done to date--something we already know, but not as good at telling us which teams are likely to do well in the future.

Team Stat Self-Correlations

To test which variables should remain in a prediction model, I tested how well each variable predicted itself from the first half of a season to the second half. This is known as longitudinal auto-correlation. This method tests how enduring and repeatable each variable is.

I tested how well team efficiency stats from weeks 1-8 predicted themselves from weeks 9-17. For example, I tested how well offensive passing efficiency from the first half of the season predicted pass efficiency in the second half of the season. Both offensive and defensive stats were tested. I used data from the 2006 and 2007 regular seasons for all 32 teams (n=64, with two exceptions: mid-season 3rd down conversion rate and penalty rates were not available for 2006.)

The correlation coefficients between team stats from weeks 1-8 with stats from weeks 9-17 are listed in the table below.

Stat            Correlation, weeks 1-8 vs. weeks 9-17 (r)
D Int Rate      0.08
D Pass          0.29
D Run           0.44
D Sack Rate     0.24
O 3D Rate       0.43
O Fumble Rate   0.48
O Int Rate      0.27
O Pass          0.58
O Run           0.56
O Sack Rate     0.26
Penalty Rate    0.58

The longitudinal correlations range from as high as 0.58 for offensive pass efficiency and penalty rate to as low as 0.08 for defensive interception rate.

The defensive interception rate stands out as the least enduring, least consistent team stat. In contrast, offensive interception rates correlate noticeably better, with a coefficient of 0.27.

This indicates there is a lot of randomness in interceptions, which is no surprise. But producing defensive interceptions does not appear to be an enduring, repeatable ability of a team. Instead, it appears that defensive interceptions are more of a function of 1) randomness, and 2) their opponents' tendency to throw interceptions. In other words, interceptions are very random, and they are 'thrown' by an offense much more than they are 'taken' by a defense.

In following posts, I'll demonstrate that some of the more random team stats can be more accurately predicted by using other, less noisy stats instead of the to-date stats themselves. This may have large implications for an improved game prediction model.

Response to Luck and Belichick Article Comments

The traffic to this site has increased quite a bit lately. A lot of it is likely due to the interest in the playoffs, but much of it is from direct links from other sites. Most direct links are to my articles about Rating 'Gameday' Coaches and about Belichick Cheating Evidence. They have appeared on many message boards across the football world, and many of the comments and criticisms are outstanding. Allow me to address some of the comments here. The foundation of both articles deals with luck. Because there are so many new readers here, I'd like to clarify what I mean by luck, and how I use this definition when I apply statistics to make observations about coaching.


To me, a good example of luck is the "bunching" of successful events. In football, first downs are nice, but consecutive first downs are what allow touchdown drives. The number of first downs and the yardage gained in each should be ascribed to skill. Those are things the teams on the field control. But whether those first downs come in bunches or are interspersed is something different.

Perhaps a baseball analogy illustrates my point best. One single per inning gets a team zero runs after nine innings. But nine singles in one inning, followed by zero hits in eight innings would usually yield about six runs. In football, think of a drive as an inning--a team usually needs consecutive successes to score. If players could control when successful plays occurred, sports would be very different. Batters would save their hits for when runners are on base, or when the game is on the line. Receivers would save their dropped passes for the 4th quarter of blowouts.

Here is a football example I've used before: Let's say both PIT and CLE each get 12 1st downs in a game against each other. PIT's 1st downs come as 6 separate bunches of 2 consecutive 1st downs, each followed by a punt. CLE's 1st downs come as 2 bunches of 6 consecutive 1st downs resulting in 2 TDs. CLE's remaining drives are all 3-and-outs followed by a solid punt. Each team performed equally well: same yards, first downs, turnovers, kicking, etc. But the random "bunching" of successful events gave CLE a 14-0 shutout.
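The PIT/CLE example can be sketched as a toy scoring rule. The rule is a deliberate simplification and an assumption: a drive scores a touchdown plus extra point only if it strings together at least 6 first downs (standing in for "enough consecutive successes to cover the field"), and anything shorter ends in a punt, with field goals ignored. The 10-drive lists are hypothetical; both teams total 12 first downs, yet only CLE's bunched drives score.

```python
TD_POINTS = 7           # touchdown plus extra point
FIRST_DOWNS_FOR_TD = 6  # hypothetical: consecutive first downs needed to reach the end zone


def points(drives):
    """Score drives given as the number of consecutive first downs in each.

    Toy rule (assumption): a drive with at least FIRST_DOWNS_FOR_TD first downs
    ends in a touchdown; anything shorter ends in a punt. Field goals are ignored.
    """
    return sum(TD_POINTS for fd in drives if fd >= FIRST_DOWNS_FOR_TD)


pit = [2, 2, 2, 2, 2, 2, 0, 0, 0, 0]  # six 2-first-down drives, then punts
cle = [6, 6, 0, 0, 0, 0, 0, 0, 0, 0]  # two 6-first-down TD drives, then 3-and-outs
print(sum(pit), sum(cle))             # same number of first downs
print(points(pit), points(cle))
```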

That's what I mean by luck. There are several other factors that could be considered random, but my theory is that the bunching effect could explain the bulk of the observed differences in in-game performance and ultimate outcomes.

Luck and the Model of Team Wins

So when I rank teams by luck, I am using my prediction model to estimate how many games a team would be expected to win given their on-field performance. If they win more than expected, I say they're lucky. If they win less, they're unlucky.

Admittedly, the model can't possibly account for every possible consideration on the field of play. There are too many moving parts and inter-dependencies in football to just say, "whatever I don't account for must be luck." There are other factors, such as weather or coaching tactics, some of which are unmeasurable.

But we do know how accurate the model is. We know it accounts for 80% of the variance in team win totals. And there are sound techniques showing that luck, or randomness, accounts for a very large part of the 20% that's left over. That's why I call the model's residual (the difference between estimated and actual wins) luck. Or at least the bulk of it is.

Coaching Tactics

Coaching tactics on gameday, such as clock management and whether to kick or go for a first down, are one of the things the model does not account for. It is a small part of the residual. When I ranked coaches on their gameday tactics I used the residual of the model. But a critic would rightfully point out the obvious contradiction--How can the residual be considered luck in one case and 'coaching tactics' in the other?

The answer is that luck is random by definition. It does not correlate with anything. So if you average out enough years of performance, the luck part of the residual tends to cancel itself out, and what's left over is non-random considerations, including coaching tactics. The more years you have in the data, the more likely it is that the luck cancels out. When there is only one year of data, the residual will still contain the luck. This isn't my own personal theory, but one of the central tenets of inferential statistics.

Further, when you have enough years of data and divide up and analyze the data by coaches, and not by something else, you get a good estimate of that coach's gameday contribution to his teams' win totals. Essentially, I'm saying other coaches, given the same on-field capability of their players, would win X many games. Coach so-and-so won on average X+1.2 games per year, so he is credited with a +1.2 "wins added per year" score.

A coach that takes a fantastically talented football team to a 10-6 record would not score high. But a coach that can consistently take an average team to a 10-6 record would be ranked at the top.

By the way, I call it 'gameday,' because the preparation and practice part of the coaching job would be reflected in the on-field performance stats and not in the residual. It's the 4th down decisions and such that aren't captured in the efficiency data I use.
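The claim that luck cancels out over several seasons can be illustrated with a simulation under an assumed model (an illustration, not the author's code): each season's residual equals a fixed coaching effect plus independent, zero-mean luck. The +1.2 effect matches the example above; the 1.5-win luck spread is a hypothetical value. The spread of the multi-season average shrinks roughly as 1/sqrt(seasons), so a one-season residual is mostly luck while a ten-season average sits much closer to the coaching effect.

```python
import random
import statistics


def wins_added(coach_effect, luck_sd, seasons, rng):
    """Average residual (actual minus model-expected wins) over several seasons.

    Assumed model: each season's residual = coach_effect + independent luck.
    """
    residuals = [coach_effect + rng.gauss(0.0, luck_sd) for _ in range(seasons)]
    return statistics.fmean(residuals)


rng = random.Random(7)
for seasons in (1, 4, 10):
    estimates = [wins_added(1.2, 1.5, seasons, rng) for _ in range(5_000)]
    # Mean and spread of the estimate across many simulated coaching careers
    print(seasons, round(statistics.fmean(estimates), 2), round(statistics.pstdev(estimates), 2))
```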

Belichick and Cheating

After ranking all the coaches, I had expected to see Belichick at or near the top of the list. He was actually near the middle. So I split his ratings for his tenures at Cleveland and New England. His 'wins added' score was literally off the charts. It was 3 standard deviations beyond any other coach, and he never had any single year that wasn't off the chart itself. That would make him not only a once-in-a-lifetime type of tactician, but a once-in-a-millennium super-genius.

At first I thought, wow, he really is something special. But then the cheating revelations hit, and I thought this could be due to more than just genius. In fact, it makes a lot of sense given that we already know he is willing to break rules for a competitive edge. There were many other reports of cheating by the Patriots, beyond taping signals, such as exploiting QB helmet radio communications in various ways.

I'm not saying the Patriots aren't a great team or even that Belichick isn't a great coach. They obviously are. But both things can be true. They can be both great and cheating.

A solid criticism of my approach would be that I can't just chalk up the one team that breaks my model to cheating. I'm not. I was scratching my head wondering why this one team defies the statistical tendencies of the 31 other teams. Then several weeks later, it was revealed that that same team had been cheating.

By no means do I claim that my analysis is iron-tight. I think it's useful and interesting. Feel free to disagree.

The Patriots and the Conjunction Fallacy

The conjunction fallacy is when people judge the probability of a series of events to be larger than the probability of one of its component events. In simple terms, many people mistakenly assign a higher probability to a specific outcome than to a more general one. In football terms, this means that many fans underestimate how difficult it is for a team, even an extremely good one, to win the Super Bowl.

In my very simple poll asking if the Patriots had a better than 50/50 chance to win the Super Bowl, 64% of the (few) respondents said yes. I'd vote no, but let's look at what it would take for NE to win the championship.

The Patriots need to win three consecutive games, two of which are at home and one at a neutral site, against the NFL's top teams. What kind of win probability would they need for each game to arrive at a 50/50 chance to win it all? x * y * z = 0.50. For a rough estimate, let's assume their chance to win each game is roughly equal. Their probability would need to be 0.79 in each game. (0.79^3 = 0.50.)

This seems reasonable, but since they wouldn't have home field advantage in the Super Bowl, they would need slightly higher probabilities for the division and conference rounds of the playoffs.

Let's look at what Vegas thinks. According to a major online gambling site, NE is given 9 to 4 odds (0.69 probability) of winning the AFC championship, and 3 to 2 odds (0.60 probability) of winning the Super Bowl. They are also 13-point favorites to beat JAX this Saturday. With a 49-point over/under, 13 points roughly equates to a 0.78 probability (using a point-spread-to-probability conversion).

There is something out of whack. A 0.60 probability of winning the Super Bowl and a 0.69 probability of winning the AFC mean the individual Super Bowl game probability must be 0.60/0.69 = 0.87. That's amazingly high. And I suspect that's where the conjunction fallacy may be having an effect. The individual game odds are incongruent with the conjunctive odds of NE winning all three games.
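The arithmetic in the last few paragraphs can be checked in a few lines. Reading "9 to 4" and "3 to 2" as odds-on (9 chances in favor to 4 against) follows the post's own conversion to 0.69 and 0.60; the helper name is hypothetical. The conditional step uses the fact that winning the title requires winning the AFC first, so P(title) = P(AFC) × P(Super Bowl game | AFC).

```python
def odds_on_probability(a, b):
    """Convert 'a to b' odds read as odds-on (a chances in favor, b against)."""
    return a / (a + b)


# Per-game win probability needed for three straight wins to have an overall 0.50 chance
per_game_needed = 0.5 ** (1 / 3)
print(round(per_game_needed, 3), round(0.79 ** 3, 3))

p_afc = odds_on_probability(9, 4)    # about 0.69
p_title = odds_on_probability(3, 2)  # 0.60
p_sb_game = p_title / p_afc          # implied Super Bowl game probability
print(round(p_afc, 2), round(p_title, 2), round(p_sb_game, 2))
```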

I'd guess there is some kind of arbitrage opportunity there for gamblers. Personally, I just think it's interesting how many people intuitively estimate the odds of future events.

My own model estimates NE has a 0.74 probability of winning this weekend. Against IND they get a 0.65 probability, but against SD they would have a 0.84 probability of winning. Weighting those by IND's 0.80 chance of beating SD, NE has about a 0.69 chance of winning the AFC Championship game if it gets there, and multiplying by the 0.74 chance of getting past JAX gives NE roughly a 0.51 chance of appearing in the Super Bowl. To have a 50/50 chance of winning the Super Bowl, NE would then need about a 0.98 chance of beating the NFC representative. That is far higher than the 0.74 chance they have against the AFC's #5 seed, a (pretty good) Florida team playing in Foxboro in January.
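Here is a sketch of how the model's single-game probabilities chain together, assuming the games are independent and taking IND's 0.80 chance of beating SD from the divisional-round table above. The function name is hypothetical. The AFC Championship probability is a weighted average over NE's two possible opponents, and the Super Bowl appearance probability multiplies that by the divisional-round probability.

```python
def prob_reach_super_bowl(p_div, p_vs_ind, p_vs_sd, p_ind_beats_sd):
    """Chain single-game probabilities, assuming the games are independent.

    Returns (P(reach Super Bowl), P(win AFC Championship game | reach it)).
    """
    # Weight NE's chances by who it would face in the AFC Championship
    p_afc_game = p_ind_beats_sd * p_vs_ind + (1 - p_ind_beats_sd) * p_vs_sd
    return p_div * p_afc_game, p_afc_game


p_reach, p_afc_game = prob_reach_super_bowl(
    p_div=0.74, p_vs_ind=0.65, p_vs_sd=0.84, p_ind_beats_sd=0.80
)
needed_vs_nfc = 0.5 / p_reach  # Super Bowl win probability needed for 50/50 overall
print(round(p_afc_game, 3), round(p_reach, 2), round(needed_vs_nfc, 2))
```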

My own sense is that the Patriots have about a 40-45% chance of winning the Super Bowl. I would say 40, but as a dome team, the Colts would have a tougher time in Foxboro due to the January weather.

Luckiest Teams 2007

Based on opponent-adjusted generic win probability (GWP), the number of expected wins can be estimated for each team. Teams that have won more games than expected can be considered lucky, while teams with fewer wins than expected can be considered unlucky.

The list of NFL teams sorted from luckiest (positive numbers) to unluckiest is posted below. We would expect most teams to be within +/- 1.0 wins. So teams outside that margin can be deemed significantly lucky or unlucky.

Rank  Team  GWP   Actual  Expected  Luck
1     GB    0.64  13      10.2       2.8
2     SF    0.15   5       2.5       2.5
3     DET   0.28   7       4.6       2.4
4     ARI   0.35   8       5.6       2.4
5     CHI   0.29   7       4.6       2.4
6     CLE   0.48  10       7.7       2.3
7     NE    0.87  16      13.9       2.1
8     NYG   0.52  10       8.2       1.8
9     CAR   0.37   7       6.0       1.0
10    DAL   0.75  13      12.0       1.0
11    NO    0.40   7       6.4       0.6
12    SD    0.66  11      10.6       0.4
13    SEA   0.60  10       9.6       0.4
14    OAK   0.24   4       3.8       0.2
15    MIN   0.49   8       7.8       0.2
16    TEN   0.62  10       9.8       0.2
17    HOU   0.49   8       7.9       0.1
18    PIT   0.65  10      10.3      -0.3
19    BUF   0.46   7       7.3      -0.3
20    CIN   0.46   7       7.4      -0.4
21    WAS   0.60   9       9.6      -0.6
22    IND   0.85  13      13.6      -0.6
23    STL   0.25   3       4.0      -1.0
24    DEN   0.52   7       8.3      -1.3
25    JAX   0.77  11      12.3      -1.3
26    BAL   0.40   5       6.4      -1.4
27    KC    0.34   4       5.4      -1.4
28    PHI   0.62   8       9.9      -1.9
29    ATL   0.38   4       6.0      -2.0
30    TB    0.73   9      11.7      -2.7
31    NYJ   0.47   4       7.5      -3.5
32    MIA   0.34   1       5.4      -4.4
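The Expected column appears consistent with GWP × 16 games (GB: 0.64 × 16 ≈ 10.2), occasionally off by 0.1 because GWP is shown rounded to two decimals. Here is a minimal sketch of the luck calculation under that assumption, with the +/- 1.0 win margin from the text used as the cutoff; the function names are hypothetical.

```python
GAMES_PER_SEASON = 16


def expected_wins_and_luck(gwp, actual_wins, games=GAMES_PER_SEASON):
    """Expected wins from generic win probability, and luck = actual - expected."""
    expected = gwp * games
    return expected, actual_wins - expected


def label(luck, margin=1.0):
    """Apply the +/- 1.0 win margin for significant luck."""
    if luck > margin:
        return "lucky"
    if luck < -margin:
        return "unlucky"
    return "within the margin"


for team, gwp, actual in [("GB", 0.64, 13), ("NE", 0.87, 16), ("MIA", 0.34, 1)]:
    expected, luck = expected_wins_and_luck(gwp, actual)
    print(f"{team}: expected {expected:.1f}, luck {luck:+.1f} ({label(luck)})")
```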