## Importance of Run Defense

After reviewing how well the model fit the actual 2006 season and estimating the luck factor, I noticed that teams with strong run defenses were not well represented in the playoffs. This result runs contrary to the conventional wisdom that emphasizes "stopping the run" as a key to winning.

The conventional wisdom makes sense. In a purely logical sense, if a team were infinitely bad at stopping the run, its opponents would score a touchdown on every run attempt. So every incremental improvement over an "infinitely bad" run defense would have to improve a team's chance of winning.

But look at a 2006 ranking of run defense. The playoff teams are highlighted in yellow. The horizontal line is at the league average.

Notice that only 5 of the 12 playoff teams were above average, while the other 7 were below average. Only 1 playoff team ranked in the top 7 in run defense. Additionally, 4 of the 6 worst run defenses made the playoffs, including the 2 absolute worst. Incredibly, the 2006 Super Bowl winner had the very worst run defense, and not by a little: it allowed 0.39 yds/run more than the next-worst team, almost an entire standard deviation.
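The tallies above are easy to reproduce from a table of yards allowed per run. The sketch below is a minimal illustration, not the method used to build the chart: it assumes a hypothetical dict `ypr` mapping each team to its yards allowed per run (lower is better) and a set of playoff teams, and it uses the population standard deviation across all teams as an assumption, since the post does not say which one was used.

```python
from statistics import mean, pstdev


def run_defense_summary(ypr, playoff_teams, top_k=7):
    """Summarize where playoff teams fall in run defense.

    ypr: hypothetical dict of team -> yards allowed per run (lower is better).
    playoff_teams: set of teams that made the playoffs.
    top_k: how many of the best run defenses to check for playoff teams.
    """
    avg = mean(ypr.values())
    # Rank from best (fewest yds/run allowed) to worst.
    ranked = sorted(ypr, key=ypr.get)
    better = sum(1 for t in playoff_teams if ypr[t] < avg)
    worse = sum(1 for t in playoff_teams if ypr[t] > avg)
    in_top_k = sum(1 for t in ranked[:top_k] if t in playoff_teams)
    # Gap between the worst and second-worst defenses, in standard deviations
    # (population SD over all teams is an assumption).
    gap = ypr[ranked[-1]] - ypr[ranked[-2]]
    gap_sd = gap / pstdev(ypr.values())
    return {
        "better_than_avg": better,
        "worse_than_avg": worse,
        "playoff_in_top_k": in_top_k,
        "worst_gap_yds": gap,
        "worst_gap_sd": gap_sd,
    }
```

Fed the real 2006 yards-per-run figures for all 32 teams, the returned counts should correspond to the ones quoted above; the function itself makes no assumption about the number of teams.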

To illustrate this another way, I've plotted wins against defensive run efficiency (yards allowed per run) and added a regression line. Notice that as run defense gets worse, wins increase. This is backwards and clearly indicates a big problem in the data.

In fact, the simple linear regression illustrated above indicates that defensive run efficiency is not statistically significant at all (p = 0.839). This is partly because the sample of NFL teams is small, n=32.

This raises a larger problem with the model and its baseline data. The model's coefficients were drawn from the 2005 season only. In 2005, winning and run defense were much more strongly correlated than in 2006.

There were 256 games in 2005, which means there were 512 "game efforts" by all the teams. The model's significance calculations are based on the sample size, and n=512 seems like a healthy sample at first glance. But although there were 512 "game efforts," there were only 32 different teams, each contributing 16 of them. Games played by the same team are not independent observations, which creates a hybrid dataset in which n is, in a sense, both 512 and 32.
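One standard way to put a number on this "hybrid" sample size is Kish's design effect for clustered data. This is a swapped-in technique, not something the post says the model uses. The sketch assumes each team is a cluster of 16 game efforts and that the intraclass correlation `icc` (how alike a team's games are to one another) is known. In practice it would have to be estimated, and the value used below is purely hypothetical.

```python
def effective_sample_size(n_obs, cluster_size, icc):
    """Kish design-effect approximation for clustered observations.

    n_obs: total observations (e.g. 512 game efforts).
    cluster_size: observations per cluster (e.g. 16 games per team).
    icc: intraclass correlation, from 0 (games independent) to 1
         (every game by a team is effectively the same observation).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    design_effect = 1 + (cluster_size - 1) * icc
    return n_obs / design_effect


# The two extremes bracket the "hybrid" n described above.
print(effective_sample_size(512, 16, 0.0))  # 512.0: games fully independent
print(effective_sample_size(512, 16, 1.0))  # 32.0: only the teams count
print(effective_sample_size(512, 16, 0.2))  # 128.0 with a hypothetical icc of 0.2
```

The true effective n for 2005 lies somewhere between 32 and 512, depending on how consistent each team's performance is from game to game.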

The model can definitely be improved by adding more seasons of data to calculate the coefficients.