Throughout the rest of the 2007 season I intend to publish win probabilities for each game, and season win projections for each team. This post explains the methodology used to calculate these.

The model is a logistic (logit) regression fit on every game played over the past five seasons; it estimates the probability that each opponent will win a given game. Its inputs are team efficiency stats, which include:

- Offensive pass efficiency, including sack yardage
- Defensive pass efficiency, including sack yardage
- Offensive run efficiency
- Defensive run efficiency
- Offensive interception rate
- Defensive interception rate
- Offensive fumble rate
- Penalty rate (penalty yards per play)

Home field advantage is also included in the model. These factors were selected because they are the most predictive of future performance, not necessarily because they best explain past performance. Over the past five seasons, the model retrospectively picks winners correctly in 69.8% of games. In 2006 it was correct in 65% of games, well ahead of consensus favorites as determined by betting lines. Last year was a particularly difficult year for prognosticators, as consensus favorites won only 57% of games.
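
A minimal sketch of that logistic form follows. The stat names and coefficient values here are placeholders for illustration only, not the model's published coefficients (those are linked at the end of the post):

```python
import math

def win_probability(diffs, coefs, intercept=0.0):
    """Logistic (logit) model: P(win) = 1 / (1 + e^(-z)), where z is a
    linear combination of efficiency-stat differentials (team minus
    opponent) plus a home-field indicator."""
    z = intercept + sum(coefs[k] * diffs[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder coefficients and inputs, for illustration only.
coefs = {"home": 0.7, "pass_eff": 1.5, "run_eff": 0.6, "int_rate": -0.9}
diffs = {"home": 1, "pass_eff": 0.4, "run_eff": 0.1, "int_rate": 0.5}

p = win_probability(diffs, coefs)  # a number strictly between 0 and 1
```

With all differentials at zero and no home-field edge, z = 0 and the model returns 0.5, an even matchup.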

Touchdowns, red zone performance, and third down success rates are not used in the model because I believe they are results of passing and running ability, etc. Including them in a model intended for prediction would guarantee it is severely "overfit." In other words, it would capture and explain the unique qualities of past events at the expense of predictive power.

Once the model is established, each game's outcome probability can be calculated. But there are other applications. By calculating the probability a team will win against a notional league-average team at a neutral site, a generic win probability can be determined for each team.

This year the model includes an adjustment for opponent strength. This is especially important earlier in the season when there are fewer data points to establish each team's baseline performance levels. Each opponent's generic win probability is averaged for each team. It is then included back into the win model to refine each prediction. For example, a team with impressive stats against weak teams would not be favored as strongly as a team with similar stats against strong teams.

Another application of the opponent-adjusted generic win percentage is a ranking of each team. Such a ranking is similar to the now ubiquitous "power rankings." A better term for the rankings on this site would be "efficiency rankings."

Lastly, final win totals can be estimated from the probabilities of a team's future games. Using the law of total probability, the probability of each possible final record can be determined. For example, if a team has two games left, one with a 0.7 chance of winning and one with a 0.5 chance of winning, the probability of winning 0, 1, or 2 games can be calculated.

2 wins = 0.5 * 0.7 = 0.35

1 win = 0.5 * (1 - 0.7) + (1 - 0.5) * 0.7 = 0.50

0 wins = (1 - 0.7) * (1 - 0.5) = 0.15

The same math can be applied with many more games to go but becomes far more complex. Then once we determine the most likely winning percentage for each team, we can compare those expected values to actual outcomes to determine which teams have been lucky or unlucky.
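
The recursion over many remaining games can be sketched as follows, assuming independent game outcomes:

```python
def record_distribution(win_probs):
    """Law of total probability: returns a list where entry k is the
    probability of winning exactly k of the remaining games, assuming
    each game is independent."""
    dist = [1.0]  # before any games: certain to have 0 wins
    for p in win_probs:
        new = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            new[wins] += prob * (1 - p)   # lose this game
            new[wins + 1] += prob * p     # win this game
        dist = new
    return dist

# The two-game example above, with win probabilities 0.7 and 0.5:
dist = record_distribution([0.7, 0.5])
# dist[2] is about 0.35, dist[1] about 0.50, dist[0] about 0.15
```

The same function handles a full remaining schedule of any length; the expected final win total is the probability-weighted sum over the distribution.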

Lastly, as playoff time approaches, we can go one step further. By applying the same principle of total probability, the outcomes of playoff races can be estimated.
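
With several teams and real tiebreakers the scenario count multiplies quickly, so a Monte Carlo simulation is one way to sketch the idea. The team names, records, and coin-flip tiebreaker below are simplifications for illustration, not the site's actual method:

```python
import random

def race_odds(current_wins, remaining, trials=10_000, seed=0):
    """Estimate each team's chance of finishing with the best record by
    simulating the remaining schedule many times. Ties are broken by a
    coin flip -- a simplification of real NFL tiebreakers."""
    rng = random.Random(seed)
    counts = {team: 0 for team in current_wins}
    for _ in range(trials):
        finals = {team: wins + sum(rng.random() < p for p in remaining[team])
                  for team, wins in current_wins.items()}
        best = max(finals.values())
        leaders = [t for t, w in finals.items() if w == best]
        counts[rng.choice(leaders)] += 1
    return {t: n / trials for t, n in counts.items()}

# Hypothetical two-team race, two games left each.
odds = race_odds({"A": 9, "B": 8}, {"A": [0.6, 0.5], "B": [0.7, 0.4]})
```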

Note: The actual game prediction model and coefficients can be found here.

Rather than assuming that including something like, say, first downs or TDs will overfit the model, why not test for the predictive power of these statistics?

I sense some disdain for the FO methodology here. It's true they haven't done (or at least haven't claimed to have done) any rigorous statistical analysis of the factors they consider. But they do claim that all changes to the model are tested by whether they improve the correlation of the statistics from one year to the next, or the correlation of last year's DVOA to next year's wins. They are not testing the correlation of this year's stats to this year's wins, which would obviously lead to severe overfitting.

I agree for the most part. I'm working on something just as you suggest, but I'm letting the season generate some more data before finalizing it or posting anything.

I'm building a model around series success rates (SSR): the percentage of series in which a team gains a first down, or prevents one on defense. The league average is 65% in the NFL. I would think each team's offensive and defensive SSR is a very simple, handy way of capturing a lot of data about a team. One way or another it captures run and pass efficiencies, turnovers, sacks, penalties, and coaching tactics.

So far, however, it's not as predictive as efficiency stats.

Some of FO's stuff is really good, but some of it leaps to conclusions after a couple interesting correlations.

If you're attempting to use SSR as a predictor, why not just use points for and against, adjusted for strength of opponent? I ask because you say SSR is a simple way of capturing a lot of data about a team... well, points for and against is a simple way of capturing ALL of the data about a team.

SSR turned out to be ok as a predictor as I recall, but not better than a yardage efficiency model. Just like points for/against it captures a lot of luck and unique game situations. Yes, points for/against captures ALL team data, but it captures even more noise. The finer the resolution in the picture, the clearer it will be.

Brian,

A friend and I are in the process of developing a regression model for predicting the probability that a team will win an upcoming game. I've looked around the web and have found your site to be one of the most thorough and systematic.

I was hoping you could help answer a few of our questions or direct us to some useful resources. First, I've read your four part series on determining how many wins a team should get over the course of the season. I was wondering if you have a similar set of articles that outline how you determined the coefficients for the week-to-week efficiency model. Second, compiling all the stats available from the past five seasons is quite an undertaking. I was wondering if there is some site you could direct me to that has the stats already set up in a spreadsheet? That would sure save us a lot of time!

Keep up the good work. Your site's always interesting to read. It's good to know that other people are out there trying to understand football in a systematic way.

Brian,

I have a few questions about how you are adjusting for opponent strength.

First, you said that you calculate each opponent's win probability against an average opponent at a neutral site. Doesn't your model require that the AHome variable equal 0 or 1? How can you model a neutral site?

You said that you average the generic win probability for each opponent together and then include it back into the win model. How does this work? Are you creating a new independent variable for opponent strength?

I'm curious because the "Game Model Coefficients" post does not mention adjustments made for strength of opponent.

Thanks,

Brian

The simple explanation is: find each team's opponents' average generic win probability. Say the Cardinals' is 0.55. That translates into a logit value of 1.2 (or so). This means that Arizona is underrated before accounting for opponent strength. So, in the logit equation that computes the win probability in their next game, I add 1.2 to their estimate. I would do the same thing for every opponent. Say they're playing Seattle, and Seattle's opponent-average win probability is 0.45, which works out to a logit value of 0.8. I'd add 0.8 to the Seahawks' logistic estimate. In all, there's a net 0.4 advantage for Arizona because of opponent considerations. This works out to, say, a 3% adjustment in win probability in favor of ARI.

Sorry, this doesn't translate well without writing out the equations.

Brian,

Thanks for the quick response. Wouldn't a 0.55 win probability correspond to a logit value of 0.2 (not 1.2). Tell me if this is correct. Set 0.55 = 1/[1+(e^(-z))] and then solve for z. Doing this gets me z = 0.2. You got 1.2. What am I doing wrong?

Brian-Your equation is sound. I just picked those numbers out of a hat for an example of the process. The math won't add up.

I'll use a real world example. After week 13 last year, ARI's own unadjusted generic win probability was 0.47 (from a logit value of -.14). Their opponent average win probability was 0.45 (a logit value of -.21).

So adding -.21 to -.14 gives an adjusted logit value of -.35. The adjusted win probability then becomes 0.41.

It gets a little trickier, too. Now that I have an adjusted logit value for each team, that changes everyone's average opponent strength. So I iterate the process until the probabilities converge on a stabilized solution.

Ultimately, ARI's generic win probability stabilized at 0.40 for week 13. It dropped from .41 to .40 because their opponents had slightly weaker schedules themselves.
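
A toy version of that iteration is sketched below. The teams, raw logits, and round-robin schedule are hypothetical, and this is not the site's actual code; note also that whether this simple fixed-point form settles down depends on the schedule (the zero-mean logits here let it converge):

```python
import math

def logit(p):
    return math.log(p / (1 - p))        # win probability -> logit value

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))   # logit value -> win probability

def opponent_adjust(raw_logits, schedule, iters=100):
    """Repeatedly set each team's adjusted logit to its raw logit plus
    its opponents' average adjusted logit, then convert back to a win
    probability. (A toy sketch of the described process.)"""
    adj = dict(raw_logits)
    for _ in range(iters):
        adj = {team: raw_logits[team] + sum(adj[o] for o in opps) / len(opps)
               for team, opps in schedule.items()}
    return {team: inv_logit(z) for team, z in adj.items()}

# Hypothetical 3-team round robin; raw logits average to zero.
raw = {"A": 0.3, "B": 0.0, "C": -0.3}
sched = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
adjusted = opponent_adjust(raw, sched)
# Team A shrinks back toward 0.5: its schedule excludes itself, so it
# faced slightly weaker-than-average opponents.
```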

So now that you know the stabilized value for Arizona's opponents' average generic win probability and its corresponding logit value, you can include that logit value in the equation for their win probability in their upcoming matchup, correct?

Unrelated to all this, I have one more question and I promise this is my last one on this article. In your article titled "Why the Chargers Defense Will Decline in '08" you described how defensive interception rates are more due to luck than any defensive skill. However, you include it as an independent variable in your prediction model. Since you have proven defensive interception rate is no indication of future performance, wouldn't the model be better off without it?

What would the chart look like for the Buffalo-Houston 1992 AFC wildcard game? 35-3 at one point, with a 41-38 final.

Hi Brian,

I just have a question regarding how you tested your chosen statistics (above) for your model. I remember in one of your other posts you outlined this. I believe you collected each statistic through eight games, then measured how well they predicted the next eight games.

Could you remind me how this is done (or direct me to your past post)? Did you run a correlation to 2nd-half wins, or points?

thx dan

Dan-Here is the article you're looking for.

Your approach is very interesting. As a statistician I think it's neat. I have been working on a similar type of statistical model that predicts teams' winning percentages and also winning margins against the point spreads. The variables I use are slightly different from yours, however. I am particularly interested in beating point spreads which my model has done 55% of the time between 2004 and 2008. This year (2009) so far my model has beaten the point spreads 62% of the time. You can check out my website at: www.nflforecasts.com

I know you posted this article some time ago, but would you lend some insight as to how you generate the in-game win probabilities? How do you take into account the time left? What does the dataset look like before you fit the model? Many thanks in advance! This stuff is great!

Brian, looking over the week 7 predictions (for 2011), I was surprised not just by Dallas and SF, teams whose records suggest different placings, but by the bottom. I find it hard to imagine Seattle losing at a neutral site to Indy, or that Tampa is worse than Saint Louis.

That led me to read your methodology. I can't criticize it: I don't have the statistical or historical football knowledge to offer a substantive challenge, but I am curious why points and wins (or wins vs. adjusted opponent quality) aren't included.

With SF this year, their efficiencies might look poor, but their point differential is third. While the blowout win against Tampa skews that, I would expect that consistently winning by decent margins would be an indicator of a good team. Likewise, Dallas has a losing record and a negative point differential, not qualities that suggest a good team. Saint Louis hasn't won a game, is -88 in point differential, and is still ranked above teams with winning records and not entirely pathetic point differentials.

Have you written about why you don't include wins or points in your model (or by not reading the detailed model, did I miss that)? If not, would you address it some time?

What are the odds that all teams lose?

Brian,

Love the site, go here all the time, tell my friends about it, etc.

Was looking at WP Calculator. Used the following inputs:

Score Diff: -1 (team with ball trailing by 1)

Time Left: 4 mins in 4th Qtr

Field Position: 36 yard line of opposition

2nd Down and 10 to go.

This gives a WP of .49.

If you change Field position to 35 yard line of opposition, WP changes to .59.

Seems like an extreme swing for 1 yard. Was wondering if I was missing something. Please explain how 1 yard changes 10% of games with 4 minutes left.

Python,

The calculator uses the 35-yard line as the cutoff between a FG attempt and a punt on 4th down. It's not 100% realistic, I realize, but that's the explanation.

hi Brian,

good stuff here… but i was just looking at your 4th down calculator and i'm wondering one thing…

how can you predict win percentages etc without entering timeouts??

end of game scenarios, impact of going for it, would be hugely affected by timeouts..

right?