Throughout the rest of the 2007 season I intend to publish win probabilities for each game, and season win projections for each team. This post explains the methodology used to calculate these.
The model is a logit (logistic) regression fitted to every game played over the past 5 seasons. It estimates the probability that each team in a matchup will win the game. The model is based on team efficiency stats which include:
- Offensive pass efficiency, including sack yardage
- Defensive pass efficiency, including sack yardage
- Offensive run efficiency
- Defensive run efficiency
- Offensive interception rate
- Defensive interception rate
- Offensive fumble rate
- Penalty rate (penalty yards per play)
Touchdowns, red zone performance, and third down success rates are not used in the model because I believe those things are the results of passing ability, running ability, and the other factors listed above. To include them in a model intended for prediction would guarantee it is severely "overfit." In other words, it would capture and explain the unique qualities of past events at the expense of predictive power.
Once the model is established, each game's outcome probability can be calculated. But there are other applications. By calculating the probability a team will win against a notional league-average team at a neutral site, a generic win probability can be determined for each team.
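To make the idea concrete, here is a minimal sketch of how a logistic model turns efficiency stats into a game probability and a generic win probability. The coefficient values, the home-field term, the stat names, and the sample numbers are all made up for illustration; they are not the fitted values from the actual model, and the real model may combine the stats differently.

```python
import math

# Hypothetical coefficients, for illustration only (NOT the fitted values).
# Each one multiplies the difference between the two teams' stats.
COEFFICIENTS = {
    "off_pass": 0.45,   # net passing yards per attempt, incl. sack yardage
    "def_pass": -0.45,  # net passing yards allowed per attempt
    "off_run": 0.25,    # rushing yards per carry
    "def_run": -0.25,   # rushing yards allowed per carry
    "off_int": -20.0,   # interceptions thrown per pass attempt
    "def_int": 20.0,    # interceptions taken per opponent pass attempt
    "off_fum": -20.0,   # fumbles per play
    "penalty": -1.5,    # penalty yards per play
}
HOME_EDGE = 0.3  # hypothetical home-field term, in log-odds


def win_probability(team, opponent, site="neutral"):
    """Probability that `team` beats `opponent` under the sketch model.

    `site` is "home", "away", or "neutral" from `team`'s point of view.
    """
    log_odds = sum(c * (team[k] - opponent[k]) for k, c in COEFFICIENTS.items())
    log_odds += {"home": HOME_EDGE, "away": -HOME_EDGE, "neutral": 0.0}[site]
    return 1.0 / (1.0 + math.exp(-log_odds))


def generic_win_probability(team, league_average):
    """Win probability against a notional league-average team, neutral site."""
    return win_probability(team, league_average, site="neutral")


# Made-up stat lines for a league-average team and a good passing team.
league_average = {
    "off_pass": 6.0, "def_pass": 6.0, "off_run": 4.0, "def_run": 4.0,
    "off_int": 0.030, "def_int": 0.030, "off_fum": 0.025, "penalty": 0.40,
}
good_team = dict(league_average, off_pass=7.0, off_int=0.020)

gwp = generic_win_probability(good_team, league_average)
```

A team with exactly league-average stats comes out at 0.5 by construction, which is what makes the generic win probability a convenient common scale for comparing teams.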
This year the model includes an adjustment for opponent strength. This is especially important earlier in the season when there are fewer data points to establish each team's baseline performance levels. For each team, the generic win probabilities of the opponents it has faced are averaged. That average is then included back into the win model to refine each prediction. For example, a team with impressive stats against weak teams would not be favored as strongly as a team with similar stats against strong teams.
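The averaging step might look something like the sketch below. The team names, the probabilities, and the function name are hypothetical, and how the resulting average enters the regression is not shown here.

```python
def average_opponent_gwp(opponents_played, gwp):
    """Mean generic win probability of the opponents each team has faced.

    Teams with no games played yet are skipped.
    """
    return {
        team: sum(gwp[opp] for opp in opps) / len(opps)
        for team, opps in opponents_played.items()
        if opps
    }


# Made-up generic win probabilities and schedules for three teams.
gwp = {"A": 0.70, "B": 0.50, "C": 0.30}
opponents_played = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

strength_of_schedule = average_opponent_gwp(opponents_played, gwp)
```

In this toy example team A has faced the weakest opposition (an average of 0.40) and team C the strongest (0.60), so A's stats would be discounted and C's given extra credit.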
Another application of the opponent-adjusted generic win percentage is a ranking of each team. Such a ranking is similar to the now ubiquitous "power rankings." A better term for the rankings on this site would be "efficiency rankings."
Final win totals can also be estimated from the win probabilities of a team's remaining games. By using the law of total probability, the probability that each possible final record will occur can be determined. For example, if a team has two games left, one with a 0.7 chance of winning and one with a 0.5 chance of winning, the probability of winning 0, 1, or 2 games can be calculated.
2 wins = 0.5 * 0.7 = 0.35
1 win = 0.5 * (1-0.7) + (1-0.5) * 0.7 = 0.50
0 wins = (1-0.7) * (1-0.5) = 0.15
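With more than a couple of games left, writing out every combination by hand gets tedious. One way to automate it, sketched below, is to add games one at a time and update the probability of each win total as you go. This assumes the games are independent of each other; the function name is just for illustration.

```python
def win_distribution(game_probs):
    """Return a list where entry k is the probability of exactly k wins.

    `game_probs` holds the win probability of each remaining game.
    Games are assumed to be independent.
    """
    dist = [1.0]  # with no games played, 0 wins is certain
    for p in game_probs:
        new = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            new[wins] += prob * (1 - p)   # lose this game
            new[wins + 1] += prob * p     # win this game
        dist = new
    return dist


dist = win_distribution([0.7, 0.5])
expected_wins = sum(k * prob for k, prob in enumerate(dist))
```

For the two-game example above, `dist` holds the probabilities of 0, 1, and 2 wins in that order, and the probabilities always sum to 1.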
Finally, as playoff time approaches, we can go one step further. By applying the same principle of total probability, the outcomes of playoff races can be estimated.
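As a rough illustration of the idea, the sketch below enumerates every combination of remaining results for two teams and adds up the probability of the combinations in which one finishes ahead of the other. It is a simplification built on several assumptions: the games are independent, the two teams do not play each other, and ties are left unresolved rather than run through the tiebreakers. The records and probabilities are made up.

```python
from itertools import product


def prob_finishes_ahead(wins_a, probs_a, wins_b, probs_b):
    """P(team A ends the season with strictly more wins than team B).

    wins_a / wins_b are current win totals; probs_a / probs_b are the win
    probabilities of each team's remaining games.
    """
    games = list(probs_a) + list(probs_b)
    n_a = len(probs_a)
    total = 0.0
    # Each outcome is a tuple of 0/1 results, one per remaining game.
    for outcome in product([0, 1], repeat=len(games)):
        weight = 1.0
        for won, p in zip(outcome, games):
            weight *= p if won else 1 - p
        final_a = wins_a + sum(outcome[:n_a])
        final_b = wins_b + sum(outcome[n_a:])
        if final_a > final_b:
            total += weight
    return total


# Both teams have 9 wins; A has two games left, B has one.
p_a_ahead = prob_finishes_ahead(9, [0.7, 0.5], 9, [0.6])
p_b_ahead = prob_finishes_ahead(9, [0.6], 9, [0.7, 0.5])
p_tie = 1.0 - p_a_ahead - p_b_ahead
```

Full enumeration doubles in size with every added game, so it only works for a handful of games; a real playoff race with many teams would need a smarter approach or a simulation.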
Note: The actual game prediction model and coefficients can be found here.