Desperate for football, I'm watching the Hall of Fame Game tonight. There are four minutes left in the fourth quarter and the Redskins are up by 7 over the Colts. Can Jared Lorenzen lead his 4th-string squad to a comeback? I must be the only person in the country who cares. And I only care because I'm investigating Win Probability (WP) in NFL football.

WP is simply an in-game estimate of who's going to win based on the current score and other game variables. This post will examine the potential application of WP and will illustrate a first cut at actual WP for various scores and time remaining.

Two of my recent posts discussed measures of utility in football. I looked at first down probability and at point expectancy. First down probability analyzes how likely an offense is to convert its present down-and-distance situation to a first down. The success of a play can be judged based on how it changes the probability of a first down. Point expectancy measures how many points a team scores on average based on its field position. This technique not only measures success, but can provide coaches with a decision-making tool. We saw, however, that both of these techniques had their limitations.

Win Probability (WP) has been a facet of baseball sabermetrics for many years. In baseball it basically measures the probability one team will win based on score, inning, outs, and runners on base. I suppose the batter's count could be included in the calculations too.

The usefulness of WP goes beyond fan curiosity about whether the home team has a chance to win (but that's interesting in itself). Take a situation in baseball with a team down by 1 run in the 9th inning. There's a runner on first base and no outs. Should the manager call for the steal? WP should inform his decision. We could calculate the WP of the steal decision by totaling the WPs of the potential outcomes, each weighted by its probability. The WP of the steal decision would be:

WP(steal) = Pr(successful steal) * WP(runner on 2nd, no outs) + Pr(caught stealing) * WP(no runners, 1 out)

This would be compared to the various outcomes of the current at bat without stealing to decide which decision gives the team the best chance to win. There would be analogous applications of WP in decision-making in football.
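The steal formula is just an expected value: each outcome's WP weighted by the chance of that outcome. A minimal sketch of that calculation follows. The function name `decision_wp` and every probability in it are hypothetical placeholders for illustration, not historical baseball figures.

```python
def decision_wp(outcomes):
    """Expected win probability of a decision.

    outcomes: list of (probability_of_outcome, wp_after_outcome) pairs.
    The outcome probabilities must sum to 1.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"outcome probabilities sum to {total_p}, not 1")
    return sum(p * wp for p, wp in outcomes)


# Hypothetical inputs, for illustration only (not measured values).
p_steal_success = 0.70
wp_runner_2nd_no_outs = 0.45
wp_empty_1_out = 0.20

wp_steal = decision_wp([
    (p_steal_success, wp_runner_2nd_no_outs),      # safe at 2nd, no outs
    (1 - p_steal_success, wp_empty_1_out),         # caught, bases empty, 1 out
])
print(round(wp_steal, 3))
```

The same function would score the "don't steal" alternative from the outcomes of the current at bat, and the manager would pick whichever decision returns the higher number.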

Baseball is a sport well-suited to WP because it has a limited number of discrete states. There are 27 outs for each team, 3 bases, and 3 outs per inning, and there is enough historical data to accurately calculate the historical WP for each state. Football is far more complex. Its states are effectively continuous rather than discrete. For example, compare field position to runners on base. There are only eight (2^3) combinations of base runners, but there are 99 yard lines. Or compare each baseball team's 27 outs (54 total) to the 3,600 minute and second combinations in a 60-minute football game. There are literally over a billion potential combinations of score, field position, down and distance, and time remaining.
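To see how quickly the combinations multiply, here is a back-of-the-envelope count. The ranges are assumptions chosen for illustration (a score differential from -50 to +50, yards to go capped at 20, one state per second of the game), and the count ignores possession, timeouts, and quarter.

```python
# Assumed (hypothetical) ranges for each game variable; rough choices for
# illustration only.
score_diffs = 101        # score differential from -50 to +50
yard_lines = 99          # line of scrimmage, 1 to 99
downs = 4
distances = 20           # yards to go, capped at 20 for this estimate
seconds = 60 * 60        # every minute/second combination in 60 minutes

states = score_diffs * yard_lines * downs * distances * seconds
print(f"{states:,}")     # 2,879,712,000
```

Even with these conservative caps the count lands in the billions, which is why the raw states have to be grouped before any empirical WP can be estimated.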

WP in football can be simplified, thankfully. For example, time remaining can be grouped into one-minute or 30-second increments. Field position could be grouped in chunks too. Even so, there are still so many combinations of states in a football game that a good WP model would need a lot of data. And I'm not talking about an entire season of play-by-play information. We'd need years of data, and even then we'd need a lot of mathematical smoothing and "best fit" estimation.
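Grouping states into increments amounts to integer-dividing each raw variable by a bucket width. The sketch below is a hypothetical helper, not the model behind the graph. It assumes yard lines are measured from the offense's own goal line, and the default widths of 60 seconds and 10 yards are arbitrary choices.

```python
def state_bucket(seconds_left, yard_line, time_bin=60, yard_bin=10):
    """Map a raw game state to a coarse (time, field position) bucket.

    seconds_left: seconds remaining in regulation (0-3600)
    yard_line: distance from the offense's own goal line (1-99)
    time_bin, yard_bin: bucket widths (assumed defaults: 60 s, 10 yards)
    """
    time_bucket = seconds_left // time_bin
    field_bucket = (yard_line - 1) // yard_bin
    return time_bucket, field_bucket


# 12:59 left (779 seconds), ball on the offense's own 46
print(state_bucket(779, 46))                 # one-minute buckets
print(state_bucket(779, 46, time_bin=30))    # 30-second buckets
```

Every play that lands in the same bucket is then pooled, trading resolution for sample size.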

Others have looked at WP before. The site ProTrade.com built a model, and it was used on ESPN.com for a time. FootballCommentary.com (which I highly recommend) also has a model based on backward induction. [Edit: I had originally mentioned the FootballCommentary.com model was flawed because it treated each score difference linearly, which is not the case. The author, Bill Krasker, wrote me to correct the record. My thanks to him.]
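Backward induction works from the end of the game toward the present: the final outcome is known once no possessions remain, and every earlier state's WP is computed from the states that can follow it. The toy model below is only an illustration of the technique, not FootballCommentary.com's model. It assumes the game is a fixed number of alternating possessions, each ending in a TD (7), a field goal (3), or nothing, with made-up probabilities, and it treats a tie at the end as a coin flip in overtime.

```python
from functools import lru_cache

# Hypothetical per-possession scoring probabilities (assumed, not fitted).
P_TD, P_FG = 0.20, 0.15
P_NONE = 1.0 - P_TD - P_FG
OUTCOMES = ((7, P_TD), (3, P_FG), (0, P_NONE))


@lru_cache(maxsize=None)
def wp(possessions_left: int, lead: int) -> float:
    """WP for the team about to get the ball, leading by `lead` points,
    with `possessions_left` possessions remaining (both teams combined)."""
    if possessions_left == 0:
        # Game over. A tie is treated as a 50/50 overtime (assumption).
        if lead > 0:
            return 1.0
        if lead < 0:
            return 0.0
        return 0.5
    total = 0.0
    for points, p in OUTCOMES:
        # After this possession the opponent has the ball, so its WP is
        # computed from its point of view and we take the complement.
        total += p * (1.0 - wp(possessions_left - 1, -(lead + points)))
    return total


for lead in (-7, -3, 0, 3, 7):
    print(lead, round(wp(4, lead), 3))
```

With one possession left and trailing by 3, the toy gives 0.20 (TD) + 0.15 × 0.5 (FG, then overtime) = 0.275. A real model would also carry field position, down, distance, and clock, which is exactly where the state space explodes.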

The graph below is my own first cut at WP for the NFL. It is based on all regular season games from the 2000 through 2007 seasons. For the most common score differentials, it plots the WP of the team with possession of the ball. For example, the top curve, labeled +7, is the WP for a team that is winning by 7 points and has the ball, at each minute remaining in a game. The -7 curve, at the bottom, is the WP for a team that is trailing by 7 points and has the ball.

It does not factor in field position or down and distance situations. So, this should be considered a baseline and not a finished product. But already we can see some interesting things.

A few things stand out to me right away. Notice the sudden drop-off of the -7 curve. A team trailing by a touchdown with 20 minutes left in the game (5 min left in the 3rd quarter) sees its chance of winning fall dramatically. There are similar drop-offs for teams trailing by 1 and 3 points at the beginning of the 4th quarter.

Also notice how the WP for a team down by 3 has a slight uptick, from .20 to .30, in the last few minutes of the game. I think this is because if they are able to score and at least force overtime, there is not much time left for the opponents to mount a scoring drive themselves.

One particular surprise is that teams down by 1 point early in the 4th quarter, who have the ball, are actually favored to win. By 7 minutes remaining, the team trailing by a point falls below even odds.

The application of WP could have profound consequences. Take a fairly common scenario. Your team is down by 3 points with 4 minutes left in the 4th quarter. You're facing 4th and goal from the 2-yard line. Conventional wisdom screams field goal! Get some points on the board and at least force OT. But WP might say something else. (Keep in mind field position is not considered yet, so this is rough.)

If you kick the (virtually automatic) field goal, that ties the score and gives the opponent the ball with just under 4 minutes remaining. Your opponent has a 70% chance of winning in this situation according to the graph. You've got a 30% chance.

Let's see what happens if you go for it. If a 4th-down try from the 2 succeeds about 30% of the time (which it does, at least for 2007), your WP is the probability-weighted total of the two outcomes:

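WP(go for it) = Pr(convert) * WP(up by 4 after the TD) + Pr(fail) * WP(down by 3 after a failed try)

= 0.30 * 0.92 + 0.70 * 0.22

= 0.43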

That's a 43% chance of winning by going for the TD vs. a 30% chance by going for the FG. I think this actually understates the chance of winning by going for it because if you fail, the opponent gets the ball very near his own goal line. So if you get the ball back, chances are you won't have far to go to get back into field goal range.
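The comparison can be written as a small calculation that also gives the break-even conversion rate: the success rate at which going for it and kicking yield the same WP. This is a sketch using the numbers read off the chart above. The helper names are hypothetical, and the field goal is treated as automatic, as in the text.

```python
def wp_go_for_it(p_convert, wp_if_td, wp_if_fail):
    """Probability-weighted WP of going for it on 4th down."""
    return p_convert * wp_if_td + (1 - p_convert) * wp_if_fail


def break_even_rate(wp_kick, wp_if_td, wp_if_fail):
    """Conversion rate at which going for it exactly matches kicking."""
    return (wp_kick - wp_if_fail) / (wp_if_td - wp_if_fail)


wp_fg = 0.30                              # tied, opponent ball, ~4:00 left
wp_go = wp_go_for_it(0.30, 0.92, 0.22)    # values from the text
p_needed = break_even_rate(wp_fg, 0.92, 0.22)

print(f"go for it: {wp_go:.2f}, kick: {wp_fg:.2f}, break-even: {p_needed:.2f}")
# go for it: 0.43, kick: 0.30, break-even: 0.11
```

Under these inputs, going for it beats the field goal as long as the try succeeds more than about 11% of the time. The break-even rate is a handy way to see how much room the decision has if the chart values are off.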

A lot of work still needs to be done, but the potential for WP in football is enormous.

Wow, something besides a fat joke came from watching Jared Lorenzen play football.

I was in Vegas last weekend, and let me tell you, you were absolutely NOT the only person who cared about the result. Although I suppose most Vegas folk were more concerned about "Redskins -4.5" than Redskins.

Personally, it's always a little strange for me to watch Colts vs. Redskins. I'm happy since nobody got hurt.

I like the analysis. It's good to be getting back to football.

The way I see it, you should be able to take the +'s and -'s and apply one to you and the other to your opponent. Say there are 10 minutes left in the 4th quarter and you are up by one point.

Shouldn't your chance to win when up by 1 point plus your opponent's chance to win when down by 1 point add up to 100% for any given minute of the game?

I'd expect the + curves and the - curves to be exact reverses of each other.

With 10 minutes left in the 4th quarter the team that's up by 1 has a 57% chance of winning and the team that's down by 1 has a 65% chance of winning. That adds up to 122%. Seems wrong to me but everything I know about statistics I learned from reading NFL stat sites so I could be wrong :)

Anon: No, the probabilities for +1 and -1 would not add up to 100% in this case. But it's a good question, and this is something I should clarify.

The probability curves in the graph consider possession. So the curve labeled +1 is for a team that is up by 1 point and has the ball. Conversely, the curve labeled -1 is for a team that is down by a point and has the ball. The extra 22% you point out could be considered the value of simply possessing the ball, in terms of probability of winning at that point in the game.

Thanks for the question.
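Because both curves describe the team with the ball, the complement that does sum to 100% pairs a team's WP without the ball with its opponent's WP with the ball (setting aside the rare tie). A minimal sketch of that bookkeeping, using the 10-minute values quoted in the question, shows where the 22% comes from. The helper name is hypothetical.

```python
def wp_without_ball(opponent_wp_with_ball):
    # Ignoring ties, exactly one team wins, so a team's WP without the
    # ball is the complement of its opponent's WP with the ball.
    return 1.0 - opponent_wp_with_ball


wp_up1_ball = 0.57      # +1 curve, 10:00 left in the 4th (from the question)
wp_down1_ball = 0.65    # -1 curve, same time

wp_up1_no_ball = wp_without_ball(wp_down1_ball)   # team up 1, opponent has ball
possession_value = wp_up1_ball - wp_up1_no_ball
print(round(wp_up1_no_ball, 2), round(possession_value, 2))
```

The difference works out to 0.57 + 0.65 - 1, the same 22 points as the "extra" in the question.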

Can you explain the +0 line? In that situation, both teams should have it, so, I don't grok how the line could deviate from 50%?

Argh. Didn't read through all the comments. The "has the ball" was the part I missed.

I understand how having the ball could give you a >50% chance of winning with -1 or 0 pt deficit, but how can being down 1 point (vs. 0 or +1) have a better WP?

In other words, how can the purple line ever be above the green (let alone the red)? Ditto green over red for that mystery spot at the beginning of the 4th quarter.

WS: Very perceptive. Regular reader JonnyMo pointed that out to me. See the explanation in my article "The End Game." The bottom line is that teams with very small leads play too conservatively and teams with small deficits play more aggressively, which may be closer to the generally optimal level of risk/reward balance.

Hey,

I just stumbled upon this here. I was the one who created Protrade's Win Probability model. Even though our business model has steered us away from analytics a bit for the time being, I still see it as my baby :-) It took a LONG time to build.

Some comments:

* we used about 7 years of Play by Play data (about 2M plays)

* we did not group into 5-minute intervals, and did account for end-of-game situations

* we attempt to account for discontinuous effects (scoring comes in 3/7 pt chunks, and being down by 4 vs. 5 late is very similar)

* our model takes many factors into account: score differential, down, distance, field position, time, timeouts, field types, ...

* the output of our WP model does agree with Romer that coaches are too conservative on 4th down.

Keep up the good work Brian! All these articles are very interesting.

-Mark mkamal@protrade.com

Great site. How do you generate the win probabilities?

Thanks.

The win probabilities are derived empirically. I simply look at the database and compare all games with the same time remaining/score difference/field position/etc. Whatever percent of the time a team in the same (or very similar) situation wins becomes the WP.

There's some data smoothing and interpolation too, but mostly that's it.
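The counting step described above can be sketched in a few lines: group every observed situation by a state key and take the share of those teams that went on to win. This is a hypothetical helper with made-up sample data, and it omits the smoothing and interpolation mentioned in the reply.

```python
from collections import defaultdict


def empirical_wp(states):
    """states: iterable of (state_key, won) pairs, where state_key is any
    hashable description of the situation (e.g. minutes left, score
    difference) and won is True if the team with the ball went on to win."""
    wins = defaultdict(int)
    counts = defaultdict(int)
    for key, won in states:
        counts[key] += 1
        wins[key] += int(won)
    return {key: wins[key] / counts[key] for key in counts}


# Made-up observations: (minutes left, score difference) -> did they win?
sample = [
    ((5, -3), True),
    ((5, -3), False),
    ((5, -3), False),
    ((5, -3), False),
    ((12, 7), True),
]
print(empirical_wp(sample))
```

With real play-by-play data, the key would also include field position and down and distance, usually after bucketing, so that each key collects enough games to give a stable percentage.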

Is it safe to assume that at 30min remaining the score differential reflects that half time score and is unaffected by field position and down?

Yes. But that is a general average that does not account for the fact one team is due to receive the kick off.

"The win probabilities are derived empirically. I simply look at the database and compare all games with the same time remaining/score difference/field position/etc. Whatever percent of the time a team in the same (or very similar) situation wins becomes the WP. "

Brian you have an amazing site. Really not sure why you don't have a job with an NFL team. My question is, how large a sample size are we talking about for each situation? For example, how often is a team down by 7 on their own 46 yard line with 13:00 left in the 4th quarter?

Thanks. It varies greatly. I use chunks of data, so depending on the situation, I'll average a block of 20 yards of field position and up to 5 minutes of time. Then I'll interpolate between "chunks" for the particular win %. There's a lot of sophisticated modeling and smoothing going on underneath the raw win%. So even if the "chunk" size is very small (<100 observations), it's supported by data in adjacent chunks.
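Interpolating between chunks can be as simple as drawing a straight line between the averages of the two neighboring chunks. The sketch below assumes each 20-yard chunk is represented by its midpoint, and the chunk WPs are invented for illustration. It is not the smoothing used for the actual model.

```python
from bisect import bisect_right


def interpolate_wp(x, centers, wps):
    """Linearly interpolate a WP at position x from chunk averages.

    centers: sorted chunk midpoints (e.g. the middle yard line of each
    20-yard block); wps: the averaged WP for each chunk.
    Values outside the range are clamped to the end chunks.
    """
    if x <= centers[0]:
        return wps[0]
    if x >= centers[-1]:
        return wps[-1]
    i = bisect_right(centers, x)
    x0, x1 = centers[i - 1], centers[i]
    y0, y1 = wps[i - 1], wps[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical chunk averages for a team down 7 with ~13:00 left
centers = [10, 30, 50, 70, 90]
wps = [0.10, 0.14, 0.18, 0.24, 0.32]
print(round(interpolate_wp(47, centers, wps), 3))   # own 47
```

If NumPy is available, `numpy.interp(47, centers, wps)` performs the same clamped linear interpolation.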

For trivia purposes, (and to show off how easy it is for me!) the number of cases since 2000 that featured a 1st down when down by 7 on their own 47 with 13 min left in the 4th is:

two.

"For trivia purposes, (and to show off how easy it is for me!) the number of cases since 2000 that featured a 1st down when down by 7 on their own 47 with 13 min left in the 4th is: two."

Well played, and thanks for the response!

Do you believe the outstanding coaches of recent times [Walsh, Belichick] were less timid than the run-of-the-mill coaches in play-calling?

they usually were overdogs :)

we don't like jimmy johnson on lombardi ave :(

fascinating stuff. I'm confused though, in this post you said that your WP does not factor in field position, down, distance. But your WP calculator does have those factors. So does that mean you have an updated WP chart?

http://wp.advancednflstats.com/winprobcalc1.php

Why was it stated in the Belichick thread that there was a 60% chance of making 4th and 2 when this very article states that the probability is roughly 30%?

good question. That was for 2007 only, which had an unusually low conversion rate. The numbers in the Belichick analysis are based on a much, much larger data set. See part 3 of 'The 4th Down Study' article linked to at the top right of this page. It has the full numbers.

I think the formula is wrong. I think that based on your chart, the probability of winning with a 4 pt lead is about 80%, not 92%. That's because when you score the TD on 4th and 2, you're giving the ball to the opposition. If you look at the chart, -7 and the ball is roughly 15% and -3 and the ball is roughly 22%. So I'm guesstimating that -4 and the ball is 20%.

Similarly, if you fail on 4th and 2, the win probability is roughly 10%, not 22% because again, you're giving up the ball. IOW, the opponent has the ball and a 3 pt lead so the chart says he has 90% chance of winning and thus you have 10% chance.

If you change the equation to reflect these new values, you get:

.3*.8 + .7*.1 = .31. Just about a wash compared to kicking the FG.

I smell some sample issues with this study. On top of the points Western Spartans made (which I don't think were really resolved) there are two other spots on the charts that can't be right.

One, at the end of the game where the orange line climbs above the purple, it can't be possible that a 3-point deficit is superior to a 1-point deficit. Also, I don't think it should ever be the case that the green line drops below 50%. Having the ball must always be superior to not having the ball.

Sorry if that comes off as too critical! Love the model you have created and the logic behind it. It just seems like you need a whole bunch more data to make it accurate.

Interesting stuff. But shouldn't your WP of going for it also factor in the probability of missing the field goal? Shouldn't the equation be something such as ....

WP(go for it) = 0.30 * WP(+4 point lead) + 0.70 * WP(-3 point deficit) + (probability of missing 19-yard field goal) * WP(-3 point deficit) ????
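One way to account for a missed kick is on the field-goal side of the comparison, since a miss can only happen if you kick: a make ties the game, a miss leaves you down 3, and either way the opponent gets the ball. A sketch under assumed inputs follows. The 98% make rate for a 19-yard attempt is hypothetical, the 0.30 tied-game WP is from the post, and the 0.10 for down 3 without the ball is the estimate from the earlier comment.

```python
def wp_kick(p_make, wp_tied_opp_ball, wp_down3_opp_ball):
    # Field goal branch: a make ties it, a miss leaves you down 3.
    return p_make * wp_tied_opp_ball + (1 - p_make) * wp_down3_opp_ball


def wp_go(p_convert, wp_td, wp_fail):
    # Go-for-it branch: no kick is attempted, so no miss term appears here.
    return p_convert * wp_td + (1 - p_convert) * wp_fail


kick = wp_kick(0.98, 0.30, 0.10)   # assumed make rate and inputs
go = wp_go(0.30, 0.92, 0.22)       # values from the post
print(round(kick, 3), round(go, 2))
```

Folding in the miss nudges the kicking side down slightly (0.296 instead of 0.30 here); the go-for-it side is unchanged.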

The fact that the +7 and -7 aren't reflections of each other suggests something in your calculation is incorrect.

The graph represents the team that has the ball, so it would not have to add up to 100%
