After an 11-yard sack, Donovan McNabb and the Philadelphia Eagles were backed up to their own 25-yard line. Down 17-14 to the Green Bay Packers with 1:12 remaining in the game, the top-seeded Eagles were on the verge of being eliminated from their first game of the 2003 playoffs. On 4th-and-26, the Eagles called for a 25-yard slant. McNabb dropped back and threw a bullet to Freddie Mitchell in stride, converting and laughing in the face of probability. The Eagles then drove down the field and kicked a field goal, sending the game into overtime, where they would eventually win. What are the odds that a drive containing a 4th-and-26 from the 25 would end with a successful field goal? According to the Markov model, a whopping 1 in 175.

Now some math jargon: a Markov chain is a type of stochastic process. A stochastic process is any process whose outcome we do not know in advance, but can estimate based on the probability of different events occurring. A simple example is flipping a coin over and over: we do not know how many heads or tails there will be, but we can make predictions based on the fact that each flip has a 50% chance of landing heads and a 50% chance of landing tails.

A Markov chain is special in that only the most recent event matters in predicting the future of the process. If a team has a 1st-and-10 from their own 20, it does not matter how they got there: a touchback, a converted 1st-and-10 from their own 10, an interception, and so on. Based solely on the fact that they currently face 1st-and-10 from their own 20, we can predict where they will end up next and how they will ultimately finish the drive.

Now, onto the creation of the model. The first step was to divide a drive into all possible situations and label them as distinct states. The non-drive-ending states, also known as transient states, were determined by down, distance-to-go, and yard line. The field was divided into 20 zones, one for every 5 yards. Similarly, distance-to-go was split into 5-yard increments, with all to-go distances of more than 20 yards grouped into a single 20+ label. This was done to ensure high enough frequencies for every state; any state that never occurred in a game over the past 5 years would detract from the accuracy of the model. The range of frequencies was 6 to 6,624, with an average of about 550 visits per state. There were a total of 340 transient states.

There are 9 possible drive-ending scenarios—known as absorbing states—fitting into three categories: scoring, giving the ball back, and end of half or game. The absorbing states are as follows: touchdown, field goal, safety, missed field goal, fumble, interception, turnover on downs, punt, and end of half or game.

The next step was to calculate the transition probabilities between states. If we were in state **x** 100 times, and 40 times we went to state **y** and 60 times we went to state **z**, then the transition probabilities are **Px,y = 0.4, Px,z = 0.6**.

Once we have the transition probabilities, we can use matrix manipulation to calculate the probability of being "absorbed" into any of the 9 drive-ending states. In addition, we can calculate the average remaining length of a drive and the expected points on any given drive. To play around with the model, go here.
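
The matrix manipulation referred to here is the standard absorbing-chain computation: with Q holding transient-to-transient probabilities and R transient-to-absorbing ones, the absorption probabilities are B = (I − Q)⁻¹R and the expected remaining drive length is t = (I − Q)⁻¹**1**. A minimal pure-Python sketch on a made-up two-state toy chain (the real model has 340 transient and 9 absorbing states, and all numbers below are invented for illustration):

```python
# Absorbing Markov chain arithmetic on a toy chain: 2 transient states,
# 2 absorbing states. All probabilities are hypothetical.

# Q[i][j]: P(transient i -> transient j); R[i][k]: P(transient i -> absorbing k)
# Each full row of [Q | R] sums to 1.
Q = [[0.5, 0.2],
     [0.3, 0.0]]
R = [[0.2, 0.1],
     [0.4, 0.3]]

def absorption_probabilities(Q, R, iters=200):
    """B = (I - Q)^-1 R, computed via the fixed point B = R + Q B.
    Q is substochastic, so the iteration converges geometrically."""
    n, m = len(Q), len(R[0])
    B = [row[:] for row in R]
    for _ in range(iters):
        B = [[R[i][k] + sum(Q[i][j] * B[j][k] for j in range(n))
              for k in range(m)] for i in range(n)]
    return B

def expected_plays(Q, iters=200):
    """t = (I - Q)^-1 1: expected number of plays before the drive ends."""
    n = len(Q)
    t = [1.0] * n
    for _ in range(iters):
        t = [1.0 + sum(Q[i][j] * t[j] for j in range(n)) for i in range(n)]
    return t

B = absorption_probabilities(Q, R)   # each row sums to 1: every drive ends
t = expected_plays(Q)
```

Expected points then follow by weighting each absorbing state's probability by its point value. At the real model's scale you would replace the iteration with a direct linear solve (e.g. `numpy.linalg.solve`).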

You can see the Markov analysis of Jim Harbaugh's decision to "take the points," here.

*Keith Goldner is the creator of Drive-By Football, and Chief Analyst at numberFire.com - The leading fantasy sports analytics platform.*

This is neat. Do you account for the potential of multiple absorbing states when you calculate the probabilities? I'm thinking like a last second field goal as time expires, or... actually I guess it's more unlikely than I originally thought.

I'm also assuming that touchdown only includes offensive touchdowns—i.e., when the Giants returned the failed lateral for a TD last night, the absorption state is just "fumble", not "fumble + TD". Is that right? Otherwise you could have some complicated plays like the DeSean Jackson punt return touchdown as time expires (punt + TD + end of game).

I understand you group by 5-yard increments in order to have sufficient data in every bin, but it seems very likely to introduce significant errors. The error induced is going to be proportional to the probability variation within a state and to the number of times you visit that state; both are probably largest for the relatively low yards-to-go states.

Instead, I suggest you try to fit a set of parameterized functional forms to at least the low-yards-to-go data, so that you can stretch the data you do have and make better use of the information in it. You can then substitute the functional form's answer, instead of the observed rates, into finer-grained states.

James -

For times when there were multiple absorbing states, I implemented a hierarchy of preference. So in the example of a FG as time expires, that was listed as a FG. Similarly, like you mentioned with the Rams fumble that was returned for a TD, that was listed as a fumble. In the model, since the defense does not possess the ball, they cannot score (the model terminates in an absorbing state as soon as the fumble is recovered).

Andrew -

Very interesting, and definitely something I will look into. I have a few other tweaks I would love to make to the model if time permits, but I think you are right that the greatest variability would be in the lowest yards-to-go states.

Keith,

A very good article and description of the model!

To what extent do you actually believe football follows a Markov process? The foundation of this model (and many others, to be sure) is the idea that momentum doesn't exist. Though they probably don't articulate it this way, I think this is an objection to these types of metrics shared by many "old school football guys" (read: Polian).

I think where the community should go from here is to try to test that assumption. It's been almost 10 years since I've studied this stuff, but there must be some way of using this data to test the Markov assumption.

I'm interested in your thoughts on this.

Thanks for the good article!

Brian

Great model and great analysis. The problem is that your model, like many expected value models, is not robust to unexpected outlying events and falls apart a bit in extreme cases.

It strains credulity to say that the Eagles' drive scores only 1 in 175 times. The problem for that strange drive is that your model does not take into account time left in the half or the score. A proper analysis of the particular McNabb play would exclude any drive in which punting was a reasonable option. This would involve not only removing all actual punts, but also the successful drives where the offense's situation was not so dire as to completely preclude kicking the ball away.

We would probably be left with only samples from late in the fourth quarter, and thus a rather small sample size. The sample size might even be small enough that any analysis wouldn't be meaningful at all.

What I'm getting at here is what we colloquially call four-down territory. The outcomes of drives when a team has the ball between their own 20- and 40-yard lines on 4th-and-20+ are going to be hugely different when the team simply needs to score now to avoid losing.

Brian -

Yes, there are definitely limitations when it comes to momentum and other immeasurables. As a sports fan and athlete, I believe in the power of momentum. As a statistician, I don't. I'm inclined to believe that we notice momentum as a product of the current events, rather than momentum having predictive power in terms of future outcomes. I've read a few studies about it, but I'm always interested in attempts to measure the immeasurable.

DGold -

I agree with you completely. It is definitely a stretch in those cases, since we do not take into account score and time left. But, like you said, sample size is the biggest issue to overcome, even with years and years of data. A simpler way would be to calculate a team's probability of converting a 4th-and-20+ based on all the relevant attempts, and then use the Markov probabilities for the following 1st down in conjunction with the 4th-down conversion probability.
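
The workaround Keith describes amounts to chaining two numbers: a directly estimated 4th-and-long conversion rate, multiplied by the Markov model's scoring probabilities from the resulting 1st down. A sketch, with every number made up purely for illustration:

```python
# Chain a directly estimated 4th-down conversion rate with hypothetical
# Markov absorption probabilities from the 1st down gained on a conversion.
# All numbers below are invented for illustration.

p_convert = 0.10  # assumed P(convert a 4th-and-20+), estimated from all attempts

# assumed absorption probabilities from the resulting 1st-and-10 state
p_after = {"touchdown": 0.20, "field_goal": 0.25, "other": 0.55}

# P(drive ends in points) = P(convert) * P(score | converted)
p_drive_scores = p_convert * (p_after["touchdown"] + p_after["field_goal"])
```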

What about getting even more technical, and tracking team tendencies in certain situations and the outcomes? Like, on 3rd-and-5 from their own 36 to the opponent's 35-yard line, the Saints go with 4 WRs and pass to the right side of the field 58% of the time. In this same scenario, their opponent, the Bucs, run a two-sided defensive line stunt and blitz their inside backer. We could then compare the outcomes of each of these scenarios to get a more realistic probability.

Following up on Jeff's comment, I think it would be interesting to apply this model to the team level. There will be a data sparsity issue, and it might not be possible to get much data on 3rd and 5 from the 36 against Tampa Bay. But there may be enough data on Sean Payton's offense to get a model specific to the Saints. Thoughts, Keith?

People in chemistry and biophysics attempt to make Markov models for all kinds of phenomena. For instance, they might take terabytes of results from a molecular simulation to make a Markov model of how a protein folds. The Markov model is simple and can predict things on longer time scales. In football, the play logs are like the results of molecular simulation. While much of this work is ad hoc, they may have developed some mathematical ideas that could be applied to football.

http://thepowerrank.com/

In 2010, the standard deviation among teams to convert a set of downs was about 5%. In a very stupid model in which you get three uncorrelated chances to convert a set of downs, this implies a 4% variation among teams in the probability of obtaining a first down on a single play.

For a binomial process it takes 120 plays to start seeing a 4% difference with any meaningful significance. Since the average team runs about 1000 plays a season, you can only bin-up about 8 game-states if you divide up your data by team and you want to see meaningful differences.
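
The arithmetic behind the "120 plays" figure above can be sketched as the point where the binomial standard error of an observed per-play rate falls below the 4% effect to be detected. The baseline per-play rate `p` used below is an assumption, chosen only to land in the same rough range:

```python
# Back-of-envelope sample size: smallest n at which the binomial standard
# error sqrt(p*(1-p)/n) of an observed rate drops to the effect size we
# want to detect. The baseline rate p here is an assumed placeholder.
from math import ceil

def plays_needed(p, detectable_diff):
    """Smallest n with sqrt(p*(1-p)/n) <= detectable_diff."""
    return ceil(p * (1 - p) / detectable_diff ** 2)

n = plays_needed(p=0.25, detectable_diff=0.04)  # on the order of 120 plays
```

With ~1000 plays per team per season, that standard-error budget supports only a handful of per-team game-state bins, which is the point being made above.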

Instead, you'd be much better off to aggregate the data over all teams for making the base model. Then find some way to make adjustments to the model's probabilities using one or two parameters per team, fit from the whole data for the team. (For instance, a "forward-slosh parameter" per team that takes some fraction--depending on the value of the team's parameter--of each state's transition probability and moves it to the probability to transition to the "next best" game state. Definition of "next best" may be tricky.)

The various numbers I give above--derived from a patently stupid model of down sets--are not meant to be authoritative, but merely to illustrate rough sizes of probability effects that should be considered important and meaningful, and to get some idea of what statistics might be supportable on a per-team basis over a season.
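
A minimal sketch of the forward-slosh adjustment described above, assuming a hand-supplied `next_best` mapping (which, as noted, is the tricky part to define):

```python
# "Forward-slosh" sketch: shift a fraction f of each state's transition
# probability to a designated "next best" successor state. The next_best
# mapping is an assumed input; defining it well is the hard part.

def slosh(probs, next_best, f):
    """probs: dict state -> transition probability; move fraction f of every
    state's mass to next_best[state]. Total probability is preserved."""
    out = {s: p * (1 - f) for s, p in probs.items()}
    for s, p in probs.items():
        target = next_best[s]
        out[target] = out.get(target, 0.0) + p * f
    return out

# One row of a toy transition matrix, adjusted by a team parameter f = 0.1
row = {"a": 0.5, "b": 0.3, "c": 0.2}
adjusted = slosh(row, next_best={"a": "b", "b": "c", "c": "c"}, f=0.1)
```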

Ed-

I have wanted to apply this model to teams for a while. The problem, as you mentioned, is sample size. I would probably follow a solution similar to Sarah Rudd's from On Footy (who applied a similar model to soccer), where for the rare situations you apply the league average. Another concern is that you could not use more than a year's worth of data to apply the model to a team, since there is within-team variation from year to year.

Andrew -

Definitely an interesting thing to consider. A team-adjusted Markov model would be very interesting; it would require a lot of research to find the most accurate way to adjust the model, however.