QB Performance and the Wonderlic Revisited

A recent look at the relationship between the NFL’s Wonderlic test and QB performance suggested that more intelligent QBs performed significantly better than less intelligent ones. Although it was a very interesting study, it fell short in several respects.

The study found a very strong relationship between Wonderlic scores and total passing yards (and total TD passes) within the first 4 years of a QB’s career. When only QBs with at least 1,000 total passing yards were included in the analysis, the results were highly significant.

I thought a better way to do the study would be to use a minimum number of pass attempts for inclusion, and to use career rate stats, such as passing yards per attempt, as the measure of performance. I also suspected that other passing stats, such as completion percentage and interception rate, might be particularly connected to intelligence. I re-did the study with these considerations in mind, using the Wonderlic scores reported by the authors and gathering additional data from PFR.

I tested the correlations of various passing stats to Wonderlic scores using two different methods. The first method excluded QBs with fewer than 200 pass attempts and analyzed the remaining QBs (n=29). The alternate method included all QBs with reported Wonderlic scores, including those with very few attempts, but replaced their stats with those of the 5th-percentile qualifying QB (n=61). (My data file can be found here in case anyone wants to do his own analysis.)

This alternate method was intended to account for the fact that many QBs were not deemed good enough in the eyes of their coaches to start or play on a regular basis. Because I’m using rate stats for the analysis, “0.0 yards per attempt” is unrealistically low to assign to a QB. This methodology basically says that given the chance, those QBs would likely end up with replacement-level stats. I labeled this method “Replaced” in the table below.
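As a rough sketch, the “Replaced” method boils down to a few lines of Python (the variable names are my own, and the percentile interpolation is one common convention):

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

def percentile(data, p):
    """Linear-interpolated percentile of a sample."""
    s = sorted(data)
    k = (len(s) - 1) * p / 100.0
    f = int(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)

def replaced_correlation(wonderlic, stat, attempts, min_att=200):
    """Correlate Wonderlic scores with a rate stat, substituting the
    5th-percentile value among qualifiers for QBs under the attempt cutoff."""
    qualifiers = [s for s, a in zip(stat, attempts) if a >= min_att]
    floor = percentile(qualifiers, 5)
    replaced = [s if a >= min_att else floor for s, a in zip(stat, attempts)]
    return pearson(wonderlic, replaced)
```

The 200+ attempt method is the same correlation computed on the qualifying QBs only.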

The correlation coefficients between Wonderlic scores and various QB passing stats are listed below for both analysis methods. Keep in mind I’ve run several correlations, so we would expect about 1 in 20 of them to appear significant purely by chance.

Stat          200+ Att   Replaced
Cmp%            -0.01      -0.02
TD%              0.03      -0.08
Int%             0.20       0.00
Adj Yds/Att     -0.10      -0.04
QB Rating       -0.06      -0.04
NY/A             0.07       0.04
ANY/A           -0.03       0.01
Sk Yds/Att

Adj Yds/Att = Adjusted Yds per Attempt (adjusted for interceptions and TDs)
NY/A = Net Yds per Attempt (yards per attempt factoring in sacks)
ANY/A = Adjusted Net Yds per Attempt (yards per attempt factoring in sacks, interceptions, and TDs)

The chart below is a scatterplot of Adjusted Net Yards per Attempt, a stat that considers nearly all aspects of QB passing performance.

These results tell a very different story from the strong correlations (r=0.50) found by the original authors. It appears that QB passing performance is not related to intelligence as tested by the Wonderlic. None of the correlations are statistically significant. The 200+ attempt group requires a correlation of at least 0.31 to be significant, and the “Replaced” group requires at least 0.21.
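Those significance cutoffs are easy to sanity-check with the Fisher z-transformation, assuming a one-tailed test at the 0.05 level:

```python
from math import sqrt, tanh
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Smallest correlation significant at level alpha (one-tailed),
    using the Fisher z-transformation of r."""
    z = NormalDist().inv_cdf(1 - alpha)  # one-tailed normal critical value
    return tanh(z / sqrt(n - 3))         # back-transform to the r scale

print(round(critical_r(29), 2))  # 200+ attempt group -> 0.31
print(round(critical_r(61), 2))  # "Replaced" group   -> 0.21
```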

Two aspects of QB performance do appear somewhat correlated with intelligence: smarter QBs tend to get sacked less often and to throw more interceptions. It’s a stretch, but it sort of makes sense to me. With a 280 lb defensive end barreling down on him like a freight train, looking to knock his head off, the smart thing for a QB to do is get the ball out, even at the risk of an interception. I would say that’s the appropriate (and intelligent) survival response.

The study’s original authors concluded there was a strong connection between a QB’s total passing yards and Wonderlic scores over his first 4 years, but when we look at career rate stats, the relationship disappears. Even so, I don’t think we should overlook what this suggests.

Smarter QBs accumulate more total yards (and TDs) not because they turn out to be better players, but because they tend to start playing earlier in their careers. They may be able to learn the NFL’s complex offensive systems more quickly, making their coaches more comfortable putting them behind center. Whether they are truly better prepared to play, or simply appear so to their coaches, remains an open question.

Do Wonderlic Tests Predict QB Performance?

Quarterbacks with above-median intelligence, as measured by the Wonderlic test, perform significantly better than those with below-median scores, according to a report by Criteria Corporation, an employee-testing company.

For QBs drafted between 2000 and 2004 who qualified with at least 1,000 passing yards, the correlation between Wonderlic scores and total passing yards is r=0.51. For TDs thrown, the correlation is 0.49. Those are incredibly strong correlations. As the authors put it, they are "some of the strongest coefficients reported anywhere in organizational psychology."

Here is their graph of passing yards vs. Wonderlic score. You can see the upwardly sloping trend clearly.

The authors restate their findings, "the QBs who scored below the median Wonderlic score (for QBs) of 27 averaged 5,202 passing yards and 31.2 TDs over their first four years, whereas those scoring above the median averaged 6,570 yards and 40.8 TDs over the same period." Here is their other graph:

The research has some shortcomings, however. By my count there are only 30 qualifying QBs, which is a relatively small sample. The results are still significant, meaning we can be fairly certain the true correlation isn't zero, but it may be somewhat less than 0.5. If you move the qualifying cutoff to 2,000 yds and throw out Tom Brady [33 Wonderlic, 10,000+ yds], the correlations become 0.26 and 0.28. Also, the study uses total yards and total TDs thrown through a QB's first 4 years as its performance metrics. This may bias the results because some QBs in the data have not yet completed 4 years of play. Plus, even below-average QBs can amass large amounts of total yards and "trash" TDs if their team is frequently behind in the 4th quarter.

I would suggest using a minimum qualifying cutoff of pass attempts, and then use yards per attempt or yards per attempt adjusted for interceptions as the performance metric. The authors generously provided their data so perhaps I'll redo the study with that in mind.

The Criteria Corporation study was itself a reaction to another done by professors at the University of Louisville that reached the opposite conclusion. That study is riddled with flaws too numerous to detail. In short, however, they used initial salary, draft order, and NFL QB rating as performance metrics, used no minimum qualifying standard for inclusion, and limited the performance scope to the first 3 career years regardless of how little playing opportunity a player had. It's as if the authors don't understand football at all, or intentionally set out to discredit any connection between the Wonderlic (or intelligence in general) and performance. (By the way, it's frankly amazing to me how full-fledged PhD researcher types can publish severely flawed studies like this, even after peer review.)

Despite the shortcomings of Criteria's report, I think the data does suggest there may be a measurable connection between intelligence and QB performance. Bad news for Tennessee fans.

If you're curious how your favorite QB scored on the Wonderlic, here is a site with reported scores through the 2006 draft class. And here is another study on the Wonderlic and QB performance that finds no connection.

I took another more critical look at this study here.

Hat tip to Erik Loken via Statistical Modeling, Causal Inference, and Social Science.

Turnovers and 2008 Expected Wins

In a post from last year I noted how team records tend to regress to the mean from year to year based on how well a team did in interceptions. When teams did notably well in either offensive interceptions (few thrown) or defensive interceptions (many taken), the overwhelming trend was for them to win fewer games the following year. Likewise, teams with poor interception stats tended to win more games the following year.

When we look at team records from year to year, regression to the mean dominates. Good teams win fewer games the next year, and bad teams win more. This tendency is extremely strong as illustrated by the graph below. The horizontal x-axis represents each team's regular season win total from the prior year. The vertical y-axis is the change in each team's win total from the prior year to the subsequent year. The more wins a team had, the farther the drop in the following year. Likewise, the fewer wins a team had, the stronger the improvement. For example, the typical 13-win team will tend to win 4 fewer games the following year. And the typical 4-win team will tend to win 3 more.

I previously attributed the strength of the regression phenomenon to the scheduling system which matches opponents according to how they placed in their respective divisions, the draft which allocates draft position in reverse order of win-loss records, and salary cap boom/bust cycles in which individual teams load up on talented and costly players, then 'purge' their rosters to recover salary cap room for the dead weight of past signing bonuses.

While those considerations are very likely to contribute to the churn of team records, I now believe the major cause is the randomness of turnovers. Each team's turnover stats have a random component--think of tipped passes or fumbles bouncing on the turf. To test how strongly turnovers drive the phenomenon of win regression, I calculated the correlations between each turnover stat and the year-to-year change in team win totals (Win Δ). The data is from all 32 teams' five season-pairs from the 2002-2007 regular seasons (n=160).

Stat            Win Δ Correlation
Int Taken            -0.34
Fum Taken            -0.10
Net Takeaway         -0.32
Int Thrown            0.36
Fum Lost              0.25
Net Giveaway          0.38
Net TO               -0.47

These are very strong correlations, considering we are estimating next year's wins with the previous year's stats. It's important to point out that these correlations run counter to performance: the better a team does in terms of turnovers one year, the fewer games it is expected to win the following year. To put this in context with other correlations in the NFL, current-year TD passes correlate at 0.50 with current-year wins.

Based on each team's 2007 turnover stats we can estimate their improvement or decline for 2008. The estimates are based on a linear regression on Win Δ by fumbles lost, fumbles taken, interceptions thrown, and interceptions taken. Those teams that benefited the most from favorable turnover stats would be expected to decline, and vice versa. The table below lists each team and their expected change in wins from 2007 to 2008.
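The regression step can be sketched with ordinary least squares (a sketch assuming numpy; the column order of the four turnover stats is my own choice):

```python
import numpy as np

def fit_win_delta(turnovers, win_delta):
    """OLS fit of year-to-year win change (Win Δ) on the four prior-year
    turnover stats: [int taken, fum taken, int thrown, fum lost].
    turnovers is an (n, 4) array, win_delta an (n,) array."""
    A = np.column_stack([np.ones(len(turnovers)), turnovers])  # intercept column
    coefs, *_ = np.linalg.lstsq(A, win_delta, rcond=None)
    return coefs  # [intercept, b1, b2, b3, b4]

def expected_win_delta(coefs, team_turnovers):
    """Expected win change for one team's turnover line."""
    return coefs[0] + float(np.dot(coefs[1:], team_turnovers))
```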

(One caveat--these are not definitive predictions for 2008; they are based only on the overwhelming tendency for teams to regress based on turnovers. Think of them as baselines around which other factors, such as injuries and fundamental improvement or decline, would operate.)

Team   Int Taken   Fum Taken   Int Thrown   Fum Lost   Net TO   Exp Win Δ

Why does randomness and regression to the mean appear so strong in the NFL? I think it's due to a combination of a short schedule and team parity. Sixteen games is simply not long enough for "the breaks" to even out. And if the opponents are relatively equal in ability, then random factors will play a large role in determining game outcomes. When randomness is decisively involved, regression to the mean will be a strong force from year to year.

Elo Ratings

The Elo rating system is a method of ranking players or teams in sports and games. It only considers wins and losses, and it ignores margin of victory. The system was originally created to rate international chess players by Arpad Elo, a physics professor who was himself a master chess player.

In a nutshell, the system estimates the probability one opponent should beat another. If an opponent wins more often than expected, his rating improves, and vice versa. The algorithm needs to start with a prior expectation of how good each player (or team) is. Then, as the players complete matches, their ratings are adjusted upwards or downwards based on who won. The size of each adjustment is based on how significant the win was. For example, if a grand master chess player beats a novice, his rating would hardly budge, but if the novice beat the master, both ratings would move significantly.

The actual algorithm is based on the expected-score function below, where E_A is the expected win probability of player A, R_A is player A's rating, and R_B is player B's rating:

E_A = 1 / (1 + 10^((R_B - R_A) / 400))

After a game between opponents A and B, player A's new rating (R'_A) is revised as:

R'_A = R_A + K × (S_A - E_A)

where K is the maximum size of an adjustment, and S_A is the actual result of the match (1 for a win, 0 for a loss). The K value has traditionally been 32 for chess, but it can be adjusted to tailor the system to various other games and sports. Ratings are typically set to have an average of 1500, but this is arbitrary and can be adjusted also.

For example, if player A's rating is 1655 and player B's rating is 1500, then according to Elo's function the probability A would beat B is 1 / (1 + 10^(-155/400)) = 0.71. If player A defeats player B, the actual outcome S_A is 1.00, and player A's new rating would be:

R'_A = 1655 + 32 * (1.00 - 0.71) = 1664
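In code, the standard update is just a couple of lines (a minimal sketch of the textbook Elo formulas, using the usual base-10, 400-point logistic curve):

```python
def elo_expected(r_a, r_b):
    """Expected score (win probability) for A under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """A's new rating after a match; score_a is 1 for a win, 0 for a loss."""
    return r_a + k * (score_a - elo_expected(r_a, r_b))

print(round(elo_expected(1655, 1500), 2))  # -> 0.71
print(round(elo_update(1655, 1500, 1.0)))  # -> 1664
```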

One interesting way to look at the ratings is to create a generic win probability. By using the Elo algorithm to compute the expected win probability against a notional average rating, we can get a sense of each team's expected winning percentage.

Sagarin's Application of Elo

Jeff Sagarin uses a version of the Elo system to create NFL team ratings. He transforms them to produce ratings that are predictive of a game's point spread, so the difference between two opponents' ratings, plus an adjustment for home-field advantage, predicts the margin of victory. Sagarin's adjustment is a straightforward linear transformation of the original Elo system, as you can tell from the graph below. (I suspect Sagarin may over-weight recent games, however.)

Elo Mimicked

Using the same method as I described in my last post, we can mimic Elo ratings. That method computed team ratings based on the margin of victory in each game. Instead of margin of victory, we can simply replace the score of each game with a 1 or 0 based on who won, then solve for the ratings that best estimate the game outcomes. Because the ratings are linear, we can transform them into individual game probabilities or generic win probabilities using a logistic transformation of the form:

P(A beats B) = 1 / (1 + e^-(R_A - R_B))

These rating systems can be adapted for any type of game or sport. Recently, on-line games have been using similar algorithms to rank players. The primary advantage of this type of system is that it discounts victories over very weak opponents; often players will set up phony opponents to beat in order to inflate their own scores.
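Here is one way the win/loss-only fit might look (a sketch in plain Python using my own gradient-ascent logistic fit, not necessarily the exact solver referenced above):

```python
from math import exp

def fit_ratings(games, n_teams, iters=3000, lr=0.5):
    """Fit win/loss-only team ratings by logistic regression.
    games: list of (team_i, team_j, i_won) tuples, with i_won 1 or 0."""
    r = [0.0] * n_teams
    for _ in range(iters):
        grad = [0.0] * n_teams
        for i, j, i_won in games:
            p = 1.0 / (1.0 + exp(-(r[i] - r[j])))  # logistic of the rating gap
            grad[i] += i_won - p                   # gradient of log-likelihood
            grad[j] -= i_won - p
        r = [ri + lr * g / len(games) for ri, g in zip(r, grad)]
        m = sum(r) / n_teams
        r = [ri - m for ri in r]  # ratings are relative; center on zero
    return r

def generic_win_prob(rating):
    """Probability of beating a league-average (rating 0) opponent."""
    return 1.0 / (1.0 + exp(-rating))
```

A team's rating fed through generic_win_prob gives the expected winning percentage against an average opponent.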

To get a sense of what these rankings would look like for the most recent (2007) NFL season, the table below lists several ratings for each team. The Elo column lists the ratings I derived from the actual Elo algorithm. The Sagarin column lists Jeff Sagarin's version of Elo--his final 2007 season ratings. Lastly, based on the Elo algorithm, the win probability column lists the probability each team would beat a league-average team on a neutral site. All ratings include results from the playoffs and Super Bowl.

Team   Elo   Sagarin   Win Prob