As mentioned earlier, a major concern is what will happen to onside kicks. The solution that seems to be emerging is to give the scoring team a fourth down in its own territory. But what should the field position and the yardage to cover for a first down be? It depends on the objective. Do you want to give the scoring team the same chance of recovering (and maybe scoring after the recovery) as with the NFL’s onside kick? If yes, you need to set the distance to a value that leads to a conversion rate of about 18.5%, which is the onside kick recovery rate. Below I present the conversion rates for 4th (and 3rd) downs in the NFL (granted, the AAF might be a level or half a level below the NFL in terms of talent, but these are fairly good numbers as a starting point), where the onside kick recovery rate is represented by a horizontal red line.
It seems that 4th and 10 will give an advantage to the offense compared to the current NFL onside kick recovery rate. So maybe a 4th and 15 will be fairer (again, it depends on what the league is after in terms of allowing teams to get the ball back).
Another variable is where you position the chains for this 4th down attempt. The idea would be to position them at the team’s own 35 yard line. However, what is also important is where the ball was typically recovered after an onside kick. That was the team’s own 47 yard line, so if we assume a 4th and 15, this puts the ball at the team’s own 32 yard line, which is not that far from the 35 yard line.
Of course, here is where it can really get interesting. The league could allow the team to exchange part of the yardage-to-go for field position (and vice versa). Depending on the talent of the team’s offense or the opponent’s defense this can be a strategic decision. The league’s exchange converter will be a league-average, so individual teams might be able to exploit this. These exchange converters can be calculated from NFL data (and possibly adjusted for the AAF).
The other part that needs to be considered is what happens after a score if the team does not want to attempt an onside kick. Where does the opponent get the ball? The simplest thing (and what I am assuming will end up happening) would be to get it at their own 25 yard line, similar to a touchback. However, this eliminates the ability of a good kicker to pin the offense close to its own goal line (or of a good return team to get better field position). So here is another suggestion: have the scoring team choose the extra point kick distance and exchange it for field position on the ensuing drive (similar to the starting position for a two-point attempt). For example, if I want to pin my opponent deeper than their own 25 yard line, I might take my PAT kick 10 yards further out. Again we will need an exchange converter, but again this can be built with appropriate data. For example, using some of the data from my paper presented at the Workshop on Machine Learning and Data Mining for Sports Analytics back in 2015, the following is the success rate of a field goal as a function of distance:
The vertical line represents the current NFL PAT kick distance, and we are interested in values greater than this. For distances between 32 and 55 yards the drop is roughly linear; in particular, a linear model explains 85% of the variance in the FG conversion rate. Each 1-yard increase (beyond 32 yards) drops the expected points per FG by 0.014 points. Furthermore, the following is the probability of a TD, a FG and a failed drive given the starting field position, as taken from the original paper (so the touchback line is still the 20-yard line):
If we do the calculations, each yard behind the touchback line (we used the 25-yard line) reduces the expected points from this drive by 0.027 (the linear model explains 94% of the variance). So it almost seems that there could be a simple 1-to-2 exchange rule between field position and PAT distance at the league-average level: every 2 yards added to the current PAT kick distance bring the opposing offense back by 1 yard. Now it is clear that if you want to pin the offense all the way down to their own 1 yard line, this would mean increasing the distance of your PAT by about 50 yards, which will give you practically a 0% chance of making the extra point (it falls outside the linear range we used above). This is still a decision the team could make based on its kicker and how he compares to a league-average kicker (and considering the opponent’s offense, time left, score differential, its own defense etc.). The team could even have the opportunity to choose a closer distance for the extra point and give the opponent better field position (but I would not expect that to be the case). A similar idea can be used for (no) “kickoffs” after a field goal score, i.e., you can exchange distance for starting field position. Obviously a different exchange converter is needed. All of these could be treated like penalties, i.e., they can be accepted or declined by the other team.
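To make the exchange rate concrete, here is a minimal sketch that turns the two per-yard slopes quoted above (0.014 and 0.027 expected points) into a converter. The function name is hypothetical, and the conversion only makes sense inside the linear ranges discussed above; it is an illustration of the arithmetic, not a proposed league rule.

```python
# Per-yard slopes quoted in the text (league-average, linear range only).
PAT_POINTS_PER_YARD = 0.014    # expected points lost per extra yard of kick distance
FIELD_POINTS_PER_YARD = 0.027  # expected points the opponent loses per yard pinned back


def pat_yards_for_pin(yards_pinned_back: float) -> float:
    """Extra kick distance that costs the kicking team as many expected
    points as the opponent loses by starting `yards_pinned_back` deeper.
    (Hypothetical helper; assumes both relationships stay linear.)"""
    return yards_pinned_back * FIELD_POINTS_PER_YARD / PAT_POINTS_PER_YARD


if __name__ == "__main__":
    for pin in (1, 5, 10, 24):
        extra = pat_yards_for_pin(pin)
        print(f"pin {pin:>2} yd deeper -> kick from {extra:.1f} yd further out")
```

The ratio 0.027 / 0.014 is a little under 2, which is where the rough 1-to-2 rule comes from; pinning the opponent 24 yards deeper (from the 25 to the 1) lands in the high 40s of extra kick yards, well outside the range where the linear fit applies.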
The only thing that has not been covered here is what happens with the return unit. In the above, only the scoring team (kicking team) makes decisions and can decide where it wants to pin the opponent’s offense. How can the (no) returning team simulate a return? One possibility would be to again enhance the above mechanism with an exchange between field position and yards-to-go for the first first down of the next drive. This could actually mean that the scoring team might choose to give the opponent’s offense better field position in exchange for longer yardage-to-go (?). Or, to spice things up, on a FG attempt the defense might also be allowed to offer the offense a closer distance for the FG in exchange for better starting field position (the offense can obviously accept or decline). Again, data can come to the rescue for this. But well, I am not going to spill all the beans here; if the AAF is interested I am available for hire. But jokes aside, it is very interesting to see where this no-kickoff rule will go and what the impact will be on collisions and player safety.
Where am I going with this? Well, unfortunately in academia people tend to choose their research topics based on where there is funding, since they are constantly pressured by their supervisors (that is, deans, provosts, chancellors etc.) to bring in research funds. This is understandable, but sometimes we need to make short-term sacrifices (in this case, do some unfunded research) to see the long-term potential (i.e., cultivate the land for research funding in the area of interest). I think this time has come for sports analytics. With more academics being hired by league offices and analytics departments of teams, I expect them to be more open to these possibilities and initiatives. Tech giants like Google, Facebook and Amazon offer university grants for topics that are tangentially related to their focus. I think that this is a great opportunity for league offices and teams to get answers from academia to questions they might have been trying to answer. After all, they are trying to do so through hackathons (which, in my opinion, is not the best way to do it). On the other hand, academia can get access to data and funds that will allow researchers to answer deeper questions (similar to the ones people like Thaler, Massey, Romer and others have tried to answer) that advance science as well. These kinds of industry grants are not large (typically between 50K and 150K), but they are enough to fund one or two years of research and have a long-lasting impact on the organization. In fact, Pitt made some initial effort this year to promote research on sport tech. And as the call mentions: “While the immediate goal is to improve athletic performance, the solutions developed are expected to have broad applications and the opportunity to positively impact people of various ages and physical conditions”. I hope we will see more of that, not only from Pitt and other universities but from professional sports entities too.
Actually, the Texas Rangers took a (very minor) step towards this by offering a small scholarship to undergrads who came up with an “innovative analytics approach”. Of course the ratio of the reward to the potential benefits for the Rangers is a bit skewed, but it is still a start…
I can see this being a win-win situation but there are two things that need to be done:
Let’s see how things unfold in the years to come…!
Therefore, from now on I will not be blogging! Or, to put it better, I will only be blogging, in a more compact way, about actual research that I have performed and that is completed and peer-reviewed. People who are interested can then read the actual research paper, similar to this post or this post or this post or this post. I will also be posting blogs that explain an analytical technique (using sports data), like this post or this one (and, of course, my predictions/opinions). But what I will not be doing is presenting “new research”. I really think that blogging has turned the corner and actually has negative returns for the people creating blog posts and the people reading them, and I can see this from my students! Today there are peer-reviewed journals that will accept and publish applied research too. So you can publish there (the review process will help your analysis too). So I will make a plea to all (sports) bloggers out there: blog responsibly!
So let’s start with some basics on the z-score. The z-score of a data point $latex x$ tells us how many standard deviations above/below the mean of the whole dataset this observation is, and is calculated by:
$latex z = \frac{x - \mu}{\sigma}$
where $latex \mu$ is the mean of the dataset and $latex \sigma$ is the corresponding standard deviation. This is a way to standardize the data, since if one takes the z-scores of all the observations in a dataset they will have a mean of 0 and a standard deviation of 1. Standardizing observations allows us to compare data that are on different scales, and can thus be used to compare players/teams of different eras. To understand this better before delving into the Bulls-Warriors debate, let’s see a nice example from the Mathletics book. Let’s say that someone asks you to compare Rogers Hornsby to George Brett based on their batting average. Hornsby had a .424 batting average, while Brett had a .390. A naive comparison will conclude that Hornsby was more impressive since his batting average was higher. However, in the 1920s, when Hornsby played, the average batting average was .299 with a standard deviation of .0334, while in the 1980s, when Brett played, the average batting average was .274 with a standard deviation of .0286. If we calculate the z-scores for the two players we have:
$latex z_{Hornsby} = \frac{0.424 - 0.299}{0.0334} \approx 3.74$
$latex z_{Brett} = \frac{0.390 - 0.274}{0.0286} \approx 4.06$
Based on the z-scores, Brett was more than 4 standard deviations better than an average batter in his era (of course Hornsby was still extraordinarily better than his average competition, but not as extraordinary as Brett). In other words, standardization puts the data into context relative to the era’s competition and is a nice tool to keep in mind for such comparisons. Let’s now see how we can use z-scores to compare the Bulls and the Warriors.
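For readers who want to reproduce the Hornsby/Brett comparison, a minimal sketch follows. It only uses the batting averages, league means and standard deviations quoted above; the function name is just an illustrative choice.

```python
def z_score(value: float, mean: float, std: float) -> float:
    """Number of standard deviations `value` sits above (+) or below (-) the mean."""
    return (value - mean) / std


# Numbers quoted in the text (batting average, era mean, era standard deviation).
hornsby = z_score(0.424, 0.299, 0.0334)
brett = z_score(0.390, 0.274, 0.0286)

print(f"Hornsby: {hornsby:.2f}, Brett: {brett:.2f}")  # Hornsby: 3.74, Brett: 4.06
```

Hornsby has the higher raw average, but Brett ends up further from his era's mean once both are expressed in standard deviations.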
Obviously there is not one statistic that can answer this kind of question. Furthermore, this is clearly not a deterministic question, since no team has a 100% win probability against any opponent (unless the game is over), let alone when you compare two all-time greats. Hence, in principle what we would like to know is the win probability of a Bulls (90s) – Warriors (10s) matchup. Earlier in this blog we presented a simple, yet accurate, model for calculating pre-game win probabilities for NBA matchups, namely, the Basketball Prediction Matchup (BPM). While you can see the details in the corresponding post, in brief, BPM relies on a Bradley-Terry model on Oliver’s four factors, namely, effective field goal percentage, turnover rate, offensive rebound rate and the ratio of free throws to field goal attempts. So what one could do is take each team’s four factors, throw them into BPM and obtain a win probability. Not so fast though! The game has changed dramatically in the two decades that separate the two teams. For example, in the 90s the NBA was a big man’s league, while now it is a small man’s league. Not to mention the explosion of 3-point shooting. This latter point in particular has a strong effect on one of the four factors, that is, the effective field goal percentage! And while we are at it, the 90s Bulls played two seasons with a shortened 3-point line; it would not be fair to make direct comparisons across eras with differences like these. In order to be a little smarter, we will calculate the z-scores for the 90s Bulls’ four factors and then use these z-scores to project them to the contemporary NBA. Now, you might be wondering, is this accurate? Most probably not completely; after all, every model is wrong, but since some are useful I think this is a good start. Now we had two options here: (i) to project the Warriors’ factors to the Bulls’ era, and (ii) to project the Bulls’ factors to the contemporary NBA.
I chose to do the second since we are conditioned to think contemporary but most importantly because BPM was trained using data from the past couple of seasons.
Using data from basketball-reference.com, the following table shows Chicago’s z-scores for the four factors during their 6 championship years (note that a negative z-score for the turnover rate is good):
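We will begin by considering all possible team matchups (e.g., the 1990-91 Bulls against the 2017-18 Warriors etc.). Hence, in order to project the Bulls’ performance from season $latex s_1$ to season $latex s_2$ we will use the corresponding z-score from season $latex s_1$ and the league mean and standard deviation of season $latex s_2$. For example, the projected eFG for the Bulls for the 2014-15 NBA season would be:
$latex eFG_{proj} = \mu_{eFG}^{2014\text{-}15} + z_{eFG}^{s_1} \cdot \sigma_{eFG}^{2014\text{-}15}$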
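The projection step is simply the z-score formula run in reverse. Here is a minimal sketch; all numbers in it are made-up placeholders (they are NOT the Bulls' actual z-scores or real league averages), and the function names are hypothetical.

```python
def z_score(value: float, mean: float, std: float) -> float:
    """Standardize a value against its own season's league mean and std."""
    return (value - mean) / std


def project_to_season(z: float, target_mean: float, target_std: float) -> float:
    """Map a z-score earned in one season onto the scale of another season."""
    return target_mean + z * target_std


# Hypothetical placeholder values, for illustration only.
old_team_efg, old_league_mean, old_league_std = 0.520, 0.490, 0.020
new_league_mean, new_league_std = 0.500, 0.020

z = z_score(old_team_efg, old_league_mean, old_league_std)
projected = project_to_season(z, new_league_mean, new_league_std)
print(f"z = {z:.2f}, projected eFG = {projected:.3f}")
```

The same two functions would be applied to each of the four factors before feeding the projected values into a win probability model.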
Using these projections we can now use BPM to obtain win probabilities. The following show the win probabilities for the 1990s Chicago Bulls against the contemporary Warriors for home and away games respectively:
As we can see, both from the z-scores and the win probabilities, the 1997-98 Bulls might be the least competitive against the Warriors. Potentially this is due to the bad start to that season, when they played without Scottie Pippen for the first half and were practically around .500. However, overall the Bulls would be the favorite in 83% of the matchups, and their average win probability is 60% for a home game and 55% for an away game. This is not a terribly telling stat, but the Bulls seem to have a slight (?) edge over the Warriors. We simulated 20,000 7-game series between the two teams using these win probabilities (10,000 with the Bulls having home-court advantage and 10,000 with the Warriors having it) and the Bulls won 66% of them. On average the Bulls won in 6 games, while the Warriors won in 6.5 games. Following is the distribution of the series length:
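A series simulation of this kind can be sketched in a few lines. The version below is a simplification and not the exact simulation described above: it assumes a 2-2-1-1-1 home/away format and uses the two average win probabilities (60% home, 55% away) for every game, instead of drawing a specific Bulls and Warriors season for each series.

```python
import random

# Games hosted by the team with home-court advantage (2-2-1-1-1 format, assumed).
HOME_GAMES_FOR_TOP_SEED = {1, 2, 5, 7}


def simulate_series(p_home, p_away, bulls_have_home_court, rng):
    """Play one best-of-7 series; return (bulls_won, games_played)."""
    bulls = warriors = game = 0
    while bulls < 4 and warriors < 4:
        game += 1
        bulls_at_home = (game in HOME_GAMES_FOR_TOP_SEED) == bulls_have_home_court
        p_bulls = p_home if bulls_at_home else p_away
        if rng.random() < p_bulls:
            bulls += 1
        else:
            warriors += 1
    return bulls == 4, game


def simulate_many(n_per_side=10_000, p_home=0.60, p_away=0.55, seed=0):
    """Half the series with Bulls home court, half with Warriors home court."""
    rng = random.Random(seed)
    bulls_wins, lengths = 0, []
    for home_court in (True, False):
        for _ in range(n_per_side):
            won, games = simulate_series(p_home, p_away, home_court, rng)
            bulls_wins += won
            lengths.append(games)
    return bulls_wins / (2 * n_per_side), sum(lengths) / len(lengths)


if __name__ == "__main__":
    share, avg_len = simulate_many()
    print(f"Bulls win share: {share:.3f}, average series length: {avg_len:.2f}")
```

Because this sketch uses average probabilities rather than the per-matchup ones, its output should only be read as a ballpark check on the figures reported above.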
Obviously the above does not settle the debate by any stretch of the imagination. As mentioned above, using z-scores is just a rough way of making such comparisons and unfortunately (or fortunately) we will never know how a game between the two teams would turn out. However, I hope it showcases how one can start thinking about comparing players/teams of different eras! Plus it is a fun way to understand (and teach) standardization.
Let’s see why betting is not really different than gambling..ummm investing in the stock market. First we will briefly introduce Kelly’s growth criterion. Kelly assumes that our goal in betting is to maximize the expected long-run percentage growth of our portfolio, measured on a per-gamble basis. Without getting into the details of the derivation, Kelly’s criterion determines the optimal fraction of your bankroll to bet as:
$latex f = \frac{pW - qL}{WL}$
where:
$latex p$ is the probability of winning the bet,
$latex q = 1 - p$ is the probability of losing the bet,
$latex W$ is the profit per $1 wagered if you win the bet,
$latex L$ is the loss per $1 wagered if you lose the bet.
So basically, if you know the probability of winning the bet, then being disciplined and betting a fraction $latex f$ of your bankroll will maximize the expected long-run growth of your portfolio. The Kelly criterion helps strike the right balance between risk and safety and, most importantly, does so in an easy manner. Of course, if $latex f \leq 0$ you do not put any wager on the bet! Now of course it comes with drawbacks. For one, you need to be able to work out the real probabilities of bets. However, as I will explain later, for specific types of bets (i.e., the moneyline bets) this is no different than evaluating the calibration of your prediction model! Secondly, the Kelly criterion is inherently aggressive, since it can lead the bettor to wager even half of his total bankroll. However, one can tweak the approach and decide a priori on a maximum wager $latex M$ for each bet (which needs to remain fixed for ALL bets), which basically means that the final amount bet would be $latex f \cdot M$. Let us see each of these points in more detail.
First let’s explain the moneyline bet, which we will use from now on. Say you want to bet on the Steelers’ season opener against the Browns. You might see something along the lines of Steelers -260 and Browns +250. What this means is that if you take the Steelers you need to “risk” $260 to win $100 if Pittsburgh wins, while if you take the Browns you can win $250 if you risk $100 and Cleveland wins. Now that we got this out of the way, let’s get into the investment strategy.
Real probability of winning a bet: Let us consider that you are betting on a moneyline bet. Say that your model predicts that the Steelers are going to win with probability 80%. If your model is well-calibrated (i.e., the probability obtained is the true win probability for each team), then this is the probability of winning a moneyline bet on the Steelers. If you bet on the Browns, then you have a 20% probability of winning the bet. Therefore, if you trust your model (or, even better, if you have evaluated your model and it is well-calibrated) you have the parameters $latex p$ and $latex q$ for the Kelly criterion. The only thing that remains is to calculate $latex W$ and $latex L$. $latex L$ is always 1, since when betting on the moneyline you lose exactly what you have wagered. $latex W$ depends on whether you bet on the favorite or the underdog. If the line for a favorite is $latex -x$ (e.g., -260), then $latex W = 100/x$ if you bet on the favorite. For a bet on the underdog with a line $latex +y$ (e.g., +250), $latex W = y/100$. Now you have all the parameters to calculate $latex f$.
Applying Kelly’s criterion: One of the problems with the Kelly criterion, as alluded to above, is its aggressiveness; $latex f$ represents the fraction of the total bankroll that Kelly suggests betting. To make things a little less aggressive, one can set a fixed maximum amount of money to wager on a bet (e.g., $50). It is crucial for the strategy suggested here that this maximum wager per bet does not change (i.e., be disciplined and resist temptations)!! So assuming a max wager per bet of $latex M$, for each bet we calculate $latex f_{favorite}$ and $latex f_{underdog}$. If $latex f_{favorite}>0$ we bet $latex f_{favorite} \cdot M$ on the favorite, otherwise we pass on this bet. Similarly, if $latex f_{underdog}>0$, we bet $latex f_{underdog} \cdot M$ on the underdog, while if $latex f_{underdog} \leq 0$ we do not bet on the underdog.
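The capped Kelly rule described above can be sketched as follows, using the Steelers -260 / Browns +250 example and the 80% model probability from the text. The function names and the $50 cap are illustrative choices; the formula is the standard Kelly fraction with the loss per dollar fixed at 1.

```python
def win_multiplier(moneyline: int) -> float:
    """Profit per $1 risked: -260 pays 100/260, +250 pays 250/100."""
    if moneyline < 0:
        return 100 / -moneyline
    return moneyline / 100


def kelly_fraction(p_win: float, moneyline: int) -> float:
    """Kelly fraction f = (p*W - q*L) / (W*L) with L = 1 for moneyline bets."""
    w = win_multiplier(moneyline)
    q = 1 - p_win
    return (p_win * w - q) / w


def capped_wager(p_win: float, moneyline: int, max_wager: float) -> float:
    """Bet f * max_wager when f > 0, otherwise pass (wager 0)."""
    return max(kelly_fraction(p_win, moneyline), 0.0) * max_wager


if __name__ == "__main__":
    # Model says Steelers win with probability 0.80; max wager per bet is $50.
    print(f"Steelers f = {kelly_fraction(0.80, -260):.2f}")             # 0.28
    print(f"Browns   f = {kelly_fraction(0.20, +250):.2f}")             # -0.12
    print(f"Steelers wager = ${capped_wager(0.80, -260, 50):.2f}")      # $14.00
    print(f"Browns   wager = ${capped_wager(0.20, +250, 50):.2f}")      # $0.00
```

In this example the model is more confident in the Steelers than the line implies, so the rule bets on the favorite and passes on the underdog.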
Let’s put this strategy into play through simulations!!! I collected Vegas moneylines for the 2009-2015 NFL seasons and I used our well-calibrated football prediction matchup (FPM) model. I will not get into the details of the model, but in brief it is a combination of bootstrap and a logistic regression model; details can be found here. Using the above strategy, the following figure presents the Return on Investment (RoI) for each season. RoI is defined as:
$latex RoI = \frac{\text{total winnings} - \text{total losses}}{\text{total amount wagered}}$
The blue line presents the return on $1 of wager for each season separately, while the red line is the cumulative/rolling return from 2009 up to the corresponding season. Overall, through these 7 seasons you would have won 25 cents on every $1 you wagered.
Furthermore, the average wager on a favorite was , while the average wager on an underdog was . Finally, the average win multipliers were: and .
What about the stock market? A 25% return on investment certainly sounds better than the 7% long-term return of the stock market. There are many similarities in the way the two markets operate, but also some important differences that might make specific aspects of sports betting more appealing to some. In both markets the goal is to outsmart someone else (and in both cases we need to be disciplined). In the case of betting we try to outsmart the bookmaker (or your friend if you are placing a friendly wager), while in the case of the stock market we are trying to outsmart another person who will buy from (sell to) us stocks of company X. In the case of (moneyline) betting, as explained above, outsmarting means making better probabilistic predictions than the bookmaker, while for the stock market it means making better predictions on whether the price of a stock will increase or not. Both seem very similar tasks in principle, but I have never tried to predict the stock market and there are very good reasons for this (the main one being that I obviously might not be that smart). However, I can guarantee you that a simple model like the one I presented for NFL prediction will do poorly in predicting the movement of the stock market. The main difference between the two prediction tasks is that sports games are kind of a closed system, i.e., the only variables that really matter are the players and their performance, while the stock market is a wide-open system where an event 4000 miles away might impact the stock market’s movement. Simply put, everything that affects the outcome of a game can be measured to some degree, while the price of a stock can be affected by any possible news, even if it’s fake news. Most glaringly, even for anticipated news we seem to have as good a grasp as Malkiel’s blindfolded monkey.
For example, after news like Brexit and the latest US presidential elections, pundits made sure to inform us about the upcoming crash in the stock market, and by now we know how that ended up. Of course, all of this does not mean that you cannot do well in the stock market; certainly you can, and there are indeed experts who have done well fairly consistently. But from a purely predictability point of view, I think it should be obvious that predicting sports outcomes might be easier. And while we are on this topic, Lopez, Matthews and Baumer have put together a nice analysis on the predictability and luck of different sports.
What does this all mean? This does not guarantee you gains! Let’s be clear about this. But it shows that if you are disciplined you can have a good return. Do you need to be a math whiz to predict games? No!! But you have to have a good understanding of what probability means (and this is another benefit of legalizing betting; people might understand probabilities better when they lose approximately 25% of the time on a -300 bet!). You could potentially use some of the prediction models that are out there (e.g., FiveThirtyEight’s, which seems to be well-calibrated too). If you decide to go down Kelly’s road you should also remember that you need to be patient! You might lose more bets than you win! But this is not what ultimately defines return on investment. Kelly’s formula tries to find the optimal wager given your (or your model’s) belief about a game and the implied belief of the bookmaker from the moneyline. Also think about it: if you bet on a 20% underdog you will lose more often than you win (in terms of number of bets), but if your probabilities are correct, you will win more money than you lose, again if you are disciplined(!), and this discipline involves avoiding any sort of bias (e.g., for your favorite team).
And just to be clear: I am neither endorsing gambling, nor saying that you should use the approach I described. I am just saying that there is no reason to believe that betting on sports is different than investing in the stock market! I am very interested to see how this progresses.
In order to explore our hypothesis we used play-by-play data from the 2014-2016 NFL seasons. In particular, for every game in our dataset we calculated for each team the utilization of passing as the fraction of passing plays over the total number of its offensive snaps. We further calculated the average expected points added on passing plays for each team and each game. We have adjusted the expected points for the strength of the defense. The following figure presents the results (binned for better visualization), where, to reiterate, passing efficiency is the expected points added per passing play. As we can see, there is a declining trend in passing efficiency as we increase its utilization.
The correlation coefficient is . While these results account for the quality of the passing defense, they do not account for the quality of the rushing game or the overall passing ability of the team, both of which can impact the results. Therefore, we build a regression model where the dependent variable is the average expected points added per passing play (adjusted for defense) within a game, while the independent variables include:
The table above presents our results where we can see that the utilization is still negatively correlated with the expected points added per passing play. The interaction term also shows that this correlation depends on the rushing ability of the offense. In particular, the effect of passing utilization on its efficiency is , namely, if the offense runs the ball better the negative relationship between and is less strong. In particular, with (the maximum observed value in our dataset), the corresponding coefficient is -0.42 — compared with a coefficient of -1.33 for the minimum value of in our dataset, i.e., -0.73.
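So how much should a team run? Obviously the answer depends on many factors, but it should be evident that calling passing plays all the time is going to have diminishing returns. While the passing efficiency might still be greater than that of rushing even at very high passing utilization, this does not mean that it is the best the team can do. What a team is interested in is maximizing the efficiency on a per-play basis regardless of the type of play, i.e.,
$latex \max_{u} \; u \cdot e_{pass}(u) + (1 - u) \cdot e_{rush}$
where $latex u$ is the passing utilization, $latex e_{pass}(u)$ is the expected points added per passing play at that utilization and $latex e_{rush}$ is the expected points added per rushing play.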
The following figure presents the passing utilization that maximizes the above equation for different values of and . As one might have expected for teams with better passing rating a higher utilization is recommended for fixed rushing ability, while better running game reduces the optimal passing utilization. Note that a rushing EPA higher than 0.3 per play per game is rather unrealistic, and so is having . For the average rushing EPA (marked with the vertical line), the optimal fraction of passing plays is 0.3, 0.47 and 0.63, for a bad, average and great passing offense respectively.
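To see how an optimal passing share falls out of a declining skill curve, here is a minimal sketch under a strong simplifying assumption: passing efficiency is taken to be linear in utilization, `e_pass(u) = a + b * u`, and rushing efficiency is constant. This is not the fitted regression from the analysis (which also includes an interaction with rushing ability), and the numbers in the demo are placeholders.

```python
def optimal_pass_share(pass_epa_at_zero: float, slope: float, rush_epa: float) -> float:
    """Maximize u * (a + b*u) + (1 - u) * r over u in [0, 1].

    a = pass_epa_at_zero, b = slope (EPA change per unit of utilization),
    r = rush_epa. Setting the derivative a + 2*b*u - r to zero gives
    u* = (r - a) / (2*b) when b < 0 (concave objective).
    """
    a, b, r = pass_epa_at_zero, slope, rush_epa
    if b < 0:
        u_star = (r - a) / (2 * b)
        return min(max(u_star, 0.0), 1.0)
    # Non-declining curve: the optimum sits at an endpoint.
    return 1.0 if (a + b) >= r else 0.0


if __name__ == "__main__":
    # Placeholder skill curve: 1.0 EPA/pass at u = 0, dropping by 1.0 per unit of u.
    for rush_epa in (-0.2, 0.0, 0.2):
        u = optimal_pass_share(1.0, -1.0, rush_epa)
        print(f"rush EPA {rush_epa:+.1f} -> optimal pass share {u:.2f}")
```

Even in this toy version the qualitative pattern matches the discussion above: a better running game lowers the optimal passing share, and a better passing offense (larger `a`) raises it.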
I’d like to note here that the results of the analysis are not and should not be treated as causal, that is, running more does not necessarily cause passing to be more efficient. It might as well be the case that teams that are trailing in the score turn to more passing and this biases the results. However, by estimating the league passing skill curve at the end of the third quarter (to avoid the artificial decrease of passing utilization from a team that is ahead), we still got a decreasing function:
When controlling for the quality of passing and rushing, utilization is still negatively correlated with passing efficiency. Furthermore, we have estimated these curves using a different EPA model (in fact, the presence of many different EPA models might be one of the reasons that these models/statistics are still not mainstream). In particular, we used the EPA model used in nflWAR, which is available in the nflscrapR package. Again similar results are obtained (with a slightly lower correlation).
Finally, we have treated rushing as being constant regardless of its utilization. While rushing skill curves are weaker as compared to passing the final results will quantitatively (not qualitatively) change. However, it should be evident that there is a clear interaction between passing efficiency and utilization that makes rushing still a piece of the puzzle in the NFL.
Updated content
One of the questions that I keep getting is: since running is so much less efficient, how can one support that (average) teams should run as much as the optimization suggests (i.e., close to 50%)? Well, I guess people don’t believe in optimization. But regardless, of course the above optimization is a crude reduction of a complex decision/problem, and the exact/true solution most probably is something else. However, it should be clear that it is not 100% passing, or heck, not even 80% passing. Anyway, since I got asked a lot, I will be updating this thread with testable hypotheses on why and when a team might be better off running more. In fact, the above analysis only tells us that there is an interaction between rushing and passing, but nothing about the reasons and mechanisms behind it. So let’s start.
I have been thinking about underdogs a lot — who does not like it when an underdog wins. So one of the first things I want to examine is whether there is a benefit to rushing when you are playing a better team. There are many reasons that I can think of why this might be a true/plausible hypothesis, but the general idea is that more running reduces the pace and creates more variance for the outcome, similar to basketball even though clearly the number of possessions/drives is much smaller. Anyway the reason why I think this might be a plausible hypothesis is not important; what is important is what the numbers say!
I collected Vegas betting lines for the 2009-2015 seasons and for every game with an underdog of more than 3 points (to avoid fairly equally matched games) I calculated the rushing fraction for the underdog and whether the dog won or not. Then I grouped the games based on the rushing utilization. For every group I calculated the expected number of underdog wins using the Vegas point spread. In particular, if a team is favored by $latex p$ points, the probability of this team winning the game is:
$latex \Pr[\text{win}] = \Phi\left(\frac{p}{\sigma}\right)$
where $latex \Phi$ is the cumulative distribution function of the standard normal random variable and $latex \sigma$ is the standard deviation of the final point margin around the spread. Then the expected number of games won by an underdog (p < 0) in a set of games $latex G$ is:
$latex E[\text{wins}] = \sum_{i \in G} \Phi\left(\frac{p_i}{\sigma}\right)$
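The spread-to-probability conversion and the expected-wins sum can be sketched as below. The standard deviation of the scoring margin is an assumption here: 13.86 points is a commonly cited estimate for NFL games, but the value actually used in this analysis is not recoverable from the text, and the example spreads are made up.

```python
import math

# Assumed standard deviation of the final margin around the Vegas spread.
# (Commonly cited NFL estimate; NOT taken from this analysis.)
MARGIN_STD = 13.86


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(points_favored: float, margin_std: float = MARGIN_STD) -> float:
    """Win probability for a team favored by `points_favored` (negative = underdog)."""
    return normal_cdf(points_favored / margin_std)


def expected_wins(spreads, margin_std: float = MARGIN_STD) -> float:
    """Expected number of wins over a set of games, one spread per game."""
    return sum(win_probability(p, margin_std) for p in spreads)


if __name__ == "__main__":
    underdog_spreads = [-3.5, -7.0, -10.0, -4.5]  # made-up example games
    print(f"expected underdog wins: {expected_wins(underdog_spreads):.2f}")
```

Comparing this expected count with the actual number of underdog wins in each rushing-utilization group is what the comparison above amounts to.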
By comparing the actual games won by an underdog and the expected number won we get the following results:
As we can see, a higher rushing rate for underdogs leads to more wins than expected. And because I know you are going to ask about reverse causality, all play fractions are calculated up to the end of Q3 in order to avoid the “teams that have gone ahead in the score by passing run out the clock in Q4” effect. OK, this tells us that more running from underdogs is associated with winning more than expected. But we still do not know why. People have pushed back on my initial reasoning, saying that running more does not prolong drives (i.e., it does not reduce the pace). While this might be the case, the evidence I have seen online (pointed out to me by the critics) is not compelling, in the sense that what I see is basically a bunch of drives with 40% rushing plays that are anywhere between 5 and 20 plays long. OK, this only tells us that a drive with 40% rushing will give you 5 to 20 plays in total. We still do not know what the trend is when rushing is 50% or 20%. We might still get the same spread (5 to 20) or we might get the same spread shifted (e.g., 10-30). Anyway, let’s assume that running more indeed does not decrease the total number of drives significantly, which, to repeat, is highly plausible, particularly given the low number of possessions to begin with. What else can be driving the result above? One potential reason might be a lower number of TOs. TOs can happen on both rushing and passing plays, so we obviously have to consider both. So let’s examine the TOs per game and their relation with rushing utilization. First we make the observation that the number of TOs per game per team follows a Poisson distribution. In particular, the following is the number of TOs per game for underdogs (again Q1-Q3, so do not bring up the fact that a team that is behind in the score will call riskier plays in Q4); the league-average distribution is very similar, shifted to the left:
We have also plotted a Poisson distribution with mean 1.65 and as we can see they match fairly well. In fact, the chi-square test cannot reject the hypothesis that the true distribution of the data is a Poisson. Given that the TOs/team/game follows a Poisson distribution we run a Poisson regression for the number of TOs committed by an underdog during a game using as covariate the rushing utilization. We also do the same for the league-average case, i.e., not only considering underdogs. The following plot provides the expected number of turnovers obtained from the Poisson regression for underdogs and for the league.
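For readers unfamiliar with Poisson regression, the sketch below fits `log(E[turnovers]) = b0 + b1 * rush_share` by Newton's method using only the standard library. The data are synthetic (generated with made-up coefficients, not NFL play-by-play data), so the fitted numbers say nothing about the actual results; the point is only to show the mechanics.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate (Knuth's method; fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson_regression(x, y, iterations=25):
    """Maximum likelihood fit of log(E[y]) = b0 + b1 * x via Newton's method."""
    b0, b1 = math.log(max(sum(y) / len(y), 1e-9)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            g0 += yi - mu            # gradient of the log-likelihood
            g1 += (yi - mu) * xi
            h00 += mu                # negative Hessian (X' W X)
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic example: made-up "true" coefficients, NOT estimates from NFL data.
TRUE_INTERCEPT, TRUE_SLOPE = 1.0, -1.0
rng = random.Random(42)
rush_share = [rng.uniform(0.2, 0.7) for _ in range(5000)]
turnovers = [sample_poisson(math.exp(TRUE_INTERCEPT + TRUE_SLOPE * x), rng)
             for x in rush_share]

b0, b1 = fit_poisson_regression(rush_share, turnovers)
print(f"fitted intercept {b0:.2f}, fitted slope {b1:.2f}")
```

A negative fitted slope means the expected number of turnovers falls as the rushing share increases, which is the shape of relationship discussed here. In practice one would use a statistics package that also reports standard errors.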
It is clear that overall underdogs commit more TOs on average (they are underdogs for a reason), but the mean of the Poisson distribution decreases with rushing utilization (I guess underdogs tend to have worse QBs…). For the league as a whole the relation is also negative, but the slope is less steep (in fact, if we consider only the favorites the slope is very shallow). This basically means that underdogs increase their TOs faster than favorites when they increase their passing rate. How much is this worth in terms of (expected) points? Using the expected points added for every play that led to a TO, a TO is worth approximately 4 points. Hence, when the TO rate is $\lambda$, the expected points lost from TOs are approximately $4\lambda$.
Using this equation, underdogs can “save” anywhere between 1.5 and 2 points per game from turnovers by running a little more. Obviously this does not explain the whole line they cover, but I would not expect a single mechanism to explain the interaction between rushing and passing. Also, when I mention that underdogs do not have good QBs, I always get the response that they do not have good RBs either. That’s most probably true, but it seems it is easier to throw an interception than to fumble (?). Also, if we break down the TOs, the expected points added for an INT is -4.7 on average, while for a fumble it is -2.6 on average (possibly because more INTs are returned for a TD: 11% of INTs are pick-6s, while 5% of fumbles are returned for a TD).
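To make the arithmetic concrete, here is a minimal sketch that assumes the expected points lost are simply the turnover rate times the roughly 4-point value of a turnover quoted above. The function names are illustrative, not taken from any library.

```python
POINTS_PER_TURNOVER = 4.0  # approximate value of a turnover quoted in the text

def expected_points_lost(turnover_rate, points_per_turnover=POINTS_PER_TURNOVER):
    """Expected points lost per game for a given turnover rate (TOs per game)."""
    return turnover_rate * points_per_turnover

def turnovers_avoided_for(points_saved, points_per_turnover=POINTS_PER_TURNOVER):
    """Drop in the turnover rate needed to save a given number of points."""
    return points_saved / points_per_turnover

# At the Poisson mean of 1.65 TOs per game this is about 6.6 points per game.
baseline = expected_points_lost(1.65)

# Saving 1.5 to 2 points per game corresponds to 0.375 to 0.5 fewer TOs per game.
low = turnovers_avoided_for(1.5)
high = turnovers_avoided_for(2.0)
```

In other words, under this assumption the 1.5 to 2 point saving corresponds to roughly one turnover avoided every two to three games.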
In conclusion, it seems that there is supporting evidence for an underdog to run a little more. Again, everything is observational, but the evidence for this hypothesis seems compelling.
We have also included the following interaction terms:
We have used play-by-play data to build this model from the 2014-15 and 2015-16 NBA seasons and following is the reliability curve we obtained.
As we can see the obtained reliability curve is practically the y=x curve, which means that our in-game win probabilities are well-calibrated.
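For readers who have not built one, a reliability curve can be computed by binning the predicted win probabilities and comparing each bin's average prediction with the observed win rate. The sketch below uses hypothetical predictions and outcomes and an equal-width binning that I am assuming; it is not the model or data from the post.

```python
def reliability_curve(predicted, outcomes, n_bins=10):
    """Return (mean predicted probability, observed win rate, count) per non-empty bin."""
    sums = [0.0] * n_bins
    wins = [0] * n_bins
    counts = [0] * n_bins
    for p, won in zip(predicted, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b] += p
        wins[b] += won
        counts[b] += 1
    return [(sums[b] / counts[b], wins[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

# Hypothetical in-game predictions and final outcomes (1 = the team won).
predicted = [0.05, 0.15, 0.15, 0.55, 0.55, 0.95, 0.95, 0.95]
outcomes = [0, 0, 1, 1, 0, 1, 1, 1]
curve = reliability_curve(predicted, outcomes, n_bins=10)
```

A well-calibrated model produces points where the first two entries of each tuple are nearly equal, i.e., points that lie close to the y=x line.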
We can then use this win-probability model to calculate the win probability added (WPA) for a lineup during every stint it played, and then calculate its win probability added efficiency (i.e., win probability added per 100 possessions) as: $\text{WPA}^{\text{eff}}_l = 100 \cdot \text{WPA}_l / n_l$, where $\text{WPA}_l$ is the total win probability added by lineup $l$ and $n_l$ the number of possessions it played. After obtaining these raw win probability efficiencies we can use the Bayesian average to adjust the ratings in a way that accounts for the different number of observations (i.e., possessions) we have for each lineup: $\widehat{\text{WPA}}^{\text{eff}}_l = 100 \cdot (\overline{\text{WPA}} + \text{WPA}_l) / (\bar{n} + n_l)$, where $\overline{\text{WPA}}$ is the (league) average total win probability added and $\bar{n}$ the average number of observations for all the lineups. We have essentially considered that our prior belief for lineups we have never seen is that of a league-average lineup. As we obtain more observations for lineup $l$, they eventually outweigh our prior belief, and when $n_l \gg \bar{n}$ the adjusted rating approaches the raw one.
But what is the reason for doing all of this? Does this win probability added indeed provide us with a different view of the lineups? To examine that, we trained a win probability model using the play-by-play data from the 2014-15 NBA season and calculated the WPA efficiency and the net efficiency rankings for every lineup in the 2015-16 season. We then calculated the rank correlation between the two rankings obtained from the different metrics. The correlation coefficient is 0.71 and there are two observations to be made:
The two metrics agree to a large extent with regard to which lineups add value to a team. This is important since the net efficiency rating is in general considered a good lineup evaluation metric.
You might be wondering “If this WPA efficiency for 5-man lineups is so great, then why isn’t it used more?”. The major drawback of this approach is that it depends on the in-game win probability model used, and therefore different models might provide different ratings. In contrast, the efficiency ratings are purely based on points scored and allowed, which is unambiguous. Of course, this is not something that hasn’t happened before in sports analytics. For example (and full disclosure: I am not a big baseball fan and hence I am not very familiar with the analytics literature), in baseball there are many different versions of wins above replacement! What is important is the high-level idea; how one develops it further and implements it is secondary, at least at this stage, but it is certainly critical for the success of the “metric”.
The ratings we got after the end of the regular season are:
Using these ratings we get the following projections (they will be updated after every game):
As we can see, CSKA opens as the big favorite, followed by Real Madrid and the reigning champion Fener. Many fans (at least in Greece) are looking forward to a Greek final, but the chances of this happening are only about 1.5%.
After the first two games of the series two teams have made the break, Real and Zalgiris. On the other hand CSKA and Fener have made a strong case for being present at the final 4 once more, while the chance of a Greek final is still around 1.5%.
Final Four
We are only days away from the final four and here are our updated projections for the semifinals and the finals:
Big Final
CSKA once more defied the probabilities and … did not win the title, so we are headed for a very balanced final between Fener and Real Madrid. Will Obradovic win his 10th title, or will Real win its 10th title? The numbers do not help since they do not give a clear favorite (in fact Fener is a slight favorite with a 50.5% win probability, if this can be considered a favorite). We are in for a great final!
These ratings are obtained through a regression model (a simplified version of it can be found here) and, as we can see, Monaco and Tenerife are the top-2 teams, with Riesen Ludwigsburg following in third place. Using these ratings we simulate the playoffs 10,000 times to make our predictions for the ultimate winner. For the final 4 pairings, we draw them randomly in every simulation (since, after all, the same will happen when the 4 finalists are known!). Following are our initial projections:
It is not a surprise that Monaco and Tenerife are the favorites, while the cluster of Nymburk, Strasbourg, Neptunas and AEK, who are matched up in the top-16 and top-8, is the most balanced one in terms of probability of making it to the playoffs!
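The simulation idea can be illustrated with a small Monte Carlo sketch restricted to the final 4. Everything here is an assumption for illustration: the team names and ratings are hypothetical, and the logistic link from a rating gap to a win probability (with its `scale` parameter) is a stand-in, not the regression model used for the actual projections.

```python
import random

def win_probability(rating_a, rating_b, scale=10.0):
    """Hypothetical logistic link from a rating gap to a win probability."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / scale))

def simulate_final_four(ratings, n_sims=10_000, seed=0):
    """Estimate title probabilities for four teams with a random semifinal draw."""
    rng = random.Random(seed)
    teams = list(ratings)
    titles = {team: 0 for team in teams}

    def play(a, b):
        # Single game: a wins with the probability implied by the ratings.
        return a if rng.random() < win_probability(ratings[a], ratings[b]) else b

    for _ in range(n_sims):
        draw = teams[:]
        rng.shuffle(draw)  # pairings are drawn at random in every simulation
        finalist_1 = play(draw[0], draw[1])
        finalist_2 = play(draw[2], draw[3])
        titles[play(finalist_1, finalist_2)] += 1
    return {team: wins / n_sims for team, wins in titles.items()}

# Hypothetical ratings (points above an average team); not the post's values.
ratings = {"Team A": 8.0, "Team B": 6.0, "Team C": 2.0, "Team D": 0.0}
title_odds = simulate_final_four(ratings)
```

Drawing the pairings inside the loop means each team's title probability is averaged over all possible semifinal matchups, which is the point of the random draw described above.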
With the first leg of the top-16 round coming up following is the distribution of the expected point differential for each of these games:
Each colored area gives the win probability for the corresponding team (in other words, positive values of the point differential correspond to a home team win, while negative values correspond to a visiting team win). As we can see, the two games that are expected to be the most balanced are PAOK-Pinar and Bayreuth-Besiktas.
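As a rough sketch of how an area under such a distribution turns into a win probability, assume the point differential of a game is normally distributed around its expected value. Both the normal shape and the numbers below (a 3-point expected margin and an 11-point standard deviation) are my assumptions for illustration, not values from the model.

```python
import math

def home_win_probability(expected_diff, sd):
    """P(point differential > 0) when it is Normal(expected_diff, sd)."""
    return 0.5 * (1.0 + math.erf(expected_diff / (sd * math.sqrt(2.0))))

# Hypothetical game: home team expected to win by 3 points, sd of 11 points.
home = home_win_probability(expected_diff=3.0, sd=11.0)
away = 1.0 - home
```

The closer the expected differential is to zero, the closer both areas are to 50%, which is what makes a game "balanced" in the sense used above.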
With the first leg of the top-16 round over, following are the updates in our predictions. As you will see there are some pretty big swings in favor of Banvit, Nymburk and Riesen Ludwigsburg.
Taking the results of the top-16 leg 1 into account following are the projections for each of the upcoming rounds:
Tenerife is still at the top of the race and Ludwigsburg has emerged as the second favorite, while Monaco’s poor outing (compared to expectations) in leg 1 dropped their chances for the trophy to 10%.
Quarter finals
The first round of this year’s playoffs involved two major upsets (based on the results of the first leg and the team ratings). The first BCL champions, and the favorite for a back-to-back championship, Tenerife, were eliminated after a sensational performance by Murcia! Before leg 2, Murcia had a mere 1% chance of advancing! At the same time AEK, with the (practically) buzzer beater from Punter, overturned in Nymburk the 10-point deficit from leg 1 in Athens and qualified for the top-8 after having just about a 12% chance of doing so! The quarter finals are here and following are the point differential predictions for leg 1 and the updated probabilities for the rest of the tournament. AS Monaco is back at the top of its game and of our ratings. However, don’t count out anyone just yet! An exciting round of quarter finals is ahead!
After the first leg AS Monaco remains the favorite for the title but there have been some favorites emerging for participating in the final four. After leg 1 of the quarter finals here are the updated odds.
Final 4
The final 4 is here and after the pairings’ draw here are the projections:
AS Monaco is the big favorite but AEK – playing in front of its fans – has good chances to advance to the final and challenge the French team.
Final
Our predictions for the semi-finals were correct and the final is going to be between AEK and AS Monaco. Monaco is the heavy favorite with a 74% win probability but never count out AEK in front of its fans (and I hope I am wrong — just to reveal my personal preferences if you have not figured them out yet :)).
As you all might know by now, AEK defied the odds and won its 3rd European title. Closing out these predictions, here is the in-game win probability from the big final:
Looking forward to a new season!
As you can see we can easily get the offensive and defensive rating (efficiency) for the lineup, as well as its net rating (simply the difference between the offensive and defensive rating). We also know the minutes played by the lineup and its pace. Using these two we can get an approximation for the number of possessions that the lineup played. The pace value is the number of possessions per 48 minutes for the lineup and therefore the specific lineup shown above played a total of (335/48)*96.5 = 673.5 possessions. When we want to compare two lineups we can check the ratings provided on the NBA’s website and simply see which lineup has higher (lower) offensive (defensive) rating. Right?
Well, not so fast! There are lineups like the one above that have played more than 600 possessions, while there are lineups that have played fewer than 10 possessions (e.g., Irving, Larkin, Morris, Rozier and Theis have played a whopping 3 possessions!). How confident are we that the lineup ratings we have obtained are indeed their true ratings, especially for lineups that have played few possessions? We could calculate the probability that lineup A is better than lineup B by making an assumption for (or learning through data) the distribution of the actual performance of a lineup. For example, Wayne Winston in his book Mathletics indicates that when it comes to a lineup’s +/- rating, the actual performance of the lineup over 48 minutes is normally distributed with a mean equal to the lineup’s +/- rating and a standard deviation (in points) that shrinks as the lineup plays more minutes. Therefore, a lineup that has played only a few minutes will be associated with a high variance, and we will be able to further calculate the probability that this lineup is better than another lineup of the team (for which we can also model its performance through a similar normal distribution). However, even if this probabilistic analysis were the most accurate representation of reality, when you are presenting your analysis to the coaching staff you should have a simple (yet concrete) message. Probabilistic comparisons of lineups are great but too cumbersome to digest, especially if you are not trained in probability and statistics. So is there a way to adjust the lineup ratings to account for the fact that different lineups have played many more or fewer possessions and hence their true efficiency might be different from the one reported on the NBA’s website (or the one you calculated on your own from the play-by-play data)? Luckily the answer is yes!
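For reference, the probabilistic comparison mentioned above can be sketched as follows, assuming the two lineups' performances are independent and normally distributed. The means and standard deviations in the example are hypothetical placeholders; they are not the values from Mathletics or from any real lineup.

```python
import math

def prob_a_better(mean_a, sd_a, mean_b, sd_b):
    """P(A - B > 0) for independent Normal(mean_a, sd_a) and Normal(mean_b, sd_b)."""
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return 0.5 * (1.0 + math.erf((mean_a - mean_b) / (sd_diff * math.sqrt(2.0))))

# Hypothetical lineups: A looks better (+5 vs 0) but has played few minutes,
# so its rating is much noisier (sd of 12 points vs 4).
p = prob_a_better(mean_a=5.0, sd_a=12.0, mean_b=0.0, sd_b=4.0)
```

Even with a 5-point edge in the raw rating, the large uncertainty keeps this probability well short of certainty, which is exactly the message that is hard to convey to a coaching staff in a single number.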
In order to achieve our goal we will make use of the notion of the Bayesian average. The idea behind the Bayesian average is that when we have a small number of observations (possessions in our case) for an object of interest (a lineup in our case), the simple average can provide us with a distorted view. Consider the case of the lineup mentioned above with 3 (offensive) possessions observed. In this situation, all three possessions can easily end in a made 3-point shot, which will lead to an offensive efficiency of 300 (points/100 possessions). However, it is also very possible that all 3 possessions end with a missed shot, a turnover etc., leading to an offensive efficiency of 0! Simply put, when we have few observations it is very likely to obtain extreme values just by chance. So here is where the Bayesian approach comes into play. In the case of probability estimates, obtaining new evidence allows us to use Bayes’ theorem and update a prior belief we had for an event: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, where $P(A)$ is the prior belief for event $A$ and $P(A \mid B)$ is the updated belief after observing the evidence $B$.
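What does this have to do with our lineup ratings? Well, we can adjust the ratings based on some prior belief we have for them. In our case this prior belief can be the team’s weighted average lineup efficiency (or the league’s weighted average lineup efficiency). In particular, considering the team weighted average, the Bayesian adjusted efficiency of lineup i is: $\hat{e}_i = \frac{C \cdot \bar{e} + n_i \cdot e_i}{C + n_i}$, where $e_i$ is the raw efficiency of lineup $i$, $n_i$ the number of possessions observed for it, $\bar{e}$ the team’s weighted average lineup efficiency, and $C$ the average number of possessions per lineup of the team.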
Essentially, for every new lineup we begin with a prior belief that this is a (team/league) average lineup. Then every time we obtain a new observation (i.e., a new possession) we can update our rating for the lineup. It should be evident that as we accumulate enough observations for a lineup (i.e., the number of possessions observed for the lineup is large compared to the average number of possessions per lineup used in the prior), the impact of our prior belief gets smaller and smaller. For example, while the Bayesian adjusted rating of the lineup in the above figure is 111.6 (practically equal to its “raw” rating of 111.9), for a lineup with fewer observations there can be significant differences. For instance, the Celtics lineup of Baynes, Brown, Ojeleye, Rozier and Smart has played 33 possessions with a raw offensive rating of 60.5. However, the Bayesian adjusted rating of this lineup is 78.1, since we have considered a prior based on 24 possessions on average for each Boston lineup and a 102.6 offensive efficiency. The following figure presents the raw and Bayesian-adjusted efficiency ratings for all the Celtics’ lineups. The size of each point corresponds to the number of possessions observed for the lineup. As we can see, for lineups with many observations the two ratings agree closely. In fact, there is a negative correlation (-0.25, p-value < 0.001) between the absolute difference of the two ratings for a lineup and the number of possessions observed for that lineup, i.e., the fewer the observations, the larger the adjustment.
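The adjustment is easy to reproduce. Below is a minimal sketch of the Bayesian average as a possession-weighted blend of the prior and the raw rating, using the numbers quoted in the text (prior of 24 possessions at 102.6 efficiency). The function names are my own. With these rounded inputs the lightly used lineup comes out at about 78.2, close to the 78.1 reported above; the small gap is presumably due to rounding of the published inputs.

```python
def possessions_played(minutes, pace):
    """Approximate possessions from minutes and pace (possessions per 48 minutes)."""
    return minutes / 48.0 * pace

def bayesian_adjusted(raw_rating, n_obs, prior_rating, prior_obs):
    """Blend a raw rating with a prior, weighting each by its number of possessions."""
    return (prior_obs * prior_rating + n_obs * raw_rating) / (prior_obs + n_obs)

PRIOR_RATING = 102.6      # Boston's average lineup offensive efficiency (from the text)
PRIOR_POSSESSIONS = 24    # average possessions per Boston lineup (from the text)

# Heavily used lineup: 335 minutes at a pace of 96.5, raw offensive rating 111.9.
heavy = bayesian_adjusted(111.9, possessions_played(335, 96.5),
                          PRIOR_RATING, PRIOR_POSSESSIONS)

# Lightly used lineup: 33 possessions, raw offensive rating 60.5.
light = bayesian_adjusted(60.5, 33, PRIOR_RATING, PRIOR_POSSESSIONS)
```

The heavily used lineup barely moves (about 111.6 versus 111.9), while the lightly used one is pulled most of the way toward the team average, which is the behavior described above.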
Furthermore, the Spearman ranking correlation between the two ratings is 0.83, which means that while there is a good relationship between the two ratings, there are differences in the rankings that they provide.
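For completeness, here is a small stdlib-only sketch of the Spearman rank correlation, computed as the Pearson correlation of the ranks (with ties sharing their average rank). The raw and adjusted ratings below are hypothetical values chosen for illustration, not the Celtics data.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # average rank of the tied block
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical raw and Bayesian-adjusted ratings for five lineups.
raw = [60.5, 111.9, 95.0, 130.0, 101.0]
adjusted = [78.2, 111.6, 99.0, 108.0, 101.5]
rho = spearman(raw, adjusted)
```

In this toy example only two lineups swap places between the two rankings, so the correlation stays high but below 1, mirroring the situation described above.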
As should be evident, one can do the same with defensive and net efficiency ratings. I hope we will start seeing these Bayesian adjustments in mainstream statistics.