So let's start with some basics on the z-score. The z-score of a data point tells us how many standard deviations above or below the mean of the whole dataset this observation lies, and is calculated as z = (x − μ)/σ,
where μ is the mean of the dataset and σ the corresponding standard deviation. This is a way to standardize the data: if one takes the z-scores of all the observations in a dataset, they will have a mean of 0 and a standard deviation of 1. Standardizing observations allows us to compare data that are on different scales, and can thus be used to compare players/teams of different eras. To understand this better, before delving into the Bulls-Warriors debate let's see a nice example from the Mathletics book. Say that someone asks you to compare Rogers Hornsby to George Brett based on their batting average. Hornsby had a .424 batting average, while Brett had a .390. A naive comparison would conclude that Hornsby was more impressive, since his batting average was higher. However, in the 1920s, when Hornsby played, the average batting average was .299 with a standard deviation of .0334, while in the 1980s, when Brett played, the average batting average was .274 with a standard deviation of .0286. If we calculate the z-scores for the two players we have: z_Hornsby = (.424 − .299)/.0334 ≈ 3.74 and z_Brett = (.390 − .274)/.0286 ≈ 4.06.
Based on the z-scores, Brett was more than 4 standard deviations better than an average batter in his era (of course Hornsby was still extraordinarily better than his average competition, but not as extraordinary as Brett). In other words, standardization puts the data into context relative to the era’s competition and is a nice tool to keep in mind for such comparisons. Let’s now see how we can use z-scores to compare the Bulls and the Warriors.
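As a quick sanity check, the z-score arithmetic above is easy to reproduce in a few lines of Python (the means and standard deviations are the ones quoted from the Mathletics example):

```python
# Compare batters across eras by standardizing their batting averages.

def z_score(x, mean, sd):
    """How many standard deviations x lies above the era's mean."""
    return (x - mean) / sd

hornsby = z_score(0.424, 0.299, 0.0334)  # 1920s league context
brett = z_score(0.390, 0.274, 0.0286)    # 1980s league context

print(f"Hornsby: {hornsby:.2f}, Brett: {brett:.2f}")
```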
Obviously there is no single statistic that can answer this kind of question. Furthermore, this is clearly not a deterministic question, since no team has a 100% win probability against any opponent (unless the game is over), let alone when you compare two all-time greats. Hence, in principle what we would like to know is the win probability of a Bulls (90s) – Warriors (10s) matchup. Earlier in this blog we presented a simple, yet accurate, model for calculating pre-game win probabilities for NBA matchups, namely, the Basketball Prediction Matchup (BPM). While you can see the details in the corresponding post, in brief, BPM relies on a Bradley-Terry model built on Oliver's four factors, namely, effective field goal percentage, turnover rate, offensive rebound rate and the free throws to field goal attempts ratio. So one could take each team's four factors, throw them into BPM and obtain a win probability. Not so fast though! The game has changed dramatically over the two decades that separate the two teams. For example, in the 90s the NBA was a big man's league, while now it is a small man's league. Not to mention the explosion of 3-point shooting. This latter point in particular has a strong effect on one of the four factors, the effective field goal percentage! And while we are at it, the 90s Bulls played two seasons with a shortened 3-point line; it would not be fair to make direct comparisons across eras with such differences. In order to be a little smarter, we will calculate the z-scores for the 90s Bulls' four factors and then use these z-scores to project them onto the contemporary NBA. Now, you might be wondering, is this accurate? Most probably not completely; after all, every model is wrong, but since some are useful I think this is a good start. Now, we had two options here: (i) project the Warriors' factors to the Bulls' era, or (ii) project the Bulls' factors to the contemporary NBA.
I chose the second, since we are conditioned to think in contemporary terms but, most importantly, because BPM was trained using data from the past couple of seasons.
Using data from basketball-reference.com, the following table shows Chicago's z-scores for the four factors during their 6 championship years (note that a negative z-score for the turnover rate is good):
We will begin by considering all possible team matchups (e.g., the 1990-91 Bulls with the 2017-18 Warriors, etc.). Hence, in order to project the Bulls' performance from season s to season s', we will use the corresponding z-score from season s and the league averages of season s'. For example, the projected eFG for the Bulls for the 2014-15 NBA season would be: eFG_proj = μ_eFG(2014-15) + z_eFG(Bulls, s) · σ_eFG(2014-15).
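The projection step can be sketched as follows; the league means and standard deviations below are made-up placeholder values rather than the actual NBA figures, so only the mechanics matter here:

```python
# Project a past team's four-factor value into a target season:
# keep the team's z-score, swap in the target season's league mean/sd.
# The league numbers below are hypothetical placeholders.

def project(team_value, src_mean, src_sd, dst_mean, dst_sd):
    z = (team_value - src_mean) / src_sd   # z-score in the source era
    return dst_mean + z * dst_sd           # same z-score, target era

# hypothetical example: a .500 eFG in a .480/.010 league projected
# into a .495/.012 league
projected_efg = project(0.500, 0.480, 0.010, 0.495, 0.012)
print(round(projected_efg, 3))  # 0.519
```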
Using these projections we can now use BPM to obtain win probabilities. The following figures show the win probabilities for the 1990s Chicago Bulls against the contemporary Warriors for home and away games respectively:
As we can see – both from the z-scores and the win probabilities – the 1997-98 Bulls might be the least competitive against the Warriors. Potentially this is due to their bad start of the season, playing without Scottie Pippen for the first half and being practically a .500 team. However, overall the Bulls would be the favorite in 83% of the matchups, and the average win probability for the Bulls is 60% for a home game and 55% for an away game. This is not a terribly telling stat, but the Bulls seem to have a slight (?) edge over the Warriors. We simulated 20,000 7-game series between the two teams using these win probabilities (10,000 with the Bulls having home-court advantage and 10,000 with the Warriors having it) and the Bulls won 66% of them. On average the Bulls won in 6 games, while the Warriors won in 6.5 games. Following is the distribution of the series length:
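A minimal sketch of such a series simulation is shown below, assuming a 2-2-1-1-1 home-court format and plugging in the 60%/55% home/away probabilities from above as constants (the actual analysis used matchup-specific BPM probabilities):

```python
import random

# Simulate best-of-7 series in a 2-2-1-1-1 format with constant
# home/away win probabilities for team A.

def play_series(p_home, p_away, a_has_home_court, rng):
    home_games = {1, 2, 5, 7} if a_has_home_court else {3, 4, 6}
    a_wins = b_wins = 0
    for game in range(1, 8):
        p = p_home if game in home_games else p_away
        if rng.random() < p:
            a_wins += 1
        else:
            b_wins += 1
        if a_wins == 4 or b_wins == 4:
            return a_wins == 4, game  # (did A win, series length)

rng = random.Random(0)
# 10,000 series with A holding home court, 10,000 without
results = [play_series(0.60, 0.55, i < 10_000, rng) for i in range(20_000)]
share = sum(won for won, _ in results) / len(results)
print(f"Team A won {share:.1%} of the simulated series")
```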
Obviously the above does not settle the debate by any stretch of the imagination. As mentioned, using z-scores is just a rough way of making such comparisons and, unfortunately (or fortunately), we will never know how a game between the two teams would turn out. However, I hope it showcases how one can start thinking about comparing players/teams of different eras! Plus, it is a fun way to understand (and teach) standardization.
Let's see why betting is not really different than gambling… ummm, investing in the stock market. First we will briefly introduce Kelly's growth criterion. Kelly assumes that our goal in betting is to maximize the expected long-run percentage growth of our portfolio, measured on a per-gamble basis. Without getting into the details of the derivation, the Kelly criterion determines the optimal bet as a fraction of your bankroll: f = p/L − q/W,
where:
- p is the probability of winning the bet,
- q = 1 − p is the probability of losing it,
- W is the win multiplier (the amount won per $1 wagered), and
- L is the loss multiplier (the amount lost per $1 wagered).
So basically, if you know the probability of winning the bet, then being disciplined and betting the fraction f of your bankroll suggested by Kelly's formula will maximize the expected long-run growth of your portfolio. The Kelly criterion helps strike the right balance between risk and safety and, most importantly, in an easy manner. Of course, if f ≤ 0 you do not put any wager on the bet! Now, of course, it comes with drawbacks. For one, you need to be able to work out the real probabilities of the bets. However, as I will explain later, for specific types of bets (i.e., moneyline bets) this is no different than evaluating the calibration of your prediction model! Secondly, the Kelly criterion is inherently aggressive, since it can lead the bettor to wager even half of his total bankroll. However, one can tweak the approach and decide a priori on a maximum wager M for each bet (which needs to remain fixed for ALL bets), which basically means that the final amount bet would be f × M. Let us see each of these points in more detail.
First, let's explain the moneyline bet, which we will use from now on. Suppose you want to bet on the Steelers' season opener against the Browns. You might see something along the lines of Steelers -260 and Browns +250. What this means is that if you take the Steelers you need to "risk" $260 to win $100 if Pittsburgh wins, while if you take the Browns you win $250 if you risk $100 and Cleveland wins. Now that we got this out of the way, let's get into the investment strategy.
Real probability of winning a bet: Let us consider that you are betting on a moneyline bet. Say that your model predicts the Steelers are going to win with a probability of 80%. If your model is well-calibrated (i.e., the probability obtained is the true win probability for each team), then this is the probability of winning a moneyline bet on the Steelers. If you bet on the Browns, then you have a 20% probability of winning the bet. Therefore, if you trust your model — or, even better, if you have evaluated your model and it is well-calibrated — you have the parameters p and q for the Kelly criterion. The only thing that remains is to calculate W and L. L is always 1, since when betting on the moneyline you lose exactly what you have wagered. W depends on whether you bet on the favorite or the underdog. If the line for a favorite is −x, then W = 100/x if you bet on the favorite. For a bet on an underdog with a line of +y, W = y/100. Now you have all the parameters to calculate f.
Applying Kelly's criterion: One of the problems with the Kelly criterion, as alluded to above, is its aggressiveness; f represents the fraction of the total bankroll that Kelly suggests to bet. To make things a little less aggressive, one can set a fixed maximum amount of money to wager on a bet (e.g., $50). It is crucial for the strategy suggested here that this maximum wager per bet does not change (i.e., be disciplined against temptations)!! So, assuming a max wager per bet of M, for each bet we calculate f_favorite and f_underdog. If f_favorite > 0 we bet f_favorite × M on the favorite, otherwise we pass on this bet. Similarly, if f_underdog > 0 we bet f_underdog × M on the underdog, while if f_underdog ≤ 0 we do not bet on the underdog.
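Putting the pieces together, a small sketch of the wagering rule might look like this (the $50 cap and the 80% model probability are just the illustrative numbers from the text):

```python
# Kelly fraction for a moneyline bet. With the moneyline loss
# multiplier L = 1, the fraction reduces to f = p - q/W, where W
# comes from the posted line: 100/x for a favorite at -x, y/100
# for an underdog at +y. A non-positive f means "pass".

def win_multiplier(line):
    return 100 / -line if line < 0 else line / 100

def kelly_fraction(p, line):
    w = win_multiplier(line)
    return p - (1 - p) / w  # L = 1 for moneyline bets

# Steelers -260 with a model win probability of 80%:
f = kelly_fraction(0.80, -260)
max_wager = 50  # fixed cap per bet, as discussed above
bet = max(f, 0) * max_wager
print(f"f = {f:.3f}, wager = ${bet:.2f}")
```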
Let's put this strategy into play — through simulations!!! I collected Vegas moneylines for the NFL seasons 2009-2015 and used our well-calibrated football prediction matchup (FPM) model. I will not get into the details of the model, but in brief it is a combination of bootstrap and a logistic regression model; details can be found here. Using the above strategy, the following figure presents the Return on Investment (RoI) for each season, where RoI is defined as the net profit over the total amount wagered, i.e., RoI = (amount won − amount wagered) / amount wagered.
The blue line presents the return on $1 of wager for each season separately, while the red line is the cumulative/rolling return from 2009 up to the corresponding season. Overall, through these 7 seasons you would have won 25 cents on every $1 you wagered.
Furthermore, the average wager on a favorite was , while the average wager on an underdog was . Finally, the average win multipliers were: and .
What about the stock market? A 25% return on investment certainly sounds better than the stock market's 7% long-term return. There are many similarities in the way the two markets operate, but also some important differences that might make specific aspects of sports betting more appealing to some. In both markets the goal is to outsmart someone else (and in both cases we need to be disciplined). In the case of betting we try to outsmart the bookmaker (or your friend, if you are placing a friendly wager), while in the case of the stock market we are trying to outsmart another person who will buy from (sell to) us stocks of company X. In the case of (moneyline) betting, as explained above, outsmarting means making better probabilistic predictions than the bookmaker, while for the stock market it means making better predictions on whether the price of a stock will increase or not. Both seem very similar tasks in principle, but I have never tried to predict the stock market and there are very good reasons for this (the main one being that I obviously might not be that smart). However, I can guarantee you that a simple model like the one I presented for NFL prediction will do poorly in predicting the movement of the stock market. The main difference between the two prediction tasks is that sports games are kind of a closed system, i.e., the only variables that really matter are the players and their performance, while the stock market is a wide-open system where an event 4,000 miles away might impact its movement. Simply put, everything that affects the outcome of a game can be measured to some degree, while the price of a stock can be affected by any possible news – even fake news. Most glaringly, even for anticipated news we seem to have about as good a grasp as Malkiel's blindfolded monkey.
For example, after events like Brexit and the latest US presidential election, pundits made sure to inform us about the upcoming crash in the stock market, and by now we know how that turned out. Of course, all of this does not mean that you cannot do well in the stock market; certainly you can, and there are indeed experts who have done well fairly consistently. But from a purely predictability point of view, I think it should be obvious that predicting sports outcomes might be easier. And while we are on this topic, Lopez, Matthews and Baumer have put together a nice analysis on the predictability and luck for different sports.
What does this all mean? It does not guarantee you gains! Let's be clear about this. But it shows that if you are disciplined you can have a good return. Do you need to be a math whiz to predict games? No!! But you do have to have a good understanding of what probability means (and this is another benefit of legalizing betting; people might understand probabilities better when they lose approximately 25% of the time on a -300 bet!). You could potentially use some of the prediction models that are out there (e.g., FiveThirtyEight's, which also seems to be well-calibrated). If you decide to go down Kelly's road you should also remember to be patient! You might lose more bets than you win! But this is not what ultimately defines return on investment. Kelly's formula tries to find the optimal wager given your (or your model's) belief about a game and the bookmaker's implied belief from the moneyline. Also think about it — if you bet on a 20% underdog you will lose more often than you win (in terms of number of bets), but if your probabilities are correct you will win more money than you lose – again, if you are disciplined(!), and this discipline involves avoiding any sort of bias (e.g., toward your favorite team).
And just to be clear: I am neither endorsing gambling, nor am I saying you should use the approach I described. I am just saying that there is no reason to believe that betting on sports is different than investing in the stock market! I am very interested to see how this progresses.
In order to explore our hypothesis we used play-by-play data from the 2014-2016 NFL seasons. In particular, for every game in our dataset we calculated each team's utilization of passing as the fraction of passing plays over its total number of offensive snaps. We further calculated the average expected points added for the passing plays of each team in each game, adjusted for strength of defense. The following figure presents the results (binned for better visualization), where, to reiterate, passing efficiency is the expected points added per passing play. As we can see, there is a declining trend in passing efficiency as its utilization increases.
The correlation coefficient is . These results, while they account for the quality of the passing defense, do not account for the quality of the rushing game or the overall passing ability of the team, both of which can impact the results. Therefore, we built a regression model where the dependent variable is the average expected points added per passing play (adjusted for defense) within a game, while the independent variables include:
The table above presents our results, where we can see that utilization is still negatively correlated with the expected points added per passing play. The interaction term also shows that this correlation depends on the rushing ability of the offense; if the offense runs the ball better, the negative relationship between passing utilization and passing efficiency is less strong. In particular, with the rushing EPA at its maximum observed value in our dataset, the corresponding utilization coefficient is -0.42, compared with a coefficient of -1.33 at the minimum observed rushing EPA in our dataset, i.e., -0.73.
So how much should a team run? Obviously the answer depends on many factors, but it should be evident that calling passing plays all the time is going to have diminishing returns. While passing efficiency might still be greater than that of rushing even at high utilization, this does not mean that it is the best the team can do. What a team is interested in is maximizing efficiency on a per-play basis regardless of the type of play, i.e., maximizing u · EPA_pass(u) + (1 − u) · EPA_rush over the passing utilization u.
The following figure presents the passing utilization that maximizes the above expression for different values of the passing rating and the rushing EPA. As one might have expected, for teams with a better passing rating a higher utilization is recommended for a fixed rushing ability, while a better running game reduces the optimal passing utilization. Note that a rushing EPA higher than 0.3 per play per game is rather unrealistic, and so is having . For the average rushing EPA (marked with the vertical line), the optimal fraction of passing plays is 0.3, 0.47 and 0.63 for a bad, an average and a great passing offense respectively.
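To make the optimization concrete, here is a rough sketch that grid-searches the utilization u maximizing per-play value under an assumed linear passing skill curve; the intercept and slope values below are illustrative, not the fitted coefficients from the regression:

```python
import numpy as np

# Grid-search the passing utilization u that maximizes expected points
# per play, assuming a linear passing skill curve pass_epa(u) = a + b*u
# (b < 0 captures the diminishing returns discussed above).

def optimal_utilization(a, b, rush_epa):
    u = np.linspace(0, 1, 1001)
    per_play = u * (a + b * u) + (1 - u) * rush_epa
    return u[np.argmax(per_play)]

# better passing (higher intercept a) -> higher optimal utilization
print(optimal_utilization(a=0.30, b=-0.6, rush_epa=0.0))  # 0.25
print(optimal_utilization(a=0.15, b=-0.6, rush_epa=0.0))  # 0.125
```

A better running game (higher rush_epa) shifts the optimum toward fewer passes, matching the figure's pattern.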
I'd like to note here that the results of the analysis are not, and should not be treated as, causal; that is, running more does not necessarily cause passing to be more efficient. It might as well be the case that teams that are trailing in the score turn to more passing, and this biases the results. In some of my past analyses I have explored the possibility of similar reverse causality and there are no strong indicators for it. Furthermore, we have treated rushing efficiency as constant regardless of its utilization. While rushing skill curves are weaker compared to passing ones, including them would change the final results quantitatively (though not qualitatively). However, it should be evident that there is a clear interaction between passing efficiency and utilization that makes rushing still a piece of the puzzle in the NFL.
We have also included the following interaction terms:
We have used play-by-play data from the 2014-15 and 2015-16 NBA seasons to build this model, and the following is the reliability curve we obtained.
As we can see, the obtained reliability curve is practically the y = x line, which means that our in-game win probabilities are well-calibrated.
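For readers who want to reproduce a reliability curve like this one, scikit-learn's calibration_curve does the binning; the predictions below are synthetic (and perfectly calibrated by construction), standing in for a real model's outputs:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# A reliability curve compares predicted win probabilities with the
# empirical win frequency in each probability bin; a well-calibrated
# model tracks the y = x diagonal.

rng = np.random.default_rng(0)
p_pred = rng.uniform(0.05, 0.95, size=20_000)     # model probabilities
outcome = rng.uniform(size=p_pred.size) < p_pred  # wins drawn at rate p

frac_pos, mean_pred = calibration_curve(outcome, p_pred, n_bins=10)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted {mp:.2f} -> observed {fp:.2f}")
```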
We can then use this win-probability model to calculate the win probability added (WPA) by a lineup during every stint it played, and then calculate its win probability added efficiency (i.e., win probability added per 100 possessions) as 100 · WPA / (possessions played). After obtaining these raw win probability efficiencies, we can use a Bayesian average to adjust the ratings in a way that accounts for the different number of observations (i.e., possessions) we have for each lineup: the adjusted rating of lineup l is (C · W̄ + n_l · w_l) / (C + n_l), where w_l is the raw WPA efficiency of the lineup, n_l its number of possessions, W̄ the (league) average win probability added efficiency, and C the average number of observations across all lineups. We have essentially considered that our prior belief for lineups we have never seen is that of a league-average lineup. The more observations we obtain for lineup l, the more they will outweigh our prior belief, and the adjusted rating converges to the raw one as n_l grows large.
But what is the reason for doing all of this? Does this win probability added indeed provide us with a different view of the lineups? To examine that, we trained a win probability model using the play-by-play data from the 2014-15 NBA season and calculated the WPA efficiency and the net efficiency rankings for every lineup in the 2015-16 season. We then calculated the rank correlation between the two rankings obtained from the different metrics. The correlation coefficient is 0.71 and there are two observations to be made:
The two metrics agree to a large extent with regard to which lineups add value to a team. This is important, since the net efficiency rating is generally considered a good lineup evaluation metric.
You might be wondering: "If this WPA efficiency for 5-man lineups is so great, then why isn't it used more?" The major drawback of this approach is that it depends on the in-game win probability model used, and therefore different models might provide different ratings. In contrast, the efficiency ratings are purely based on points scored and allowed, which are unambiguous. Of course, this is not something that hasn't happened before in sports analytics. For example (and full disclaimer, I am not a big baseball fan and hence not very well versed in its analytics literature), in baseball there are many different versions of wins above replacement! What is important is the high-level idea; how one develops it further and implements it is secondary – at least at this stage – but certainly critical for the success of the "metric".
The ratings we got after the end of the regular season are:
Using these ratings we get the following projections (they will be updated after every game):
As we can see, CSKA opens as the big favorite, followed by Real Madrid and the reigning champion Fener. Many fans (at least in Greece) are looking forward to a Greek final, but the chances of this happening are only about 1.5%.
After the first two games of the series two teams have made the break, Real and Zalgiris. On the other hand CSKA and Fener have made a strong case for being present at the final 4 once more, while the chance of a Greek final is still around 1.5%.
Final Four
We are only days away from the final four and here are our updated projections for the semifinals and the finals:
Big Final
CSKA once more defied the probabilities and … did not win the title, so we are headed for a very balanced final between Fener and Real Madrid. Will Obradovic win his 10th title, or will Real win their 10th title? The numbers do not help, since they do not point to a favorite (in fact Fener is a slight favorite with a 50.5% win probability, if that can be considered a favorite). We are in for a great final!
These ratings are obtained through a regression model (a simplified version of it can be found here) and, as we can see, Monaco and Tenerife are the top-2 teams, with Riesen Ludwigsburg following in third place. Using these ratings we simulate the playoffs 10,000 times to make our predictions for the ultimate winner. As for the final-4 pairings, we draw them randomly in every simulation (since, after all, the same will happen when the 4 finalists are known!). Following are our initial projections:
It is not a surprise that Monaco and Tenerife are the favorites, while the cluster of Nymburk, Strasbourg, Neptunas and AEK, which are matched up together in the top-16 and top-8, is the most balanced one in terms of probability of making the playoffs!
With the first leg of the top-16 round coming up following is the distribution of the expected point differential for each of these games:
Each colored area gives the win probability for the corresponding team (in other words, positive values of the point differential correspond to a home-team win, while negative values correspond to a visiting-team win). As we can see, the two games expected to be the most balanced are PAOK-Pinar and Bayreuth-Besiktas.
With the first leg of the top-16 round over, following are the updates in our predictions. As you will see there are some pretty big swings in favor of Banvit, Nymburk and Riesen Ludwigsburg.
Taking the results of the top-16 leg 1 into account following are the projections for each of the upcoming rounds:
Tenerife is still at the top of the race, while Ludwigsburg has emerged as the second favorite; Monaco's poor outing in leg 1 (compared to expectations) dropped their chances for the trophy to 10%.
Quarter finals
The first round of this year's playoffs involved two major upsets (based on the results of the first leg and the team ratings). The first BCL champions, and the favorite for a back-to-back championship, Tenerife, were eliminated after a sensational performance by Murcia! Before leg 2, Murcia had a mere 1% chance of advancing! At the same time, AEK, with Punter's (practically) buzzer beater in Nymburk, covered the 10-point deficit from leg 1 in Athens and qualified for the top-8 after having just about a 12% chance of doing so! The quarter finals are here, and following are the point differential predictions for leg 1 and the updated probabilities for the rest of the tournament. AS Monaco is back at the top of its game and of our ratings. However, don't count anyone out just yet! An exciting round of quarter finals is ahead!
After the first leg AS Monaco remains the favorite for the title, but some clear favorites for the final four spots have emerged. Here are the updated odds after leg 1 of the quarter finals.
Final 4
The final 4 is here and after the pairings’ draw here are the projections:
AS Monaco is the big favorite, but AEK – playing in front of its fans – has a good chance to advance to the final and challenge the French team.
Final
Our predictions for the semi-finals were correct and the final is going to be between AEK and AS Monaco. Monaco is the heavy favorite with a 74% win probability but never count out AEK in front of its fans (and I hope I am wrong — just to reveal my personal preferences if you have not figured them out yet :)).
As you all might know by now, AEK defied the odds and won its 3rd European title. Closing out these predictions, here is the in-game win probability from the big final:
Looking forward to a new season!
As you can see, we can easily get the offensive and defensive rating (efficiency) for the lineup, as well as its net rating (simply the difference between the two). We also know the minutes played by the lineup and its pace. Using these two we can approximate the number of possessions the lineup has played. The pace value is the number of possessions per 48 minutes, and therefore the specific lineup shown above played a total of (335/48) · 96.5 ≈ 673.5 possessions. When we want to compare two lineups, we can check the ratings provided on NBA's website and simply see which lineup has the higher (lower) offensive (defensive) rating. Right?
Well, not so fast! There are lineups like the one above that have played more than 600 possessions, while there are lineups that have played fewer than 10 possessions (e.g., Irving, Larkin, Morris, Rozier and Theis have played a whopping 3 possessions!). How confident are we that the lineup ratings we have obtained are indeed their true ratings, especially for lineups that have played few possessions? We could calculate a probability that lineup A is better than lineup B by making an assumption about (or learning through data) the distribution of a lineup's actual performance. For example, Wayne Winston in his book Mathletics indicates that when it comes to a lineup's +/- rating, the actual performance of the lineup over 48 minutes is normally distributed with a mean equal to the lineup's +/- rating and a known standard deviation. Therefore, a lineup that has played only a few minutes will be associated with high variance, and we will be able to further calculate the probability that this lineup is better than another lineup of the team (whose performance we can also model through a similar normal distribution). However, even if this probabilistic analysis were the most accurate representation of reality, when you are presenting your analysis to the coaching staff you should have a simple (yet concrete) message. Probabilistic comparisons of lineups are great, but too cumbersome to digest, especially if you are not trained in probability and statistics. So is there a way to adjust the lineup ratings to account for the fact that different lineups have played many more or fewer possessions, and hence their true efficiency might be different from the one reported on NBA's website (or the one you calculated on your own from the play-by-play data)? Luckily the answer is yes!
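Before moving on to that adjustment, here is what the probabilistic comparison mentioned above could look like; the 12-points-per-48-minutes standard deviation and the minutes-based variance scaling are assumptions for illustration, not Winston's exact numbers:

```python
from math import erf, sqrt

# Probability that lineup A outperforms lineup B, modeling each
# lineup's +/- rating as normal around its true value. sd48 is an
# assumed per-48-minute standard deviation; the variance of an
# observed rating shrinks as a lineup logs more minutes.

def p_a_better(rating_a, min_a, rating_b, min_b, sd48=12.0):
    var_a = sd48**2 * 48 / max(min_a, 1)  # fewer minutes -> more variance
    var_b = sd48**2 * 48 / max(min_b, 1)
    z = (rating_a - rating_b) / sqrt(var_a + var_b)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF

# a +4 lineup with lots of minutes vs a +6 lineup with almost none:
# the comparison is nearly a coin flip despite the rating gap
print(round(p_a_better(4.0, 335, 6.0, 3), 2))
```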
In order to achieve our goal we will make use of the notion of the Bayesian average. The idea behind the Bayesian average is that when we have a small number of observations (possessions in our case) for an object of interest (a lineup in our case), the simple average can provide us with a distorted view. Consider the case of the lineup mentioned above with 3 (offensive) possessions observed. In this situation, all three possessions could easily end in a made 3-point shot, which would lead to an offensive efficiency of 300 (points per 100 possessions). However, it is also very possible that all 3 possessions end with a missed shot, a turnover, etc., leading to an offensive efficiency of 0! Simply put, when we have few observations it is very likely to obtain extreme values just by chance. So here is where the Bayesian approach comes into play. In the case of probability estimates, obtaining new evidence allows us to use Bayes' theorem to update a prior belief we had about an event:
What does this have to do with our lineup ratings? Well, we can adjust the ratings based on some prior belief we have about them. In our case this prior belief can be the team's weighted average lineup efficiency (or the league's weighted average lineup efficiency). In particular, considering the team weighted average, the Bayesian adjusted efficiency of lineup i is: r̃_i = (C · R̄ + n_i · r_i) / (C + n_i), where r_i is the raw rating of lineup i, n_i the number of possessions it has played, R̄ the team's weighted average efficiency, and C the average number of possessions observed per lineup.
Essentially, for every new lineup we begin with a prior belief that this is a (team/league) average lineup. Then, every time we obtain a new observation (i.e., a new possession), we update our rating for the lineup. It should be evident that as we accumulate enough observations for a lineup (i.e., many more possessions than the prior count), the impact of our prior belief gets smaller and smaller. For example, while the Bayesian adjusted rating of the lineup in the above figure is 111.6 (practically equal to its "raw" rating of 111.9), for a lineup with fewer observations there can be significant differences. For instance, the Celtics lineup of Baynes, Brown, Ojeleye, Rozier and Smart has played 33 possessions with a raw offensive rating of 60.5. However, the Bayesian adjusted rating of this lineup is 78.1, since we have used a prior based on 24 possessions on average for each Boston lineup and a 102.6 offensive efficiency. The following figure presents the raw and Bayesian-adjusted efficiency ratings for all the Celtics' lineups, where the size of each point corresponds to the number of possessions observed for the lineup. As we can see, for lineups with many observations the two ratings correlate well. In fact, there is a negative correlation (-0.25, p-value < 0.001) between the absolute difference of the two ratings for a lineup and the number of possessions observed for it, i.e., the fewer the observations, the larger the adjustment.
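The adjustment itself is one line of code; the sketch below reproduces the Celtics example (the tiny difference from the 78.1 quoted above comes from rounding of the inputs):

```python
# Bayesian-average adjustment of a lineup's rating: blend the raw
# rating with a prior (team-average) rating weighted by the prior
# possession count.

def bayesian_rating(raw, n_poss, prior_rating, prior_poss):
    return (prior_poss * prior_rating + n_poss * raw) / (prior_poss + n_poss)

# Baynes/Brown/Ojeleye/Rozier/Smart: 33 possessions at a 60.5 raw
# rating, with a prior of 24 possessions at the 102.6 team average
adj = bayesian_rating(raw=60.5, n_poss=33, prior_rating=102.6, prior_poss=24)
print(round(adj, 1))  # 78.2
```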
Furthermore, the Spearman rank correlation between the two ratings is 0.83, which means that while there is a good relationship between the two, there are differences in the rankings they provide.
As should be evident, one can do the same with the defensive and net efficiency ratings. I hope we will start seeing these Bayesian adjustments in mainstream statistics.
The initial projections are given in the following:
The Patriots and Steelers (closely followed by the Chiefs) are the AFC favorites, while the Vikings are the NFC's top favorite (despite the Eagles being the top seed). The most likely matchup at the moment is Patriots-Vikings, which appeared in almost 10% of the simulations, while the probability of a Pennsylvania Super Bowl (Steelers-Eagles) is around 7.5%.
1/7/2018
Wild card round is over, and Bills, Panthers, Chiefs and Rams are eliminated from contention. The latest projections are as follows:
1/16/2018
Divisional round is over and 4 teams remain in contention:
1/22/2018
Super bowl matchup is set and the Patriots are opening as the favorite:
You can observe a few differences (e.g., AEK is taking many more shots from the restricted area, while PAOK is taking more shots from the paint), but overall there is little you can say about how different the shooting tendencies of teams (players) are. One could possibly use information about the exact locations of shots, obtain some type of shot density, and compare. This is certainly possible, but cumbersome. On the other hand, the fact that the shooting charts are mostly similar might mean that there are underlying patterns that all teams (players) follow to a different extent. If we could identify these patterns (some type of shooting dictionary), we could then describe a team/player through them.
One way to identify similar latent patterns in data that can be represented through a matrix S is matrix factorization. With matrix factorization one tries to express the original data matrix as a product of two (or more) factor matrices (e.g., WH). These factors capture latent patterns of the original data. There are various techniques to perform this task, but for our case we will focus on Non-negative Matrix Factorization. Our data can be represented through a matrix S whose rows represent teams (or players) and whose columns represent locations on the court. But what is a court location? One could use the actual x,y coordinates by overlaying a grid over the court and obtaining the counts of shots in every grid cell. However, this would give fairly sparse and noisy results in our case, where we have a fairly small number of games for each team/player. Another approach is to use as locations the 12 court zones (restricted area, paint, midrange slot, etc.), so that element (i,j) of the matrix S represents the number of shots taken by team (player) i from court zone j.
Using S as our data (shot) matrix, Non-negative Matrix Factorization (NMF) aims at identifying factor matrices W and H that minimize dist(S, WH), subject to W and H being non-negative, where dist(S, WH) is a distance metric between the original data matrix S and the product of the factor matrices WH. The non-negativity constraint allows for easier interpretation of the results, since the data in the original matrix are also non-negative. With regards to the distance metric, we have used the Frobenius norm.
The next step is to decide on the number of patterns we want to find. This corresponds to the number of rows of matrix H (which equals the number of columns of matrix W). Choosing the number of patterns is not trivial and is essentially very similar to choosing the number of clusters in a clustering problem. One approach is to examine how good the approximation of S by the product of the factor matrices is. However, the approximation quality increases monotonically with the number of patterns, and hence there is a tradeoff between approximation quality and finding non-trivial patterns. This is analogous to the bias-variance tradeoff; a large number of patterns essentially gives an overfitted model where practically every pattern represents a single team/player. For our purposes we have used k = 7 patterns, since this provides a good tradeoff between approximation and interpretability (non-overfitting). Figure 7 presents the quality of approximation for the players' matrix as a function of the number of patterns k (the results for the teams' matrix are similar).
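As a sketch of the decomposition, here is Frobenius-norm NMF on a synthetic shot matrix (the BCL data are not public, so the counts below are made up; only the 32-team by 12-zone shape and k = 7 come from the post):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Hypothetical shot matrix: 32 teams x 12 court zones, entry (i, j) is the
# number of shots team i took from zone j (synthetic counts for illustration).
S = rng.poisson(lam=30.0, size=(32, 12)).astype(float)

# Frobenius-norm NMF with k = 7 patterns, as in the post.
model = NMF(n_components=7, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(S)  # 32 x 7: each team's coordinates over the patterns
H = model.components_       # 7 x 12: each pattern's intensity over the zones
```

Each row of H is then a 12-dimensional shooting pattern, and each row of W expresses a team as a non-negative combination of those patterns.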
Every pattern (i.e., a row of matrix H) is essentially a 12-dimensional vector, each element of which corresponds to one of the court zones. The value of the element captures the strength of the corresponding court zone in the pattern. For example, one of the patterns identified from the players' matrix is the following:
Simply put, this pattern includes shots mainly from the restricted area and a few from the paint. Once these patterns are identified, the other factor matrix W can be used to obtain the coordinates of a player/team with respect to the basis of the identified patterns. This essentially allows us to express a player/team as a linear combination of these latent shooting patterns.
The following figure presents the 7 player patterns identified and the corresponding coefficients for some of the players (the coefficients for all the players can be found here). The coefficients are proportional to the number of shots a player takes. For example, Manny Harris appears to be getting the majority of his shots from pattern 1 (midrange shots), while Abromaitis is a corner 3 (pattern 2) and restricted area (pattern 3) shooter.
Each one of these patterns is also associated with an expected points per shot. In order to identify this we need:
Then the expected points per shot for each pattern r is:
The following table shows the expected points per pattern; as we can see, patterns 3 and 6 are the most efficient ones, as one might have expected, since they include shots from the restricted area and the (left) corner.
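The post's exact per-pattern formula is behind an elided equation, but one natural reading (an assumption on my part) is an intensity-weighted average of zone efficiencies: weight each zone by the pattern's intensity there, and use the league FG% and point value of each zone:

```python
def pattern_expected_points(pattern, zone_fg_pct, zone_points):
    """Intensity-weighted expected points per shot for one pattern.

    pattern: zone intensities (a row of H); zone_fg_pct / zone_points:
    league FG% and point value (2 or 3) per zone. This is a hypothetical
    reconstruction; the post's exact definition is not reproduced here.
    """
    total = sum(pattern)
    return sum(w / total * fg * pts
               for w, fg, pts in zip(pattern, zone_fg_pct, zone_points))

# Toy 3-zone example: restricted area, paint, corner three.
pattern_expected_points([0.8, 0.2, 0.0], [0.60, 0.40, 0.38], [2, 2, 3])
```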
We performed the same analysis for the 32 teams, and the patterns we identified are presented in the following, together with the coefficients for select teams (all the coefficients can be found here).
Furthermore, the points per pattern for the team patterns are:
The above presents a fairly simple application of matrix factorization in basketball, providing a better understanding of the offensive/shooting tendencies of teams. Even more insights can be obtained if two matrices are analyzed, namely one for made shots and one for missed shots. In this case we can really identify potential inefficiencies of upcoming opponents; for instance, we can identify specific shooting patterns at which a team is not successful and force them into those shots.
With regards to the factorization itself, the KL divergence usually works better than the Frobenius norm that we have used in our analysis. Furthermore, one can overlay a grid (e.g., 1×1 meters) and use the grid cells as the locations. This gives a much wider matrix, but NMF will essentially reduce the dimensionality. However, grid cells are noisier than court zones, in which case it is better to apply NMF over a smoothed intensity surface instead of the raw counts.
Acknowledgments: I would like to thank Basketball Champions League for providing me access to the data.
Using data for all the shots taken by the 32 teams in BCL, we calculate the expected points per shot for each team and for each area of the court. The court is divided into 13 areas, as shown in the figure below.
Given the field goal percentage of a team from each area of the court, we can easily compute the points per shot that the team is expected to add whenever taking a shot from that area. For example, if the field goal percentage of our team from the paint is 55%, the expected points from a shot taken from the paint are 0.55*2 = 1.1. The following figure shows the expected points per shot from each area of the court when considering all the shots taken by all the teams in BCL.
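The arithmetic is simply FG% times the point value of the zone; as a one-line sketch (the 55% paint figure is the example from the text, the three-point FG% is illustrative):

```python
def zone_expected_points(fg_pct, shot_value):
    """Expected points per shot from a zone: FG% times the shot's point value."""
    return fg_pct * shot_value

zone_expected_points(0.55, 2)   # paint example from the text: 1.1 points/shot
zone_expected_points(0.38, 3)   # illustrative three-point zone: 1.14 points/shot
```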
Similar to the NBA, the most efficient shots are the ones taken from the restricted area (under the basket), followed by all the three-point shots. The most inefficient shots are the ones from midrange. Now, among the three-point shots, the most efficient ones are the corner threes! An idea that has been thrown around to explain the higher efficiency of corner threes is the shorter distance to the hoop. In particular, in the NBA the corner threes are at 22 feet (6.7m), while above the break the distance is 23.75 feet (7.24m). This is a fairly large difference. In contrast, the three-point distance in FIBA competitions varies much less, from 22.15 feet (6.75m) above the break to 21.65 feet (6.60m) at the corners. If distance were a big factor in the increased efficiency of corner threes in the NBA, the efficiency difference should be much less pronounced in FIBA competitions. This is exactly the setting of a natural experiment. The fact that in FIBA competitions corner threes are still more efficient than threes above the break allows us to reject the hypothesis that the shorter corner distance is responsible for the majority of the efficiency difference. On the other hand, in both the NBA and FIBA contests, the fraction of corner threes that are assisted is much higher than that of threes above the break. The actual fractions differ significantly between the NBA and FIBA competitions, but this is mainly an artifact of how assists are counted in the different competitions. The following table presents the fraction of assisted shots from each three-point area in BCL.
Area | Fraction of assisted shots
Top of the key | 20%
Left wing | 27%
Right wing | 27%
Left corner | 39%
Right corner | 35%
Overall, above-the-break threes are assisted at a rate of 25.4%, while corner threes are assisted at a rate of 37.3%. A two-proportion z-test allows us to reject the null hypothesis that the two rates are equal (p-value < 0.001). Given that assisted shots tend to be of higher quality (e.g., the shooter tends to be open more often than not), the fact that corner threes are more efficient than above-the-break threes is not surprising. Of course, the question is why corner threes are assisted more often, but this is a topic deserving its own in-depth study and analysis, which we are currently performing.
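A minimal sketch of the two-proportion z-test follows. The shot counts are hypothetical, chosen only to match the 25.4% and 37.3% rates; the post does not report the raw counts:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion z-test using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Hypothetical counts: 508/2000 above-the-break threes assisted (25.4%),
# 373/1000 corner threes assisted (37.3%).
z, p = two_proportion_ztest(508, 2000, 373, 1000)
```

With these counts z comes out around -6.7 and the p-value is far below 0.001, consistent with rejecting equal assist rates.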
Using the notion of expected points per shot we have a quantitative way to evaluate the efficiency of a team: an efficient team makes shot choices that lead to larger expected points per shot. The figure above presents the league-average expected points per shot, which suggests that a team with efficiency similar to the league average should attempt more shots under the basket and more corner threes. However, this is not necessarily true for all teams. Teams might not have the right personnel for creating corner-three opportunities. Similarly, it seems inadvisable for a team to take many midrange shots, but there are several players (many of them future hall-of-famers, e.g., Dirk Nowitzki and Chris Paul) who have made a career out of their efficiency from midrange. Therefore, in order to estimate the expected points per shot for each team, we cannot use the league-average numbers, but rather must quantify these variables for each team individually. In particular, with f(z,t) being the FG% from area z for team t, n(z,t) being the number of shots team t has taken from area z, and p(z) being the number of points awarded from zone z, the expected points per shot xPTS[t] for team t are the shot-weighted average over zones: xPTS[t] = Σ_z [n(z,t) / Σ_z' n(z',t)] · f(z,t) · p(z).
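A sketch of this per-team computation (the zone names and numbers below are illustrative, not BCL data):

```python
def team_xpts(fg_pct, n_shots, zone_points):
    """Shot-weighted expected points per shot for one team.

    fg_pct[z]: team FG% from zone z; n_shots[z]: shots the team took from z;
    zone_points[z]: point value of zone z (2 or 3). All dicts share zone keys.
    """
    total = sum(n_shots.values())
    return sum(n_shots[z] / total * fg_pct[z] * zone_points[z] for z in n_shots)

# Illustrative two-zone team: 100 paint shots at 55%, 50 threes at 35%.
team_xpts({"paint": 0.55, "three": 0.35},
          {"paint": 100, "three": 50},
          {"paint": 2, "three": 3})
```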
The following figure presents the z-score of the expected points per shot for all 32 teams. As we can see, the top 3 teams in terms of offensive efficiency (as captured by expected points per shot) are all from Group A, occupying the first 3 spots in the group: Pinar Karsiyaka, EWE Baskets Oldenburg and AS Monaco!
However, offense is only half of the game. Defense can impact the efficiency of a team either by reducing the FG% from the different court zones or by forcing the offense to take shots from court zones with low FG%. The following figure exhibits the z-score of the defensive expected points per shot for each team, that is, the expected points of the shots allowed by the defense. Here lower is better, i.e., the team allows fewer points per shot than an average team.
As we can see, AS Monaco and Tenerife have had the most efficient defenses in the competition so far, allowing more than two standard deviations fewer points than an average team. On the contrary, EWE Baskets Oldenburg allows more than 2 standard deviations more points than an average team.
Note that the above are raw numbers, i.e., they do not adjust for whom a team faces. For example, scoring 1 point per shot against EWE Baskets Oldenburg is not as good as scoring 1 point per shot against AS Monaco. In order to adjust the xPTS (both offensive and defensive) for each team, we solve the following optimization problem:
where x is a vector with the xPTS ratings for every team (offensive and defensive respectively), h is the home edge with regards to xPTS and m is the league average xPTS. Every game i will essentially provide us with two data points for the above optimization objective. Solving the above optimization problem we obtain a home edge h= 0.014 points/shot, and a league average xPTS of m = 1.038. The following table provides the results.
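The exact objective is behind an elided equation, but a natural least-squares formulation (my assumption, consistent with "two data points per game") models each team's observed xPTS in a game as the league mean m, plus its own offensive rating, plus the opponent's defensive rating, plus the home edge h when at home. A sketch with synthetic games:

```python
import numpy as np

# Hypothetical games: (home_team, away_team, home_xpts, away_xpts).
games = [(0, 1, 1.10, 0.98), (1, 2, 1.05, 1.02), (2, 0, 0.95, 1.08)]
n_teams = 3

# Unknowns: n_teams offensive ratings, n_teams defensive ratings, h, m.
n_params = 2 * n_teams + 2
rows, y = [], []
for home, away, home_xpts, away_xpts in games:
    r = np.zeros(n_params)
    r[home] = 1            # home offense
    r[n_teams + away] = 1  # away defense
    r[-2] = 1              # home edge h
    r[-1] = 1              # league mean m
    rows.append(r); y.append(home_xpts)
    r = np.zeros(n_params)
    r[away] = 1            # away offense
    r[n_teams + home] = 1  # home defense
    r[-1] = 1              # league mean m (no home edge for the visitor)
    rows.append(r); y.append(away_xpts)

A, y = np.array(rows), np.array(y)
ratings, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimum-norm least squares
```

With real data there are far more games than parameters and the fit is genuinely overdetermined; the toy system here is solved exactly.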
Note that for the defensive ratings a negative value is better (i.e., the team allows fewer points per shot than average). One could also derive ratings for specific court zones. However, given the small number of games to begin with, and the even sparser data on shots from specific areas in specific games, we might not be able to obtain a robust solution to the above optimization.
You can explore the (currently only offensive) efficiency of the BCL teams in the interactive app here.
Acknowledgments: I would like to thank Basketball Champions League for providing me with access to the data.