Basketball (like other sports) being a game, game theory has not been utilized enough – in my opinion – to model interactions between teams, players, lineups, etc. Given our objective, and the fact that teammates cooperate (at least we would hope they do) to achieve their goal of winning a game, it is natural to look at tools from cooperative game theory. An important concept – a Nobel-important concept – in cooperative games is the **Shapley value**, a solution concept for such games. In brief, it distributes the surplus generated by a coalition of players according to the contribution of each player to this total surplus. But wait, this sounds pretty much like what we are trying to do for the players of a team in a single basketball game!

Let’s start with some definitions that will be useful for mapping our problem and data to the concept of Shapley values. In a coalition game we have a set of players $N$ and a *characteristic function* $v$ that is defined over the power set of $N$; $v: 2^N \to \mathbb{R}$, with $v(\emptyset) = 0$. The characteristic function captures the *value* of a coalition $S \subseteq N$, and it essentially represents the expected sum of payoffs from the cooperation of the players in $S$. The Shapley value of a player then distributes the total *gain* $v(N)$ of the game to the players in a *fair* manner. Fairness here is related to specific mathematical properties satisfied by the solution. While this is interesting by itself, it is out of the scope of this post, but you can read more about this and Shapley values in general here. However, just for completeness, here is the *credit* each player $i$ is allocated according to the Shapley value:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\big(v(S \cup \{i\}) - v(S)\big)$$

Let’s now get back to the specific problem that we want to solve using the concept of Shapley values. In our case we have a team of basketball players, and the coach decides during the game which 5-player coalitions to play. For each one of those we also observe its value, which is the net rating the coalition (lineup) had during the game (given the small sample sizes, a Bayesian adjustment of this net rating is more appropriate). So far so good, but from the definition of the characteristic function above, we can see that we need to define $v(S)$ for each possible subset $S$ of the players in a team (those that played in the game). For coalitions with fewer than 5 players this is straightforward, and we can essentially obtain the net ratings for 2-, 3-, and 4-man lineups (and the on-off ratings for single players). However, the question is how to define the value of coalitions with more than 5 players. For this case we can aggregate the results from any lineup consisting exclusively of players in the coalition at hand. For example, if we want to estimate the value of the characteristic function for a coalition of 7 players, we will calculate the average net rating over all 5-man lineups played that consist only of players from this set of 7.
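To make this concrete, here is a minimal Python sketch of the exact Shapley computation over the power set, using a toy 3-player roster and a made-up characteristic function (the `v` table below is purely illustrative; in the real computation the values come from the observed lineup net ratings as described above):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v (dict keyed by frozenset)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Shapley weight |S|! (n - |S| - 1)! / n! times i's marginal contribution
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v[S | {i}] - v[S])
        phi[i] = total
    return phi

# Toy 3-player example with made-up "net ratings" as coalition values.
players = ["A", "B", "C"]
v = {
    frozenset(): 0.0,
    frozenset({"A"}): 4.0,
    frozenset({"B"}): 2.0,
    frozenset({"C"}): 0.0,
    frozenset({"A", "B"}): 10.0,
    frozenset({"A", "C"}): 4.0,
    frozenset({"B", "C"}): 2.0,
    frozenset({"A", "B", "C"}): 12.0,
}
phi = shapley_values(players, v)
```

A useful sanity check is the efficiency property: the Shapley values always sum exactly to $v(N)$, which is what makes this a credit *allocation* of the team’s total net rating.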

Let’s see an example here. Of course, I will choose one of Pitt’s games and let’s go with the first conference game of the season against Miami. Despite the fact that Miami had only 6 or 7 scholarship players available for the game, they fought well (partially due to early foul troubles for Champagnie and Johnson) before Pitt eventually got away in the latter part of the second half with a 70-55 win. Using the excellent R package from Jake Flancer, bigballR, I was able to obtain the lineup information needed to calculate the **Shapley Player Value (SPV)** for that game (code is available here).

| Player | SPV |
| --- | --- |
| Abdul Karim Coulibaly | +9.6 |
| Audiese Toney | +13.8 |
| Femi Odukale | +3.6 |
| John Hugley | -3.6 |
| Ithiel Horton | +3.9 |
| Nike Sibande | -4.0 |
| Terrell Brown | -4.5 |
| Noah Collier | +5.5 |
| Justin Champagnie | +5.5 |
| William Jeffress | -1.9 |
| Onyebuchi Ezeakudo | -13.2 |
| Xavier Johnson | +6.1 |

These SPV values essentially consider all the different lineups that played for Pitt in that game and how they performed, and assign a net rating to each player. According to SPV, Toney was the game MVP for Pitt (which also passes my eye test for that game). Coulibaly also played well in that game and made contributions that would not be captured by bean-counting his points and rebounds, earning an SPV of +9.6. Now this is not to say that SPV is better than any other single-game metric. Far from it. How would one even go about showing this? SPV is just a “credit allocation” approach and there is no ground truth (if there was, we would not need SPV or any other metric). However, I believe it looks at the contribution of players during a game more holistically compared to other single-game player ratings. Furthermore, Shapley values are able to deal with collinearity fairly well.

One limitation of SPV is that it does not consider who was on the court for the opposing team when a specific player “coalition” was on the court. However, I would argue that for a single game – and for a descriptive metric – this might not be a big problem (if at all). At the end of the day, SPV allocates credit to players according to who contributed to the team in that specific game. If a player faced the second unit of the opponent but made significant contributions towards winning that specific game, so be it. SPV’s goal is not to predict the future performance of this player. For that, regularized adjusted plus/minus approaches are clearly better. However, there is an interesting question here on whether a similar approach can be used for estimating player ratings when we only have information about the lineups of a team (e.g., from the NBA stats website) and not which opponents they faced – and hence, we cannot run the +/- regression. One of the challenges is that the computation becomes intractable when more than 10-12 players are involved. For a single game this is rarely the case, but over a full season the number of players involved can easily go way over this. Of course, time/possession limits can be applied and players with few minutes can be grouped together, but certainly more work is needed if one wants to apply the concept of Shapley values to season-long player ratings.

To look into that we will start with a regression-based rating system for teams. As I described here, one way to identify team ratings is through a linear regression, where the independent variables are “dummy” variables corresponding to teams and the dependent variable is the final point margin. When you fit this regression, the constant term corresponds to the home court advantage. Using OLS is the “frequentist” approach, and even though I am not taking sides here (supposedly), let’s look at the Bayesian approach to linear regression.
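As a sketch of this frequentist setup, here is a minimal version with made-up teams, ratings, and synthetic margins; the +1/−1 dummy coding and the minimum-norm least-squares solve are my own illustrative choices, not necessarily the exact design used in the original post:

```python
import numpy as np

# Hypothetical teams and true ratings, used only to generate synthetic margins.
teams = ["ATL", "BOS", "CHI", "DEN"]
true_r = {"ATL": -2.0, "BOS": 4.0, "CHI": -1.0, "DEN": 3.0}
true_hca, rng = 2.5, np.random.default_rng(7)

# Design matrix: +1 for the home team, -1 for the away team, plus a constant
# column whose coefficient is the home-court advantage.
rows, margins = [], []
for _ in range(400):
    h, a = rng.choice(len(teams), size=2, replace=False)
    x = np.zeros(len(teams) + 1)
    x[h], x[a], x[-1] = 1.0, -1.0, 1.0
    rows.append(x)
    margins.append(true_r[teams[h]] - true_r[teams[a]] + true_hca + rng.normal(0, 12))

X, y = np.array(rows), np.array(margins)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # min-norm OLS handles the rank deficiency
ratings = dict(zip(teams, beta[:-1] - beta[:-1].mean()))  # ratings are relative; center at 0
hca = beta[-1]
```

Team ratings are only identified up to an additive constant (the dummy columns of each game sum to zero), which is why the sketch centers them; the intercept, however, is identified and recovers the HCA.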

In this case the dependent variable is a sample from a normal distribution: $y \sim \mathcal{N}(\mathbf{x}^\top \boldsymbol{\beta}, \sigma^2)$. The mean of the distribution is the inner product of the coefficients to be learned, $\boldsymbol{\beta}$, and the independent variables $\mathbf{x}$, while the standard deviation $\sigma$ is also to be learned. The model coefficients are considered to be drawn from a distribution as well, which essentially allows us to get an estimate of their uncertainty. The goal of Bayesian linear regression is to obtain the posterior distribution of the model parameters given the data at hand, $P(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X})$. Using Bayes’ rule we have: $P(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) \propto P(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{X})\, P(\boldsymbol{\beta})$. The prior $P(\boldsymbol{\beta})$ can incorporate domain knowledge; alternatively it can be a non-informative prior, e.g., a uniform distribution. These models are estimated through Markov chain Monte Carlo methods and a great library to do that in Python is `pyMC3`. You can download the code and data for the Bayesian linear regression here.

We start with the situation pre-COVID and build the regression for identifying the ratings and the HCA. We use 10 chains of 2,000 samples each. We also use the following priors for the model parameters:

- Team ratings: normal distribution with average 0 and standard deviation 5
- HCA: uniform distribution from -5 to 5
- Model standard deviation $\sigma$: half-Cauchy with $\beta$ equal to 10.
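The post itself uses pyMC3, but to illustrate what the MCMC machinery is doing under the hood, here is a dependency-free toy: a random-walk Metropolis sampler that recovers just the HCA from synthetic, rating-adjusted margins, with the same Uniform(−5, 5) prior. The data, the fixed $\sigma = 12$, and the proposal scale are all illustrative assumptions, much simpler than the full model:

```python
import math, random

random.seed(42)
TRUE_HCA, SIGMA = 2.3, 12.0
# Synthetic "rating-adjusted" margins: what is left after subtracting team ratings.
data = [TRUE_HCA + random.gauss(0, SIGMA) for _ in range(1000)]
n, s1 = len(data), sum(data)

def log_post(hca):
    if not -5.0 <= hca <= 5.0:          # Uniform(-5, 5) prior: zero mass outside
        return float("-inf")
    # Gaussian log-likelihood up to an additive constant (sigma assumed known).
    return -(n * hca * hca - 2.0 * hca * s1) / (2.0 * SIGMA ** 2)

# Random-walk Metropolis: propose a step, accept with prob min(1, posterior ratio).
samples, cur = [], 0.0
for _ in range(4000):
    prop = cur + random.gauss(0, 0.5)
    if math.log(random.random()) < log_post(prop) - log_post(cur):
        cur = prop
    samples.append(cur)

posterior = samples[1000:]              # discard burn-in
post_mean = sum(posterior) / len(posterior)
```

The retained `posterior` samples play the role of pyMC3’s trace: their mean estimates the HCA and their spread gives the credible interval.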

The output for every coefficient is the aforementioned posterior distribution. The following is the posterior distribution for the HCA during the regular season:

The expected value of the posterior distribution is 2.3, while the 94% credible interval of the distribution is [1.5, 3.0]. Essentially, a home team is expected to get about 2.3 points for its home edge (traditionally attributed to travel/rest and refereeing bias). Moving to the situation in the bubble, we used the same priors for the HCA and the model variance, but as priors for the team ratings we used the posteriors identified by the previous regression. This leads to the following results for the posterior distribution of the HCA in the bubble:

As we can see, the average of the posterior is now smaller than 1 point, while the credible interval spans both sides of 0, i.e., [-1.1, 2.9]. This essentially means that there seems to be some effect remaining, but it is rather small and not robust based on the credible interval of the posterior. What is this small (non-robust) effect? Many things are possible (with the most plausible explanation being a statistical “anomaly” from the small sample size – compared to the pre-COVID data). Sure, there is no travel, but there are still referees. Now it is not clear how they would be biased based on some logos on the floor or some cyber-fans, but I looked into the data from the L2M reports, courtesy of a Hawks fan. In particular, I looked at the probability of an incorrect decision (call or no-call) based on whether the side disadvantaged by the call was the “home” or “visiting” team. In the case of the home team, this probability is 4.6%, while for the visiting team it is 6.2% (the following few lines of code – in R for inclusivity – will get you these results). The p-value for the difference is 0.11, so it is “marginally robust” (you can make your own interpretations), which seems to be in agreement with the home court advantage results (maybe a little bit of an effect, but not robust).

```r
l2m <- read.csv("https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M.csv")
l2m_bubble <- l2m[which(as.character(l2m$date) > "2020-07-01"), ]
l2m_bubble$incorrect <- rep(0, dim(l2m_bubble)[1])
l2m_bubble[which(l2m_bubble$decision == "INC" | l2m_bubble$decision == "IC"), ]$incorrect <- 1
mod <- glm(incorrect ~ disadvantaged_side, data = l2m_bubble, family = "binomial")
# probability of incorrect call when home team disadvantaged
predict(mod, data.frame(disadvantaged_side = "home"), type = "response")[[1]]
# probability of incorrect call when away team disadvantaged
predict(mod, data.frame(disadvantaged_side = "away"), type = "response")[[1]]
```

**Making Predictions**

Now let’s see how we can use this model to make predictions. Let’s say we want to see how Denver is going to fare in LA (virtual LA) against the Lakers. We can get an estimate for the distribution of the expected point margin, which will be $\mathcal{N}(r_{LAL} - r_{DEN} + HCA,\, \sigma)$, i.e., a normal distribution with mean 4.9 points and standard deviation 13.4 (the $\sigma$ parameter of the model). This gives a win probability of approximately 64% for the Lakers on their “home court” and 61% when they cross half court to go to the mile-high bench (i.e., practically the same probability). One can also sample the posterior distributions for each of the variables involved and make the prediction this way – but as one might expect, the two approaches will match given enough samples.
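The 64% figure follows directly from the normal CDF of that margin distribution; here is a quick check, where the mean of 4.9 and the $\sigma$ of 13.4 are the posterior means quoted above:

```python
import math

def win_prob(margin_mean, sigma):
    """P(margin > 0) when the point margin is N(margin_mean, sigma^2)."""
    return 0.5 * (1.0 + math.erf(margin_mean / (sigma * math.sqrt(2.0))))

p_lakers_home = win_prob(4.9, 13.4)   # ≈ 0.64, matching the text
```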

In the WSJ article the *importance* of a challenge is measured through the win probability added from it. Win probability added is a natural choice of course, but are there any possible shortcomings to using it?

First, let’s compare the two objectives: maximizing win probability versus maximizing expected points. During the beginning of a game there is more uncertainty with regard to the final outcome, with win probabilities being *closer* to 50% (or, in general, to the pre-game probabilities that are based on team strength). As a result, any given possession will not swing the win probability by a large percentage. Towards the end of the game the uncertainty is smaller, which also means that there is smaller uncertainty about how a given possession impacts the win probability. For example, an offensive foul on the first possession of the game most probably will not change the win probability at all, since there is high uncertainty at that point of the game. However, the same charge in the last 5 seconds of a game, while down 1 point, will most probably swing the win probability by about 20-30%. This is an artifact of the finite duration of the game.

Depending on which view one takes, the corresponding on-court strategy might be different as well. For example, early in the game, maximizing the win probability and maximizing the expected points will most probably yield similar decisions. In an extremely simplified version of decision making, a corner three on the first possession of the game is a *better* option compared to a long 2 both in terms of win probability and expected points. However, at the end of a tied game these two objectives diverge: a long 2 is the better option in terms of maximizing win probability, while a corner three still provides the best choice in terms of expected points. So the game clock clearly impacts the leverage of each call in terms of win probability. In terms of expected points, however, the value of each call is not impacted by the game clock. A challenge that is worth 1.5 expected points will be worth that whether it comes on the first possession of the game or the last one.

At the same time, a point scored on the first possession of the game counts the same (i.e., 1 point) as one scored at the end of the game. Their (win probability) leverage **at the time they are scored** will be different – just as explained above. However, their contribution to the final points of the game is the same (1 point). So in close games – where challenges have the potential to make a difference – a point scored or saved in the first quarter is equally important as one in the last 2 minutes. For example, the WSJ article ranks the challenge used by POR in the game against DAL as the one with the largest swing in win probability (which is true). During that challenge, the referees had called a shooting foul against POR with 9 seconds left in the game. DAL was down 1 point and the challenge was worth approximately 1.5 expected points (Finney-Smith is about a 75% FT shooter) if won (without considering the ensuing jump-ball result if the challenge was successful). However, if POR had used their challenge earlier in the game for a situation that was worth at least 1.5 expected points, they would be 9 seconds before the end of the game up by 2.5 (expected) points, which would give **at that point** a very similar win probability as the one after the reversal of the challenged call. Note that this is not any type of hindsight bias – 2 points scored in the last 10 seconds give you the same lead as if these points were scored during the first half. Of course this makes the assumption that the game would have proceeded along the same path if a challenge had been called earlier in the game. However, given how small a change any given basket provides in the overall score, this assumption might not be too much of a stretch (of course, cases where a star player gets in early foul trouble etc. might be different, but these are also things that are hard to include in the expected value from a challenge).

So overall, the fact that the highest win probability added comes from late-game challenges is to be expected, given the high leverage in terms of win probability that late-game situations have. However, looking at the issue through an expected points lens might also give us more insight on how one can approach this strategically.

The coach’s challenge fits well under problems involving **optimal stopping theory**. The most popular example – and very relevant to our setting – is the *secretary problem*. You have one secretary position to fill and start interviewing a (known beforehand) number of candidates for the position. Candidates come in *random order* (in terms of their skills/quality). We have to make a decision on whether to hire a candidate immediately after the interview. Once we reject a candidate we cannot call him/her back again, while once a hire is made the process stops and no more candidates are interviewed. The question is when we should make a hire. This is similar to the coach’s challenge problem (with some important differences we will discuss later); we get a series of calls that the coach can challenge (hire), and once a challenge is made no other challenges (hires) can be made.

In the case of the secretary problem, the optimal strategy is the following stopping rule: interview the first $n/e$ candidates without hiring anyone, where $n$ is the total number of candidates. Assume that the best candidate among them has a *skill rating* of $s^*$. Then interview the rest of the candidates and hire the first one that has a skill rating better than $s^*$ (or the last candidate if this never happens). This simple rule selects the best candidate with probability approximately $1/e$ – in other words, the best candidate is chosen about 37% of the time.
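A quick simulation confirms the ~37% figure for the $1/e$ rule; drawing candidate skills i.i.d. uniform is one standard way to realize the random-order assumption:

```python
import math, random

def secretary_trial(n, rng):
    """One run of the 1/e stopping rule; returns True if the best candidate is hired."""
    skills = [rng.random() for _ in range(n)]
    r = int(n / math.e)                      # observe the first n/e candidates
    best_seen = max(skills[:r]) if r else float("-inf")
    for s in skills[r:]:
        if s > best_seen:
            return s == max(skills)          # hire the first one beating the benchmark
    return skills[-1] == max(skills)         # forced to take the last candidate

rng = random.Random(0)
n_trials = 20000
rate = sum(secretary_trial(20, rng) for _ in range(n_trials)) / n_trials
```

With $n = 20$ candidates the theoretical success probability is slightly above $1/e$ (about 38%), and the simulated `rate` lands close to it.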

Now the coach’s challenge problem has a few differences. For one, if we use win probability as our metric for qualifying calls (candidates), then the order in which these calls arrive is not random. Based on the above discussion, one can argue that earlier calls might hold less value in terms of win probability compared to later calls (e.g., in the last 5 minutes). However, in terms of expected points the order can be assumed to be random. The expected points of a challenge need to consider the probability of the call being wrong (which might not be trivial for subjective calls – e.g., block vs. charge). The expected value from a call reversal then depends on the type of call. For example, an out-of-bounds reversal essentially provides the expected points per possession for the opponent. A shooting foul provides the expected points of two free throws from the player who would take them. It should be clear that there is an upper limit on the maximum expected points from a challenge. For example, if a three-pointer was scored but an offensive foul was called on the shooter and the coach challenges it to be a shooting foul, a successful challenge will yield 3+ points (say 3.8 points). So there is a limit on the value you can get from a challenge in terms of expected points. These cases might not appear very often, but still there is an upper bound on the value, which does not exist in the secretary problem. If there is an absolute maximum that you can obtain, then once you observe this maximum you know there is no way to find a better situation and you should make the challenge. So let’s assume that the maximum expected points to be yielded are 3 (again, in reality getting this expected value from a call is extremely unlikely) and that the number of calls that a coach can challenge is 20 (this number can be learned from data, and we can also consider only calls that have, say, more than 40-50% probability of being overturned).
Then we can use the following decision process:

- For the first 7 “challengeable” calls (roughly $20/e$), do nothing, unless one of them yields the maximum expected points – in that case, challenge.
- After the first 7 challengeable calls, with $M_7$ being the maximum expected points among those first 7, challenge the first call whose expected points either exceed $M_7$ or reach the maximum possible value. Otherwise, continue.

Now this is of course just a roadmap for how one could start thinking about the problem of when to challenge. High-profile, high-leverage challenges like the one in the Portland – Dallas game are outliers, and personally I think that keeping the challenge in case a call/situation like this appears is a *bad* strategy (it is essentially based on the **availability heuristic** – it is easier to recall high-leverage successes than run-of-the-mill ones).

One of the limitations of the challenge process at the moment is the inability to challenge non-calls. There are practical reasons for this (i.e., continuation of play, etc.), but the next iterations of the rule (if it is to remain) need to consider this. This is particularly important since, based on the L2M reports since 2015, 92% of the incorrect referee decisions are non-calls, and hence non-challengeable!

Again, this text is just a dump of ideas and philosophical-level thoughts about win probability and expected points. I am looking forward to hearing your comments.

There is also a difference in FT% with regard to the splits. Following are the percentages from last season over the whole league:

As we can see, there are some differences (statistically significant as well) between consecutive free throws in a FT trip. Others have noticed similar behavior. Furthermore, we see that overall the FTs from a 3-point shooting foul have a higher percentage, most probably since the players that are awarded these fouls are better shooters. With the new rule change, the unit of interest is the FT trip. Last year there were on average around 3 one-FT trips, 10 two-FT trips, and 0.4 three-FT trips per team per game. Now all of these trips will be awarded a single FT that will be worth 1, 2, or 3 points respectively. What will be the impact on scoring? Assuming (and this is a big assumption that I will expand on a bit later) that teams keep getting the same types of FT trips and have the same percentages (the ones that matter here are the percentages of the first FT of the trip), the following figure shows the expected number of points per game per team from FTs compared with what they get now:

As we can see, almost all teams are expected to get fewer points from the charity line if this rule were to be introduced in the NBA. The degree of reduction depends on many factors (type of trips for a team, actual FT%, degree of FT split percentages, etc.). However, we can start looking at pairs of teams and examine who would get a *benefit* in their matchup. For example, Memphis is barely expected to see any decrease in its expected points from FTs, but Brooklyn will see a decrease of almost 1 point. So in a matchup between BKN and MEM the new rule would give Memphis a 1-point advantage.

Of course, here come the assumptions. The rule change is going to change the way teams foul (I think) and hence who gets fouled and what kinds of trips a team gets. How? I don’t know. But consider that a league-average player (even including the splits) currently has around a 6% chance of not scoring from a pair of FTs. With the new rule he will have about a 25% chance of not scoring. This can make Euro fouls a thing again. Of course, someone will point out that there is also a 75% chance of scoring 2 points, as compared to the ~56% chance of that happening under the current NBA rule. True; that’s why I said I don’t know how this will change fouling behavior. Moreover, teams might put more emphasis on practicing FTs (which I would believe and hope they do anyway, but even more so now) – not to mention personnel movement. What is clear, though, is that with the current levels of FT%, teams are expected to score fewer points from the line. Scoring is another variable in fan satisfaction (as is the duration of the game). Of course the reduction will be less than 2 points per game, but fans do like more points (maybe not from the FT line anyway). Also, we have not separated out the rule for the last 2 minutes – which will stay the same as it is now – so this will further reduce the impact on scoring (it might also bring an uptick in fouls just before the last two minutes so teams can *benefit* from the increased variance of the new rule – but we cannot know until we see it).
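The per-trip arithmetic behind the figure can be sketched as follows. The split percentages here are made-up placeholders (the real calculation would use each team’s actual first/second/third-attempt percentages), while the trip counts are the league averages quoted above:

```python
# Hypothetical attempt-by-attempt percentages within a trip (illustrative only).
p1, p2, p3 = 0.72, 0.78, 0.80          # 1st, 2nd, 3rd FT of a trip

# Expected points per trip: current rule (one attempt per FT awarded)...
current = {1: p1, 2: p1 + p2, 3: p1 + p2 + p3}
# ...vs the new rule, where a single attempt (at the first-FT percentage)
# is worth 1, 2 or 3 points.
new = {k: k * p1 for k in (1, 2, 3)}

# League-average trips per team per game quoted in the text.
trips = {1: 3.0, 2: 10.0, 3: 0.4}
lost_per_game = sum(trips[k] * (current[k] - new[k]) for k in trips)

# The variance story: chance of coming away scoreless from a 2-FT trip.
none_current = (1 - p1) * (1 - p2)     # miss both attempts
none_new = 1 - p1                      # miss the single attempt
```

Note that the expected loss comes entirely from the split: the new rule prices every point of the trip at the (lower) first-attempt percentage.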

We will see how things go in the G League this season, but it is certainly an intriguing change.

So nylongems provided me with a set of pairwise comparisons and the users’ choices. For example, I know that nets $i$ and $j$ were shown to a user and the user chose $i$. The Bradley-Terry (BT) model uses these comparisons between *objects* to identify an *ability* $\lambda_i$ for each object $i$. Once these abilities are identified, the probability that object $i$ will be chosen over object $j$ is given by:

$$P(i \succ j) = \frac{e^{\lambda_i}}{e^{\lambda_i} + e^{\lambda_j}}$$
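For a concrete sketch, one standard way to fit Bradley-Terry abilities from win counts is the Zermelo/minorization-maximization iteration; here it runs on made-up comparison counts, not the actual nylongems data:

```python
import math

# Synthetic pairwise counts: wins[i][j] = times net i was chosen over net j.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
n = len(wins)

# Zermelo/MM iterations for the strengths gamma_i, where
# P(i beats j) = gamma_i / (gamma_i + gamma_j).
gamma = [1.0] * n
for _ in range(200):
    new = []
    for i in range(n):
        w_i = sum(wins[i])                                    # total wins of i
        denom = sum((wins[i][j] + wins[j][i]) / (gamma[i] + gamma[j])
                    for j in range(n) if j != i)
        new.append(w_i / denom)
    s = sum(new)
    gamma = [g * n / s for g in new]                          # fix the arbitrary scale

scores = [math.log(g) for g in gamma]                         # log-abilities ("gem scores")
```

The abilities are only identified up to a common scale, hence the renormalization step; the log of each strength plays the role of $\lambda_i$.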

Given nylon-specific features (e.g., chain or no chain, elasticity of the net, etc.), the ability of each object can be further expressed as:

$$\lambda_i = \boldsymbol{\theta}^\top \mathbf{x}_i$$

where $\mathbf{x}_i$ is the feature vector of object $i$.

This – theoretically – would allow one to truly design a nylon that everyone would like, since we would be able to estimate how a given feature correlates with the nylon’s *ability*, or *gem score* as I termed it. Since we do not have such features, we simply estimate the ability of each net, i.e., a number for each one of them. I collected 285 pairwise comparisons (feel free to contribute more data by visiting nylongems and I will update the results), and following are the gem scores for each nylon:

Unfortunately, my personal favorite is ranked last, while my second favorite (the one with score -0.48) does not rank much better.

There are certainly more advanced pairwise-comparison models, but the Bradley-Terry model is used in a variety of settings, including sports analytics, the social sciences, etc. For example, we use a similar framework to understand how people perceive biking safety and what street features increase (perceived) safety.

You can find all the code (for the flask app and the data analysis) on my github.

To explore this we used a detailed shot dataset that includes information such as touch time prior to the shot, dribbles prior to the shot, distance of the closest defender, etc. One piece of information missing is whether the shot was assisted or not. However, we can estimate whether a shot is a pull-up or a catch-and-shoot (i.e., *assisted*) by using the touch time and the number of dribbles taken before the shot. The following figure shows the fraction of shots from each court zone that are assisted.

As we can see, three-point shots – and in particular corner 3s – are shots taken after a pass more than 90% of the time! In fact, this seems to be what mainly drives the higher efficiency of corner 3s, rather than their closer distance to the hoop. We also calculated the average distance of the closest defender for assisted and unassisted shots, and as we can see from the following figure, assisted shots are in general more open. In particular, the average distance of the closest defender for an assisted shot is 40% larger compared to the case of an unassisted shot (both the t and Kolmogorov-Smirnov tests reject the null).

So what are the differences in FG% between assisted and unassisted shots per zone? The following figure presents the results, where we also present the p-value from the z-test for proportions.
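For reference, the z-test for proportions used here is straightforward to compute; a self-contained sketch with hypothetical assisted/unassisted counts for one zone:

```python
import math

def two_prop_ztest(made1, att1, made2, att2):
    """z-test for the difference of two FG proportions (pooled standard error)."""
    p1, p2 = made1 / att1, made2 / att2
    p = (made1 + made2) / (att1 + att2)
    se = math.sqrt(p * (1 - p) * (1 / att1 + 1 / att2))
    z = (p1 - p2) / se
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
    return z, pval

# Hypothetical zone counts: 420/1000 assisted vs 380/1000 unassisted makes.
z, pval = two_prop_ztest(420, 1000, 380, 1000)
```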

As we can see, in all cases assisted shots have a better chance of going through the net, regardless of the court location. Given that midrange shots are assisted at the lowest rate, could that be the driving force behind their inefficiency? The short answer is no. Even if all shots from the midrange were assisted, their expected points per shot would still trail that of… unassisted threes. The following court map shows the expected points added (over unassisted shots) per assisted shot for each court zone.

Where is the most value added? Well, in the most efficient locations to begin with: threes and the restricted area. The “rich get richer” in these locations, with some help from teammates. What does all this tell us? Well, we can use the expected points added (over an unassisted shot) from a location to try and divide the credit for an assisted bucket between the passer and the shooter. Let’s assume that an unassisted shot from location $z$ provides $p_u(z)$ points per shot, while an assisted shot provides $p_a(z)$ points per shot. Then a simple way to divide the credit between the shooter and the passer for an assisted bucket from $z$ is to assign:

$$\frac{p_a(z) - p_u(z)}{p_a(z)} \cdot x$$

points to the passer, where $x$ is the points earned from location $z$. Hence, the shooter is credited with $\frac{p_u(z)}{p_a(z)} \cdot x$ points. This can also be personalized to pairs of passer-shooter players, but then you might run into problems with noise. For some players with small numbers of shot attempts, there can be a lot of noise in the FG%. With the above approach, you might end up with a negative point value for the assist (of course, a player could indeed be better at pull-up shots compared to catch-and-shoot, in which case I guess he should take the whole credit?)! In these cases, Bayes is our best friend, since we can use a prior to alleviate this problem. The prior can be the league-wide percentage increase from an assist in a zone (or any other choice). We can also use a Bayesian average for the points per shot directly, similar to the way we used it for lineup ratings. For example, if player $i$ has $p_i$ points per shot over 15 total shots, while the league average is $p_0$ points per shot with a prior weight of 48 shots, then we can adjust $p_i$ to:

$$\tilde{p}_i = \frac{15\, p_i + 48\, p_0}{15 + 48}$$
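A toy sketch of this credit split and the shrinkage adjustment; the zone values, shot counts, and prior weight below are made-up numbers for illustration:

```python
def split_credit(points_scored, p_assisted, p_unassisted):
    """Split an assisted bucket between passer and shooter, proportionally to
    the zone's assisted vs. unassisted points per shot."""
    passer = (p_assisted - p_unassisted) / p_assisted * points_scored
    return passer, points_scored - passer

def shrunk_pps(player_pps, player_shots, prior_pps, prior_shots):
    """Bayesian average: blend a small player sample with a league-wide prior."""
    return ((player_shots * player_pps + prior_shots * prior_pps)
            / (player_shots + prior_shots))

# Hypothetical corner-3 zone: 1.15 assisted vs 0.95 unassisted points per shot.
passer_pts, shooter_pts = split_credit(3, p_assisted=1.15, p_unassisted=0.95)

# A hot-shooting player (1.4 pps on 15 shots) shrunk toward a 1.0 pps prior (48 shots).
adjusted = shrunk_pps(1.4, 15, 1.0, 48)
```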

Of course, more advanced approaches could be used that account for the exact locations of players, the progression of a possession, etc., but a simple division of credit like the above can go a long way.

The code for this analysis can be found at my github page.

As aforementioned, a major concern is what will happen to onside kicks. The solution that seems to be emerging is to give the scoring team a fourth down in their own territory. However, what should the field position and the yardage to cover for a first down be? It depends on the objective. Do you want to give the scoring team the same chance of recovering (and maybe scoring after the recovery) as with the NFL’s onside kick? If yes, you need to set the distance to a value that will lead to a conversion rate of about 18.5% – which is the onside kick recovery rate. Below I present the conversion rate for 4th (and 3rd) downs in the NFL (granted, the AAF might be a level or half a level below the NFL in terms of talent, but these are fairly good numbers as a starting point), where the onside kick recovery rate is represented with a horizontal red line.

It seems that 4th-and-10 would give an advantage to the offense compared to the current NFL onside kick recovery rate. So maybe a 4th-and-15 would be more *fair* (again, it depends on what the league is after in terms of allowing teams to get the ball back).

Another variable is where you position the chains for this 4th down attempt. The idea would be to position them at the team’s own 35-yard line. However, what is also important is where the ball was typically recovered from an onside kick. That was the team’s own 47-yard line, so if we assume a 4th-and-15, this gets us to the team’s own 32-yard line, which is not that far from the 35-yard line.

Of course, here is where it can really get interesting. The league could allow the team to exchange part of the yardage-to-go for field position (and vice versa). Depending on the talent of the team’s offense or the opponent’s defense, this can be a strategic decision. The league’s exchange converter would be a league average, so individual teams might be able to exploit it. These exchange converters can be calculated from NFL data (and possibly adjusted for the AAF).

The other part that needs to be considered is what happens after a score if the team does not want to attempt an onside kick. Where does the opponent get the ball? The simplest thing (and what I am assuming will end up happening) would be to get it at their own 25-yard line, similar to a touchback. However, this eliminates the ability of a good kicker to pin the offense close to their own end zone (or of a good return team to get better field position). So here is another suggestion: have the scoring team choose the extra-point kick distance and exchange it for field position on the ensuing drive (similar to choosing the starting position for a two-point attempt). For example, if I want to pin my opponent deeper than their own 25-yard line, I might take my kick PAT 10 yards further out. Again, we will need an exchange converter, but again this can be done with appropriate data. For example, using some of the data from my paper presented in the Workshop on Machine Learning and Data Mining for Sports Analytics back in 2015, following is the success rate of a field goal as a function of distance:

The vertical line represents the current NFL PAT kick distance, and we are interested in values greater than this. For the range of distances between 32 and 55 yards, the drop-off is roughly linear; in particular, a linear model explains 85% of the variance in FG conversion rate. Each 1-yard increase (beyond 32 yards) drops the expected points per FG attempt by 0.014 points. Furthermore, the following is the probability of a TD, FG, or failed drive given the starting field position, as taken from the original paper (so the touchback line is still the 20-yard line):

If we do the calculations, each yard behind the touchback line (we used the 25-yard line) reduces the expected points from the drive by 0.027 (the linear model explains 94% of the variance). So it almost seems there could be a simple 1-to-2 exchange rule between PAT distance and field position at the league-average level: every 2 yards added to the current PAT kick distance pushes the opposing offense back by 1 yard. Now, it is clear that if you want to pin the offense all the way down to their own 1 yard line, this would mean increasing the distance of your PAT by about 50 yards, which gives you practically a 0% chance of making the extra point (it falls outside the linear range we used above). This is still a decision the team could make based on their kicker and how he compares to a league-average kicker (and considering the opponent's offense, time left, score differential, its own defense, etc.). The team could even have the opportunity to choose a shorter distance for the extra point and give the opponent better field position (but I would not expect that to happen often). A similar idea can be used for the (no) "kickoffs" after a field goal score, i.e., you can exchange kick distance for starting field position, though a different exchange converter is needed. All of these could work like penalties, i.e., they can be accepted or declined by the other team.
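As a quick sanity check on the 1-to-2 rule, here is a back-of-the-envelope calculation using only the two slopes quoted above (the variable names are mine, purely for illustration):

```python
# Slopes from the two linear models described in the post:
pts_per_pat_yard = 0.014    # expected points lost per extra yard of PAT kick distance
pts_per_field_yard = 0.027  # expected points lost per yard behind the touchback line

# How many yards of extra PAT distance "cost" the same as pushing the
# opposing offense back one yard? This is the league-average exchange converter.
exchange_rate = pts_per_field_yard / pts_per_pat_yard
print(round(exchange_rate, 2))  # 1.93, i.e., roughly a 2-to-1 exchange
```

Individual teams with an above-average kicker would, of course, want to deviate from this league-average rate.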

The only thing that has not been covered here is what happens with the return unit. In the above, only the scoring (kicking) team makes decisions and can decide where to pin the opponent's offense. How can the (no-)returning team *simulate* a return? A possibility would be to enhance the above mechanism with an exchange between field position and yards-to-go for the first first down of the next drive. This could actually mean that the scoring team might choose to give the opponent's offense better field position in exchange for longer yardage-to-go. Or, to spice things up, on a FG attempt the defense might be allowed to offer the offense a closer distance for the FG in exchange for better starting field position on the ensuing drive (the offense can obviously accept or decline). Again, data can come to the rescue here. I am not going to spill all the beans, but if the AAF is interested, I am available for hire. Jokes aside, it will be very interesting to see where this no-kickoff rule goes and what its impact will be on collisions and player safety.

Where am I going with this? Well, unfortunately, in academia people tend to choose their research topics based on where the funding is, since they are constantly pressured by their superiors (that is, deans, provosts, chancellors, etc.) to bring in research funds. This is understandable, but sometimes we need to make short-term sacrifices (in this case, do some unfunded research) to realize the long-term potential (i.e., cultivate the land for research funding in the area of interest). I think this time has come for sports analytics. With more academics being hired by league offices and team analytics departments, I expect them to be more open to these possibilities and initiatives. Tech giants like Google, Facebook, and Amazon offer university grants for topics that are tangentially related to their focus. I think this is a great opportunity for league offices and teams to get answers from academia to questions they might have been trying to answer. After all, they are already trying to do so through hackathons (which, in my opinion, is not the best way to do it). On the other hand, academia can get access to data and funds that will allow them to answer deeper questions (similar to the ones people like Thaler, Massey, Romer and others have tried to answer) that advance science as well. These kinds of industry grants are not large (typically between $50K and $150K), but they are enough to fund one or two years of research and have a long-lasting impact on the organization. In fact, Pitt made some initial effort this year to promote research on sports tech. And as the call mentions: "While the immediate goal is to improve athletic performance, the solutions developed are expected to have **broad applications and the opportunity to positively impact people of various ages and physical conditions**". I hope we will see more of that, and not only from Pitt and other universities but from professional sports entities too.
Actually, the Texas Rangers took a (very minor) step in this direction, offering a small scholarship to undergrads who came up with an "innovative analytics approach". Of course, the ratio of the reward to the potential benefits for the Rangers is a bit skewed, but it is still a start…

I can see this being a win-win situation but there are two things that need to be done:

- Make sure academics understand that sports analytics is not a "cult" but a field that offers data that can help answer deeper scientific questions across a variety of fields and problems
- Make sure league offices and teams understand what academics can bring to the table: it is not just about winning the next game (which it can very well be, of course) but mainly about better understanding processes and improving practices for the long term

Let’s see how things unfold in the years to come…!

Therefore, from now on I will not be blogging! Or, to put it better, I will only be blogging about actual research that I have performed, that is completed and peer-reviewed, in a more compact way. People who are interested can then read the actual research paper. Similar to this post or this post or this post or this post. I will also be posting blogs that explain an analytical technique (using sports data), like this post or this one (or, of course, my predictions/opinions). But what I will not be doing is presenting "new research". I really think that blogging has turned the corner and now has negative returns both for the people creating blogs and for those reading them, and I can see this in my students! Today there are peer-reviewed journals that will accept and publish applied research too. So you can publish there (the review process will help your analysis too). So I will make a plea to all (sports) bloggers out there: blog responsibly!

So let's start with some basics on the z-score. The z-score of a data point tells us how many standard deviations above/below the mean of the whole dataset this observation is, and it is calculated by:

z = (x − μ) / σ

where μ is the mean of the dataset and σ is the corresponding standard deviation. This is a way to standardize the data, since if one takes the z-scores of all the observations in a dataset, they will have a mean of 0 and a standard deviation of 1. Standardizing observations allows us to compare data that are on different scales, and can thus be used to compare players/teams of different eras. To understand this better, before delving into the Bulls-Warriors debate, let's see a nice example from the Mathletics book. Say someone asks you to compare Rogers Hornsby and George Brett based on their batting averages. Hornsby had a .424 batting average, while Brett had a .390. A naive comparison would conclude that Hornsby was more impressive since his batting average was higher. However, in the 1920s, when Hornsby played, the average batting average was .299 with a standard deviation of .0334, while in the 1980s, when Brett played, the average batting average was .274 with a standard deviation of .0286. If we calculate the z-scores for the two players we have:

z_Hornsby = (.424 − .299) / .0334 ≈ 3.74 and z_Brett = (.390 − .274) / .0286 ≈ 4.06

Based on the z-scores, Brett was more than 4 standard deviations better than an average batter of his era (of course, Hornsby was still extraordinarily better than his average competition, but not as extraordinary as Brett). In other words, standardization puts the data into context relative to the era's competition and is a nice tool to keep in mind for such comparisons. Let's now see how we can use z-scores to compare the Bulls and the Warriors.
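The Hornsby-Brett comparison can be verified in a couple of lines; a minimal sketch, using the era averages and standard deviations quoted above:

```python
def z_score(x, mean, sd):
    """How many standard deviations above/below the mean the observation x is."""
    return (x - mean) / sd

# Batting averages vs. each player's era-specific league mean and standard deviation:
z_hornsby = z_score(0.424, 0.299, 0.0334)  # ~3.74
z_brett = z_score(0.390, 0.274, 0.0286)    # ~4.06
```

Despite the lower raw batting average, Brett comes out ahead once each number is put in the context of his own era.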

Obviously there is not one statistic that can answer this kind of question. Furthermore, this is clearly not a deterministic question, since no team has a 100% win probability against any opponent (unless the game is over), let alone when you compare two all-time greats. Hence, in principle what we would like to know is the win probability of a Bulls (90s) vs. Warriors (10s) matchup. Earlier in this blog we presented a simple, yet accurate, model for calculating pre-game win probabilities for NBA matchups, namely, the Basketball Prediction Matchup (BPM). While you can see the details in the corresponding post, in brief, BPM relies on a Bradley-Terry model over Oliver's four factors, namely, effective field goal percentage, turnover rate, offensive rebound rate, and free throws to field goal attempts ratio. So one could take each team's four factors, throw them into BPM, and obtain a win probability. Not so fast, though! The game has changed dramatically over the two decades that separate the two teams. For example, in the 90s the NBA was a *big* man's league, while now it is a *small* man's league, not to mention the explosion of 3-point shooting. This latter point in particular has a strong effect on one of the four factors, the effective field goal percentage! And while we are at it, the 90s Bulls played two seasons with a shortened 3-point line; it would not be fair to make direct comparisons across eras with such differences. In order to be a little smarter, we will calculate the z-scores of the 90s Bulls' four factors and then use these z-scores to project them onto the contemporary NBA. Now, you might be wondering, is this accurate? Most probably not completely; after all, every model is wrong, but since some are useful I think this is a good start. We had two options here: (i) project the Warriors' factors to the Bulls era, or (ii) project the Bulls' factors to the contemporary NBA. I chose the second, since we are conditioned to think in contemporary terms but, most importantly, because BPM was trained using data from the past couple of seasons.

Using data from basketball-reference.com, the following table shows Chicago's z-scores for the four factors during their 6 championship years (note that negative z-scores for the turnover rate are good):

We will begin by considering all possible team matchups (e.g., the 1990-91 Bulls against the 2017-18 Warriors, etc.). Hence, in order to project the Bulls' performance from season i to season j, we will use the corresponding z-score from season i and the league averages of season j. For example, the projected eFG% for the Bulls (season i) in the 2014-15 NBA season would be:

eFG_projected = μ_eFG(2014-15) + z_eFG(i) · σ_eFG(2014-15)
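The projection step can be sketched as follows. The z-score and the 2014-15 league numbers below are made-up placeholders for illustration only; the actual values come from basketball-reference.com as described above:

```python
def project_stat(z_old, league_mean_new, league_sd_new):
    """Project a team's stat into another era by re-applying its z-score
    to the target season's league mean and standard deviation."""
    return league_mean_new + z_old * league_sd_new

# Hypothetical example: a Bulls eFG% z-score of +2.1 projected onto a
# (made-up) 2014-15 league eFG% of 49.6% with a standard deviation of 1.8%:
projected_efg = project_stat(2.1, 0.496, 0.018)  # 0.5338, i.e., ~53.4%
```

The same function is applied to each of the four factors, for every Bulls-season/Warriors-season pair.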

Using these projections we can now use BPM to obtain win probabilities. The following shows the win probabilities for the 1990s Chicago Bulls against the contemporary Warriors, for home and away games respectively:

As we can see, both from the z-scores and the win probabilities, the 1997-98 Bulls might be the least competitive against the Warriors. Potentially this is due to the bad start of that season, playing without Scottie Pippen for the first half and hovering around .500. Overall, however, the Bulls would be favored in 83% of the matchups, while the average win probability for the Bulls is 60% for a home game and 55% for an away game. This is not a terribly telling stat, but the Bulls seem to have a slight (?) edge over the Warriors. We simulated 20,000 7-game series between the two teams using these win probabilities (10,000 with the Bulls having home-court advantage and 10,000 with the Warriors having home court) and the Bulls won 66% of them. On average, the Bulls won in 6 games, while the Warriors won in 6.5 games. The following is the distribution of the series length:
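The series simulation can be reproduced along these lines. The 60%/55% probabilities are the average-matchup numbers from above, and the 2-2-1-1-1 home-court format is my assumption; the actual simulation used one matchup-specific probability pair per series:

```python
import random

def simulate_series(p_home, p_away, bulls_home_court, n_series=10_000, seed=42):
    """Simulate best-of-7 series. p_home/p_away are the Bulls' per-game win
    probabilities at home and on the road. Returns the Bulls' share of series
    wins and the average series length."""
    rng = random.Random(seed)
    wins, total_games = 0, 0
    for _ in range(n_series):
        bulls = warriors = 0
        for game in range(7):
            # 2-2-1-1-1 format: games 1, 2, 5, 7 at the home-court team's arena
            at_bulls = (game in (0, 1, 4, 6)) == bulls_home_court
            p = p_home if at_bulls else p_away
            if rng.random() < p:
                bulls += 1
            else:
                warriors += 1
            if 4 in (bulls, warriors):  # someone clinched the series
                break
        wins += bulls == 4
        total_games += bulls + warriors
    return wins / n_series, total_games / n_series

# 10,000 series with the Bulls holding home court, 10,000 with the Warriors:
share_home, _ = simulate_series(0.60, 0.55, bulls_home_court=True)
share_away, _ = simulate_series(0.60, 0.55, bulls_home_court=False)
```

Keeping track of `bulls + warriors` per series also gives the series-length distribution plotted below.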

Obviously the above does not settle the debate by any stretch of the imagination. As mentioned, using z-scores is just a rough way of making such comparisons, and unfortunately (or fortunately) we will never know how a game between the two teams would actually turn out. However, I hope it showcases how one can start thinking about comparing players/teams of different eras! Plus, it is a fun way to understand (and teach) standardization.
