Shooting charts have been an integral part of a basketball game’s summary for quite some time now. Fans like visual information and can easily grasp it. However, if one wants to understand the shooting patterns of teams/players and how they differ (and how this can potentially drive pre-game scouting and in-game strategy), heuristic comparison of shot charts might not take you far. For example, the following three shot charts are those of the three Greek teams participating in the Basketball Champions League this year (AEK, Aris and PAOK) through week 6.
You can observe a few differences (e.g., AEK is taking many more shots from the restricted area, while PAOK is taking more shots from the paint), but overall there is little that you can say about “how much” the shooting tendencies of teams (players) differ. One could possibly use the exact locations of shots to obtain some type of shot density and compare those. This is certainly possible but cumbersome. On the other hand, the fact that the shooting charts are mostly similar might mean that there are underlying patterns that all of the teams (players) follow to a different extent. If we could identify these patterns (some type of shooting dictionary), we could then describe a team/player through them.
One way to identify similar latent patterns in data that can be represented through a matrix S is matrix factorization. With matrix factorization one tries to express the original data matrix as a product of two (or more) factor matrices (e.g., WH). These factors capture latent patterns of the original data. There are various techniques to perform this task, but for our case we will focus on Non-negative Matrix Factorization. Our data can be represented through a matrix S whose rows represent teams (or players) and whose columns represent locations on the court. However, what is a court location? One could use the actual x,y coordinates by overlaying a grid over the court and counting the shots in every grid cell. However, this would give fairly sparse and noisy results in our case, where we have a fairly small number of games for each team/player. Another approach is to use as locations the 12 court zones (restricted area, paint, midrange slot, etc.), so that the element (i,j) of the matrix S represents the number of shots taken by team (player) i from court zone j.
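As a rough illustration of how such a matrix could be assembled, the sketch below counts shots per (team, zone) pair. The shot log, the zone labels, and the helper name `build_shot_matrix` are hypothetical, and only four zones are used instead of twelve to keep it short.

```python
from collections import Counter

# Hypothetical zone labels; the post uses 12 court zones but names only a few.
ZONES = ["restricted_area", "paint", "midrange_slot", "corner_3_left"]

# Hypothetical shot log: one (team, zone) pair per field-goal attempt.
shots = [
    ("AEK", "restricted_area"),
    ("AEK", "restricted_area"),
    ("AEK", "paint"),
    ("PAOK", "paint"),
    ("PAOK", "paint"),
    ("PAOK", "corner_3_left"),
    ("Aris", "midrange_slot"),
]

def build_shot_matrix(shots, zones):
    """Return (teams, S) where S[i][j] counts shots by team i from zone j."""
    counts = Counter(shots)
    teams = sorted({team for team, _ in shots})
    S = [[counts[(team, zone)] for zone in zones] for team in teams]
    return teams, S

teams, S = build_shot_matrix(shots, ZONES)
for team, row in zip(teams, S):
    print(team, row)
```

Each row of S is then one team's shot profile over the zones, which is the input the factorization works on.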
Using S as our data (shot) matrix, Non-negative Matrix Factorization (NMF) aims at identifying matrices W and H that solve:
$$\min_{W \ge 0,\; H \ge 0} \operatorname{dist}(S, WH)$$
where dist(S,WH) represents a distance metric between the original data matrix S and the product of the factor matrices WH. Furthermore, we constrain the factor matrices to be non-negative. This allows for easier interpretation of the results, since the data in the original matrix are also non-negative. As the distance metric, we have used the Frobenius norm of the difference between S and WH.
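A minimal sketch of such a factorization is shown below, assuming NumPy is available. It uses the Lee-Seung multiplicative updates for the Frobenius objective, which is one common solver and not necessarily the one used for the results in this post. The function name `nmf` and the toy matrix are illustrative assumptions.

```python
import numpy as np

def nmf(S, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate S (n x m, non-negative) as W @ H with W (n x k), H (k x m).

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    This is an assumed solver, chosen for brevity.
    """
    rng = np.random.default_rng(seed)
    n, m = S.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps both factors non-negative;
        # eps only guards against division by zero.
        H *= (W.T @ S) / (W.T @ W @ H + eps)
        W *= (S @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy shot matrix: 4 hypothetical teams (rows) x 3 zones (columns).
S = np.array([[30.0, 10.0, 5.0],
              [28.0, 12.0, 6.0],
              [5.0, 8.0, 25.0],
              [6.0, 9.0, 27.0]])

W, H = nmf(S, k=2)
rel_error = np.linalg.norm(S - W @ H) / np.linalg.norm(S)
print(f"relative Frobenius error: {rel_error:.3f}")
```

The rows of H play the role of the shooting patterns, and each row of W holds the coefficients of one team over those patterns.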
The next step is to decide on the number of patterns we want to find. This corresponds to the number of rows of matrix H (which is equal to the number of columns of matrix W). Choosing the number of patterns is not trivial and is very similar to the problem of choosing the number of clusters in a clustering problem. One approach is to examine how well the product of the factor matrices approximates S. However, the quality of the approximation improves monotonically as we increase the number of patterns, and hence there is a tradeoff between approximation quality and finding trivial patterns. This is the same as the bias-variance tradeoff; a large number of patterns gives an overfitted model where practically every pattern represents a single team/player. For our purposes we have used k = 7 patterns, since this provides a good tradeoff between approximation and interpretability (non-overfitting). Figure 7 presents the quality of approximation for the players’ matrix as a function of the number of patterns k (the results for the teams’ matrix are similar).
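The tradeoff can be explored by fitting the factorization for a range of k and tracking the approximation error. The sketch below does this on synthetic data (not the league data), again with assumed multiplicative updates and a hypothetical helper `nmf_error`.

```python
import numpy as np

def nmf_error(S, k, n_iter=300, seed=0, eps=1e-9):
    """Relative Frobenius error of a rank-k NMF fitted with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((S.shape[0], k)) + eps
    H = rng.random((k, S.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ S) / (W.T @ W @ H + eps)
        W *= (S @ H.T) / (W @ H @ H.T + eps)
    return np.linalg.norm(S - W @ H) / np.linalg.norm(S)

# Synthetic stand-in for the player matrix: 40 players x 12 zones,
# generated from 3 hidden patterns plus noise (not real data).
rng = np.random.default_rng(42)
S = rng.random((40, 3)) @ rng.random((3, 12)) * 20 + rng.random((40, 12))

errors = {k: nmf_error(S, k) for k in range(1, 8)}
for k, err in errors.items():
    print(f"k={k}: relative error {err:.3f}")
```

One would then look for the k beyond which the error stops dropping noticeably, while checking that the resulting patterns remain interpretable.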
Every pattern (i.e., a row of matrix H) is a 12-dimensional vector, each element of which corresponds to one of the court zones. The value of the element captures the strength of the corresponding court zone in the pattern. For example, one of the patterns identified from the players’ matrix is the following:
Simply put, this pattern includes shots mainly from the restricted area and a few from the paint. Once these patterns are identified, the other factor matrix W can be used to obtain the coordinates of a player/team with respect to the basis formed by the patterns. This allows us to express a player/team as a linear combination of these latent shooting patterns.
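To make the linear-combination view concrete, the sketch below reconstructs one player's approximate shot counts from a row of W and the rows of H. All numbers are hypothetical, and the pattern rows are assumed to be normalized to sum to 1 so that the coefficients can be read as shot volumes.

```python
import numpy as np

# Hypothetical patterns (rows of H) over 3 zones:
# restricted area, paint, corner three. Each row sums to 1 by assumption.
H = np.array([[0.8, 0.2, 0.0],   # pattern 0: mostly restricted area
              [0.0, 0.1, 0.9]])  # pattern 1: mostly corner threes

# Hypothetical coefficients (one row of W) for a single player.
w = np.array([50.0, 20.0])

# The player's approximate shot counts per zone are the weighted sum
# of the patterns: w[0] * H[0] + w[1] * H[1].
approx_shots = w @ H

# Share of the player's profile attributed to each pattern.
pattern_share = w / w.sum()

print(approx_shots)
print(pattern_share)
```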
The following figure presents the 7 player patterns identified and the corresponding coefficients for some of the players (the coefficients for all the players can be found here). The coefficients are proportional to the number of shots a player takes. For example, Manny Harris appears to be getting the majority of his shots from pattern 1 (midrange shots), while Abromaitis is a corner 3 (pattern 2) and restricted area (pattern 3) shooter.
Each one of these patterns is also associated with an expected points per shot value. In order to compute it we need:
- The coefficients c(i) of each court zone i for each of the patterns
- The expected points per shot p(i) from each court zone i, which we have calculated in a previous post
Then the expected points per shot for each pattern r is:
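A natural way to combine these two ingredients, assumed here, is the coefficient-weighted average of the zone values, i.e., the sum of c(i)·p(i) over all zones divided by the sum of c(i). The sketch below computes it for a single pattern; the zone names, the values, and the normalization are illustrative assumptions rather than numbers from the analysis.

```python
# Hypothetical values for illustration only (3 zones instead of 12).
points_per_shot = {"restricted_area": 1.2, "paint": 0.8, "corner_3": 1.1}  # p(i)
pattern = {"restricted_area": 0.75, "paint": 0.25, "corner_3": 0.0}        # c(i)

def expected_points(pattern, points_per_shot):
    """Coefficient-weighted average of the per-zone expected points per shot."""
    total = sum(pattern.values())
    weighted = sum(c * points_per_shot[zone] for zone, c in pattern.items())
    return weighted / total

epp = expected_points(pattern, points_per_shot)
print(f"expected points per shot for this pattern: {epp:.2f}")
```

Dividing by the sum of the coefficients keeps the result on a points-per-shot scale regardless of how the rows of H are scaled.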
The following table shows the expected points per pattern. As we can see, patterns 3 and 6 are the most efficient ones, as one might have expected, since they include shots from the restricted area and the (left) corner.
We performed the same analysis for the 32 teams, and the patterns we identified are presented below together with the coefficients for select teams (all the coefficients can be found here).
Furthermore, the points per pattern for the team patterns are:
The above presents a fairly simple application of matrix factorization in basketball. It provides a better understanding of the offensive/shooting tendencies of teams. Even more insights can be obtained if two matrices are analyzed, namely, one for made shots and one for missed shots. In this case, we can identify potential inefficiencies of upcoming opponents. For instance, we can identify specific shooting patterns that a team is not successful at and force them to take those shots.
With regard to the factorization itself, the KL divergence usually works better than the Frobenius norm that we have used in our analysis. Furthermore, one can overlay a grid (e.g., 1×1 meters) and use the grid cells as the locations. This will produce a much larger matrix, but NMF will reduce the dimensionality. However, in this case there can be more noise than when using the court zones, so it is better to apply NMF over an intensity surface instead of the raw counts.
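The grid-based alternative can be sketched as follows, assuming NumPy and shot coordinates given in meters on a 28 × 15 m FIBA court. The coordinate convention, the helper name `grid_counts`, and the sample shots are hypothetical. The sketch only produces raw counts; smoothing them into an intensity surface would be a further step.

```python
import numpy as np

# FIBA court dimensions in meters; 1 x 1 m cells give a 28 x 15 grid.
COURT_LENGTH, COURT_WIDTH = 28.0, 15.0

def grid_counts(xs, ys, cell=1.0):
    """Count shots per grid cell and flatten into one row of the shot matrix."""
    x_edges = np.arange(0.0, COURT_LENGTH + cell, cell)
    y_edges = np.arange(0.0, COURT_WIDTH + cell, cell)
    counts, _, _ = np.histogram2d(xs, ys, bins=[x_edges, y_edges])
    return counts.ravel()

# Hypothetical shot coordinates (meters) for one team.
xs = np.array([1.5, 1.7, 6.2, 0.4])
ys = np.array([7.5, 7.9, 3.1, 14.2])

row = grid_counts(xs, ys)
print(row.shape, int(row.sum()))
```

Stacking one such row per team (or player) gives the grid version of S, with 420 columns instead of 12.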
ACK: I would like to thank Basketball Champions League for providing me access to the data.