What attributes explain a team’s win-loss performance? This is the holy grail of sports (hoops) analytics, since a team that knows the answer can build a plan that exploits it. Dean Oliver’s famous four factors are one such example: a small number of statistics is enough to explain the number of wins for a team over a season. The four factors identify which aspects of the game matter for winning, namely effective field goal percentage (eFG%), turnover rate, offensive rebound rate, and the ratio of free throws to field goal attempts (FGA).
In this post I take a different approach and focus on player interactions that do not lead to specific observed statistics in the (advanced) box score, but that capture other latent aspects of the team. In particular, two widely discussed attributes are the depth of the bench and the versatility of the players. The hypothesis is that a deeper bench and more versatile individual players translate to a team that can use several different lineups against opponents. Both bench depth and versatility can be quantified using the substitution relationships among a team’s players as a proxy, that is, who-subs-for-whom.
In particular, using the play-by-play data for all the games during the 2015-16 season, I created a directed network for each team, where the nodes are players and there is an edge from player i to player j if and only if i subbed for j. Each edge also carries a weight that represents the number of times that specific substitution was observed. While networks have been used in the analysis of sports before, that work has focused almost exclusively on the passing behavior of teams. Nevertheless, network science provides a powerful set of tools for analyzing a variety of relationships between players, or even between teams. As an illustrative example, the following networks represent the substitution patterns for the Celtics, Spurs, Cavaliers and Warriors for the 2015-16 season.
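As an aside on the mechanics, the construction described above can be sketched in a few lines of Python. This is a minimal sketch and not the code used for the post: it assumes the play-by-play data has already been parsed into `(player_in, player_out)` pairs for a single team, and the function name `build_substitution_network` and the player labels are hypothetical.

```python
from collections import Counter


def build_substitution_network(sub_events):
    """Return {(player_in, player_out): count} for one team's substitutions.

    Each event is a (player_in, player_out) pair, so the directed edge
    points from the player entering the game to the player he replaces.
    The count is the edge weight: how often that exact sub was observed.
    """
    edges = Counter()
    for player_in, player_out in sub_events:
        edges[(player_in, player_out)] += 1
    return dict(edges)


# Hypothetical events for illustration only, not real play-by-play data.
events = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A")]
network = build_substitution_network(events)
```

Storing the network as a dictionary keyed by `(i, j)` keeps both the direction and the weight of every edge, which is all the later metrics need.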
After obtaining the network for every team, simple metrics can be computed for each one, such as the clustering coefficient, the degree assortativity, the entropy of the network, etc. These metrics reveal information about the substitution relationships between the players of a team. For example, a negative degree assortativity indicates a network where connections mainly exist between high-degree nodes and low-degree nodes. Networks that exhibit this type of structure have been shown to be less resilient to the removal of their high-degree nodes. This can have significant implications for a team whose substitution network exhibits similar patterns, since losing one of its high-degree nodes/players can be detrimental to the performance of the team as a whole. The following figure presents the scatter plot of the win percentage against each of the independent network variables examined.
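To make two of these metrics concrete, here is a minimal sketch of degree assortativity and diameter. It rests on assumptions the post does not spell out: the directed, weighted network is collapsed to an undirected, unweighted graph, assortativity is the Pearson correlation between the degrees at the two ends of every edge, and the diameter is measured in hops on a connected graph. All function names are hypothetical.

```python
from collections import deque
from math import sqrt


def to_undirected(edges):
    """Collapse directed edges (i, j) into an adjacency map of neighbor sets."""
    adj = {}
    for i, j in edges:
        if i == j:
            continue  # ignore self-loops
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


def degree_assortativity(adj):
    """Pearson correlation between the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, neighbors in adj.items():
        for v in neighbors:  # every undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when every node has the same degree
    return cov / sqrt(var_x * var_y)


def diameter(adj):
    """Longest shortest-path length in hops; raises if the graph is disconnected."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            raise ValueError("graph is not connected")
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical "one star, three role players" network for illustration.
star_adj = to_undirected([("star", "r1"), ("star", "r2"), ("r3", "star")])
star_assortativity = degree_assortativity(star_adj)
star_diameter = diameter(star_adj)
```

A pure star is the extreme disassortative case: every edge joins the single high-degree node to a degree-one node, so the correlation between endpoint degrees is as negative as it can be.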
I then regressed the teams’ win percentage on these variables (except the network density, which is almost constant across teams) in order to examine whether any of the substitution network metrics are significant in explaining the variance observed in the win percentage of a team. The results obtained from the model follow.
As we can see, the degree assortativity and the diameter of the network are significant (at the 0.05 level) explanatory variables for the win percentage of a team (for the 2015-16 season at least). Of course, the R-squared value is only 0.33, i.e., these network parameters explain only 33% of the observed variance. This is expected: substitution patterns are important, but they do not include any information about the quality of these substitutions or their impact on (advanced) game statistics such as Oliver’s four factors. The direction of the effect is negative for both independent variables. In particular, a smaller substitution network diameter is correlated with a higher win percentage. A smaller diameter translates to a smaller average distance between players, which can be representative of a set of players that can play multiple roles in a team, hence increasing its versatility. The negative direction for the degree assortativity might seem to contradict the network science theory that describes the relationship between resilience and assortativity. In particular, lower assortativity is associated with lower resilience, but in our case lower assortativity is correlated with a higher win percentage. Note that almost all of the teams have negative assortativity. The reason is that the underlying mechanics of this network are different from those of technological networks. A lower (more negative) degree assortativity means that there are a few high-degree nodes connected with lower-degree nodes. This appears when a team is built around a small number of “super-stars” who are surrounded by a supporting cast of role players. This seems to be the recipe for success in the NBA.
Furthermore, a high-degree node is removed from the substitution network only when the corresponding player suffers a serious injury, which, although it can happen, is not as frequent as deliberate attacks on technological networks.
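For readers who want to try the regression step themselves, the following is a minimal sketch of an ordinary least squares fit with an intercept, solved through the normal equations, that also returns R-squared. It is not the model code behind the results discussed above: the function name `ols_fit` is hypothetical, the toy numbers are made up purely to exercise the code, and the significance tests (p-values) are not computed here.

```python
def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares.

    X is a list of predictor rows, y a list of responses.
    Returns (coefficients, r_squared), intercept first.
    """
    n = len(y)
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept
    k = len(rows[0])
    # Augmented normal equations: [A^T A | A^T y]
    aug = [
        [sum(rows[i][p] * rows[i][q] for i in range(n)) for q in range(k)]
        + [sum(rows[i][p] * y[i] for i in range(n))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    coefs = [aug[p][k] for p in range(k)]
    preds = [sum(c * v for c, v in zip(coefs, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Made-up (assortativity, diameter) rows and win percentages for five
# hypothetical teams; these are NOT the values from the analysis above.
toy_X = [(-0.30, 3), (-0.10, 4), (-0.25, 3), (-0.05, 5), (-0.20, 4)]
toy_y = [0.70, 0.45, 0.62, 0.30, 0.55]
toy_coefs, toy_r2 = ols_fit(toy_X, toy_y)
```

In practice a statistics package would be the more sensible choice, since it also reports standard errors and p-values; the sketch only shows where the coefficients and the R-squared figure come from.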
Of course, this analysis is very basic and preliminary. For one, data from multiple seasons need to be analyzed. However, I wanted to showcase how networks can be used in sports analytics beyond the straightforward application to modeling the passing behavior of teams. More advanced network science notions and tools can provide deeper insights. Stay tuned for more!