Team sports are won by teams. This means that you need not just a good starting line-up but depth on the bench as well. How much depth? And how much time should you give the bench to carry the team? Or is all of this just a myth? In this blog post I set out to examine a few of these questions using data from the past 4 regular NFL seasons.
In particular, using the snap counts for each player, we quantify how much they are involved in their team's plays. Let’s start with a simple frequency count for all the players in the league. The following figure presents the results for last season (2015-2016).
As we can see, the majority of the players get few snaps (perhaps just to fill up the roster), while a few get an extremely large number of snaps. For instance, the top-5 players with respect to snap counts last season were:
- Malcolm Jenkins (Philadelphia) with 1,359
- Eric Reid (San Francisco) with 1,270
- Rodney McLeod (Saint Louis) with 1,266
- Walter Thurmond (Philadelphia) with 1,265
- Ben Jones (Houston) with 1,229
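The frequency count and the top-k ranking described above can be sketched in a few lines of Python. The player names, snap numbers, and bin width below are made up for illustration and are not taken from the NFL data; the helper names `snap_histogram` and `top_players` are hypothetical.

```python
from collections import Counter

def snap_histogram(snaps, bin_width=200):
    """Count players per snap-count bin; each key is the lower edge of a bin."""
    return Counter((s // bin_width) * bin_width for s in snaps)

def top_players(snaps_by_player, k=5):
    """Return the k (player, snaps) pairs with the largest snap counts."""
    ranked = sorted(snaps_by_player.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Toy season totals (made-up numbers, not real NFL data).
toy = {"P1": 1300, "P2": 1250, "P3": 640, "P4": 120, "P5": 45, "P6": 30, "P7": 12}

hist = snap_histogram(toy.values())
for lower in sorted(hist):
    print(f"{lower}-{lower + 199}: {hist[lower]} players")

print(top_players(toy, k=2))
```

Even in this toy roster most players fall into the lowest bin, which is the same skewed shape the league-wide figure shows.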
What we are looking for is how uniformly or how concentrated the snaps are distributed among the players of a team. For this we build the snap probability mass function for every team $T$, $p_T$, where $p_T(i)$ is the fraction of the team's snaps played by player $i$. The entropy of this distribution provides a quantifiable measure of diversity. In particular, given a discrete distribution $p$, the entropy is given by:

$$H(p) = -\sum_{i} p(i) \log_2 p(i)$$

The higher the entropy, the more “diverse” the distribution, i.e., non-negligible probability mass is observed for many different values. In fact, the highest entropy for the probability distribution of a random variable is obtained when the distribution is uniform, while the lowest entropy is obtained for a fully concentrated distribution, i.e., $p(i) = 1$ for $i = i_0$ and 0 otherwise. Therefore, in our case, a large $H(p_T)$ translates to a team in which multiple players take a significant number of snaps, while smaller values of $H(p_T)$ correspond to teams that rely more heavily on specific line-ups.
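As a minimal sketch of the entropy computation, the snippet below turns a team's snap counts into a probability mass function and computes its Shannon entropy. It assumes the entropy is measured in bits (base-2 logarithm) and that the input is a mapping from player to snap count; both the function name `snap_entropy` and the toy rosters are hypothetical.

```python
import math

def snap_entropy(snap_counts):
    """Shannon entropy (in bits) of a team's snap distribution.

    snap_counts: mapping of player -> number of snaps (assumed input format).
    """
    total = sum(snap_counts.values())
    if total <= 0:
        raise ValueError("team has no recorded snaps")
    entropy = 0.0
    for count in snap_counts.values():
        if count > 0:  # players with zero snaps contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Toy rosters (made-up numbers, not real NFL data).
uniform_team = {"A": 100, "B": 100, "C": 100, "D": 100}
concentrated_team = {"A": 370, "B": 10, "C": 10, "D": 10}

print(snap_entropy(uniform_team))       # 2.0 bits, the maximum for 4 players
print(snap_entropy(concentrated_team))  # lower: one player dominates
```

The uniform roster reaches the maximum possible value for four players, while the roster dominated by a single player scores much lower, which is the behaviour the text describes.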
We compute the entropy of the snap counts for each team in each year and correlate it with the team's final win-loss percentage. As we can see, all years exhibit a declining trend, that is, teams with lower entropy tend to have better standing! The correlation coefficient is -0.41 (p-value < 0.0001). Winning teams do not change!
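For readers who want to reproduce this kind of number, here is a small sketch of a correlation between team entropy and winning percentage. It assumes the coefficient reported above is the Pearson correlation, which the post does not state explicitly, and the four (entropy, win percentage) pairs are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy team-seasons (made-up values, not the real data set).
toy_entropy = [5.2, 5.4, 5.5, 5.7]
toy_win_pct = [0.75, 0.56, 0.50, 0.25]

print(pearson_r(toy_entropy, toy_win_pct))  # negative: more rotation, fewer wins
```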
A simple linear regression model, fitted on the data from all the regular seasons, with the winning percentage $W$ as the dependent variable and the team entropy $H$ as the independent variable gives a fit of the form $W = \beta_0 + \beta_1 H$, where the p-values for both coefficients are less than 0.0001. The coefficient $\beta_1$ for $H$ is negative, meaning that higher entropy translates to lower overall season performance for the team. Of course, the $R^2$ of the model is fairly small (around 25%), which means that the team entropy explains only a small fraction of the total variability of the win-loss percentage. The above figure presents the raw data points from all seasons, while each line corresponds to the regression model for each season individually.
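A regression of this kind can be sketched with ordinary least squares on a single predictor. This is an illustration under assumptions, not the exact procedure used for the figures above: the function name `fit_line` is hypothetical and the data points are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # If the response is constant the fit is trivially exact.
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared

# Toy team-seasons (made-up values, not the real data set).
entropy_values = [5.2, 5.4, 5.5, 5.7]
win_percentages = [0.75, 0.56, 0.50, 0.25]

b0, b1, r2 = fit_line(entropy_values, win_percentages)
print(f"intercept={b0:.3f} slope={b1:.3f} R^2={r2:.3f}")
```

A negative slope here plays the same role as the negative entropy coefficient in the text, and the returned `r_squared` is the fraction of the variance in winning percentage that the line explains.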
The intercept of the model essentially captures the fact that there is a baseline entropy for every team, since the interchange on the field of the offensive and defensive units (even if each of these units remains unchanged) will create some non-trivial entropy. In fact, the following figure presents the PDF of the team entropy, which, as we can see, resembles a normal distribution with a mean of approximately 5.45. In other words, one can think of each team as starting with a baseline entropy that moves up or down depending on how much the team uses its back-up units. Given that a team's wins depend on both the offensive and the defensive units, we chose to build the models using both units in the calculation of entropy. One could also create separate models for offensive and defensive performance, using as the dependent variable the total number of points scored or allowed, respectively.
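To get a feel for this baseline, consider an idealized team (an assumption made purely for illustration, not something measured in the data) in which an unchanging 11-player offense and an unchanging 11-player defense each play half of the team's snaps, ignoring special teams. Each of the 22 players then takes a share of $1/22$ of the total player-snaps, and, assuming the entropy is measured in bits:

$$H_{\text{baseline}} = -\sum_{i=1}^{22} \frac{1}{22} \log_2 \frac{1}{22} = \log_2 22 \approx 4.46$$

By the same reading, the observed mean of about 5.45 bits is what a team with roughly $2^{5.45} \approx 44$ equally used players would produce, so a typical team sits well above this no-rotation floor.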
The question that these results raise is why higher entropy doesn’t correlate with a higher win-loss percentage. There might be various reasons, whose plausibility we cannot explore with the data we have, but which we discuss in the following. One possibility is that teams that do not perform well to begin with try to “stir the pot” by making frequent changes to their line-ups in an effort to identify a winning combination. Furthermore, the salary cap generally makes it hard for a team to acquire the large number of superstars that would justify a high entropy for a winning team. Therefore, even though player rotation is ideal in theory, it seems that in practice, once the coaches find the player combo that wins, they rarely switch things up. Of course, one of the limitations of this analysis is that no causal link can be established! That is, it is not clear whether lower entropy leads to a higher winning percentage, or a higher (initial) winning percentage leads to lower entropy, since there is no reason to change a team that wins. Regardless of the direction, a statistically significant negative correlation between the two variables exists (-0.41, p-value < 0.0001), which suggests that “too much” player rotation is not necessarily a good thing.