Last year everyone was caught up in predicting whether the Warriors would break the record for wins in a regular season (which they did). While there are certainly very sophisticated models for projecting wins over a season, I am taking a rather simple, yet fairly accurate, approach. After all, as Occam’s razor dictates and Einstein articulated, “everything should be made as simple as possible, but not simpler”. Hence, I decided to build a simple linear regression model to project the number of wins expected for a team (our dependent variable) given its current performance on a number of metrics (our independent variables). What are these independent variables, though?
To pick the features of my model I put my faith in Dean Oliver’s “four factors”. In his book “Basketball on Paper”, he identifies 4 factors that best summarize the performance of a basketball team. These factors are:
- The Effective Field Goal percentage (eFG%). eFG% adjusts the raw field goal percentage to account for the fact that 3-point shots are worth 1 more point. The formula is eFG% = (FG+(0.5*3PM))/FGA, where 3PM is the number of 3-point shots made.
- Turnover percentage (TOV%) is the percentage of possessions that ended with a turnover. TOV% is calculated as: TO/(FGA+(0.44*FTA)-OR+TO)
- Offensive rebounding percentage (OR%) is the fraction of offensive rebounds grabbed by the team over all the possible offensive rebounds that could have been obtained, i.e., OR% = OR/(OR+DRopp)
- The free throw factor aims at measuring both the ability to get to the charity line and the ability to convert once there. It is calculated as the free throws made over the field goal attempts, i.e., FT/FGA.
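To make the four formulas concrete, here is a minimal Python sketch that computes them from a team's box-score totals. The `TeamTotals` class, the `four_factors` function and the numbers in the example are my own illustrative assumptions (they are not real team data), and eFG% uses 3-point shots made, as in the standard definition.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Hypothetical container for box-score totals (names are illustrative)."""
    fg: int       # field goals made
    fg3: int      # 3-point field goals made
    fga: int      # field goal attempts
    ft: int       # free throws made
    fta: int      # free throw attempts
    orb: int      # offensive rebounds
    tov: int      # turnovers
    opp_drb: int  # opponents' defensive rebounds


def four_factors(t: TeamTotals) -> dict:
    """Compute the four offensive factors using the formulas listed above."""
    # Estimated possessions, as used in the TOV% denominator.
    possessions = t.fga + 0.44 * t.fta - t.orb + t.tov
    return {
        "efg_pct": (t.fg + 0.5 * t.fg3) / t.fga,
        "tov_pct": t.tov / possessions,
        "orb_pct": t.orb / (t.orb + t.opp_drb),
        "ft_rate": t.ft / t.fga,
    }


# Made-up single-game totals, purely for illustration.
example = TeamTotals(fg=40, fg3=10, fga=80, ft=15, fta=20,
                     orb=10, tov=12, opp_drb=30)
factors = four_factors(example)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```

The defensive factors can be obtained with the same function by passing in the opponents' totals instead.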
The above factors relate to the offense of a team. Considering the defense, we obtain another 4 factors, i.e., the opponents’ eFG%, the opponents’ TOV%, the opponents’ free throw factor and the defensive rebounding percentage of the team (DR%). Oliver identified these factors as the most critical statistical categories for NBA teams, and he assigned a weight to each category to represent the importance of every aspect of the game: 40% for shooting, 25% for turnovers, 20% for rebounding and 15% for free throws. Instead of using these weights (and I have no reason to believe they are wrong), I let the data determine the importance. In particular, I collected from Basketball Reference the above 8 factors (4 offensive and 4 defensive), together with the pace (an estimate of possessions per 48 minutes), for every team and built a linear regression model where the dependent variable is the number of wins for a team. I collected data starting from the 2010-11 regular season (excluding the 2011-12 season, which was shortened by the lockout), and the summary of the learned model follows.
As we can see, the independent variables capture a very large part of the variance in the dependent variable, i.e., 93% of it. I also estimated the coefficients of each independent variable separately for each season in order to see whether the importance of these factors changes over time. As we can see from the following figure, where the confidence intervals are also presented, there are no significant changes across seasons.
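For readers who want to try the fitting step themselves, here is a rough sketch of an ordinary least squares fit and the variance-explained (R²) calculation with NumPy. It is not the code behind the model above: the data are randomly generated stand-ins for the 9 features (8 factors plus pace), the row count is arbitrary, and the helper names `fit_ols` and `r_squared` are hypothetical, so the numbers it prints say nothing about the NBA.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """Fraction of the variance in y explained by the fitted model."""
    A = np.column_stack([np.ones(len(X)), X])
    residuals = y - A @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (ASSUMPTION: not real team statistics).
rng = np.random.default_rng(0)
n_rows, n_features = 150, 9          # arbitrary row count; 8 factors + pace
X = rng.normal(size=(n_rows, n_features))
made_up_coefs = rng.normal(size=n_features)
wins = 41 + X @ made_up_coefs + rng.normal(scale=1.0, size=n_rows)

beta = fit_ols(X, wins)
r2 = r_squared(X, wins, beta)
print("intercept:", round(float(beta[0]), 2))
print("R^2:", round(float(r2), 3))
```

With real data, `X` would hold one row per team-season and `wins` the corresponding win totals; fitting each season separately with the same function gives the per-season coefficients.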
Using the above model, the projected wins for this year are shown in the following figure: