Teams in the NFL are divided into two conferences and 8 divisions (see map). Since the regular season includes only 16 games per team, a team clearly cannot play every other team. The regular-season schedule is produced by an extremely complicated algorithm that considers various game-related and non-game-related constraints, but a few specific rules determine the opponents:
- Each team plays 6 games against the other three teams in its own division (each twice, for 3 home and 3 road games)
- Each team plays 4 games (2 home, 2 away) against the teams of one other division in the same conference, determined by rotation
- Each team plays 4 games (2 home, 2 away) against the teams of one division of the opposite conference, determined by rotation
- Each team plays 2 games (1 home, 1 away) against one team from each of the two remaining divisions in its own conference, determined by the previous season's division standings
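The four rules add up to 6 + 4 + 4 + 2 = 16 games. The sketch below is a hypothetical illustration, not the NFL's actual scheduling procedure: it lists one team's opponents in a made-up league of 8 divisions labeled A to H, the rotation pairings are placeholders, and each division's list is assumed to be ordered by the previous season's finish.

```python
def opponents(team, divisions, conference_of, intra_rotation, inter_rotation):
    """Return (opponent, number_of_games) pairs for `team` under the four rules.

    divisions: division name -> its 4 teams, ordered by last season's finish
    conference_of: division name -> conference name
    intra_rotation: division -> same-conference division faced this season
    inter_rotation: division -> other-conference division faced this season
    """
    own = next(d for d, teams in divisions.items() if team in teams)
    place = divisions[own].index(team)  # finishing place in last season's standings
    games = []
    # Rule 1: each of the 3 division rivals twice (home and away) -> 6 games
    games += [(t, 2) for t in divisions[own] if t != team]
    # Rule 2: all 4 teams of the rotated same-conference division -> 4 games
    games += [(t, 1) for t in divisions[intra_rotation[own]]]
    # Rule 3: all 4 teams of the rotated other-conference division -> 4 games
    games += [(t, 1) for t in divisions[inter_rotation[own]]]
    # Rule 4 (assumed here): the team that finished in the same place in each
    # of the two remaining divisions of the same conference -> 2 games
    remaining = [d for d in divisions
                 if conference_of[d] == conference_of[own]
                 and d not in (own, intra_rotation[own])]
    games += [(divisions[d][place], 1) for d in remaining]
    return games


# Hypothetical league: divisions A-D form conference "X", E-H form conference "Y".
divisions = {d: [f"{d}{i}" for i in range(1, 5)] for d in "ABCDEFGH"}
conference_of = {d: ("X" if d in "ABCD" else "Y") for d in divisions}
intra_rotation = {"A": "B", "B": "A", "C": "D", "D": "C",
                  "E": "F", "F": "E", "G": "H", "H": "G"}
inter_rotation = {"A": "E", "B": "F", "C": "G", "D": "H",
                  "E": "A", "F": "B", "G": "C", "H": "D"}

schedule = opponents("A1", divisions, conference_of, intra_rotation, inter_rotation)
print(sum(n for _, n in schedule))  # 16
```

Team A1 ends up with 13 distinct opponents: 3 division rivals played twice plus 10 teams played once.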
Given that it is not possible for every team to play every other team in the league, the question is whether some teams travel more for their road games and, ultimately, whether this has any impact on their winning percentage. This raises a more fundamental question: are the divisions formed by NFL teams efficient with respect to travel fairness? Of course, the home city of a team has its own implications for the whole issue (e.g., an imaginary team in Hawaii would have to travel a lot for its road games no matter what), and complete fairness will be hard to achieve. Nevertheless, given the specific distribution of teams across cities, what is the optimal way to form the divisions?
Let’s start by examining the distance traveled for each road game for each team during the last 7 seasons. The following figure presents these distances in a box plot for each team.
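A road game's distance can be approximated as the great-circle (haversine) distance between the two teams' home cities. The post does not say how its distances were computed, so this is a sketch under that assumption: the coordinates in `CITY` are rough city-center values chosen for illustration, and `road_distances` is a hypothetical helper that collects each away team's trips, which is the per-team data a box plot needs.

```python
import math

# Approximate city-center coordinates (latitude, longitude) in degrees.
# Illustrative assumptions only, not the data used in the post.
CITY = {
    "PIT": (40.44, -80.00),
    "CLE": (41.50, -81.69),
    "SEA": (47.61, -122.33),
    "SF": (37.77, -122.42),
}

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def road_distances(games):
    """games: iterable of (home, away) pairs -> dict of away team -> list of trip miles."""
    trips = {}
    for home, away in games:
        trips.setdefault(away, []).append(haversine_miles(CITY[away], CITY[home]))
    return trips


# Hypothetical games, written as (home, away).
games = [("CLE", "PIT"), ("SEA", "PIT"), ("PIT", "SF")]
trips = road_distances(games)
```

Each list in `trips` could then be passed to a plotting routine such as matplotlib's `boxplot` to draw one box per team.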
As we can see, some teams, such as Seattle and San Francisco, tend to travel long distances, while others, such as Pittsburgh and Cleveland, travel much shorter ones. This is to be expected: most teams are located in the central and eastern parts of the country, so West Coast teams have to travel more. Furthermore, the AFC North division (where Pittsburgh and Cleveland play) is geographically compact, so its divisional games involve little travel. Using the distances traveled over the last 7 NFL seasons, we can also make pairwise comparisons between teams. In particular, we performed a standard t-test for every pair of teams, and the following matrix presents the results. Each grid point corresponds to the average distance traveled by the row team minus the average distance traveled by the column team, so element (i,j) of the matrix is the negative of element (j,i). Grid points colored red correspond to comparisons for which the null hypothesis cannot be rejected, that is, there is no statistically significant difference between the distances the two teams travel.
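The pairwise comparison matrix could be computed along these lines. This is a sketch, not the post's code: it assumes Welch's two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`; pass `equal_var=True` for the classic Student version), a 0.05 significance level, and a `distances` dictionary mapping each team to its list of road-game distances.

```python
import numpy as np
from scipy import stats


def pairwise_travel_tests(distances, alpha=0.05):
    """distances: dict team -> 1-D sequence of road-game distances.

    Returns (teams, diff, significant): diff[i, j] is the mean distance of
    team i minus that of team j, and significant[i, j] is True when the
    t-test rejects equal means at level alpha.
    """
    teams = sorted(distances)
    n = len(teams)
    diff = np.zeros((n, n))
    significant = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            a = np.asarray(distances[teams[i]], dtype=float)
            b = np.asarray(distances[teams[j]], dtype=float)
            d = a.mean() - b.mean()
            diff[i, j], diff[j, i] = d, -d  # antisymmetric by construction
            _, p = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
            significant[i, j] = significant[j, i] = p < alpha
    return teams, diff, significant


# Synthetic road-game distances in miles, 56 per team (7 seasons x 8 road games).
# Made-up numbers for illustration, not real data.
rng = np.random.default_rng(0)
distances = {
    "AAA": rng.normal(2000, 300, 56),
    "BBB": rng.normal(800, 300, 56),
    "CCC": rng.normal(810, 300, 56),
}
teams, diff, significant = pairwise_travel_tests(distances)
```

In a heat map of `diff`, the cells where `significant` is False would play the role of the red grid points.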
As we can see, teams such as SF, SEA, ARI and OAK travel the most (their rows have light-colored grid points, which according to the legend correspond to larger travel distances). One might think this is unfair, since teams that travel a lot might see their performance suffer. Hence, I ran a very simple experiment: for every team and for each of the last 7 regular seasons, I calculated (a) the average distance traveled for its road games and (b) its final winning percentage, and then computed the correlation between these two metrics. The following figure presents the scatterplot, which shows no apparent relationship between them. In other words, less travel does not translate into more wins!
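The travel-versus-wins check reduces to a Pearson correlation over (team, season) points. Below is a minimal, dependency-free sketch; the record layout and the placeholder team names are assumptions, and on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def season_points(records):
    """records: (team, season, road_miles, win_pct) tuples -> (xs, ys).

    x is the average road-game distance of that team-season, y its winning percentage.
    """
    xs = [sum(miles) / len(miles) for _, _, miles, _ in records]
    ys = [win_pct for _, _, _, win_pct in records]
    return xs, ys


# Hypothetical records with placeholder teams, for illustration only.
records = [
    ("T1", 1, [2100, 1900, 700, 2300], 0.625),
    ("T2", 1, [115, 300, 450, 2200], 0.5),
    ("T3", 1, [1800, 2400, 2100, 650], 0.3125),
]
xs, ys = season_points(records)
r = pearson(xs, ys)
```

A coefficient near zero, together with a non-significant p-value (for instance from `scipy.stats.pearsonr`), is what "no relationship" would look like numerically.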
Even though there is no relationship between road-game distance and winning percentage, is there a better grouping of teams that minimizes travel distances across the board? We start by performing a hierarchical clustering of the teams based on their locations, to see what kind of clusters (divisions) the geography of the teams produces (I have treated the Rams as still being in Saint Louis).
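The clustering step could look like the sketch below. The post does not state the distance or linkage it used, so this assumes great-circle distances between approximate city coordinates and average linkage (`scipy.cluster.hierarchy.linkage` on a condensed distance matrix); `fcluster` with `criterion="maxclust"` cuts the tree into a chosen number of clusters, and the returned linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` for plotting.

```python
import math

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def cluster_teams(coords, n_clusters, method="average"):
    """coords: team -> (lat, lon). Returns (team -> cluster label, linkage matrix)."""
    teams = sorted(coords)
    n = len(teams)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = haversine_miles(coords[teams[i]], coords[teams[j]])
    Z = linkage(squareform(dist), method=method)  # condensed distances in, tree out
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return dict(zip(teams, labels.tolist())), Z


# Approximate coordinates for a handful of teams (illustrative values only).
coords = {
    "SEA": (47.61, -122.33), "SF": (37.77, -122.42), "OAK": (37.80, -122.27),
    "PIT": (40.44, -80.00), "CLE": (41.50, -81.69), "NE": (42.09, -71.26),
}
labels, Z = cluster_teams(coords, n_clusters=2)
```

With all 32 teams, `n_clusters=8` would give the 8-cluster cut discussed next; nothing in this procedure forces those clusters to have 4 teams each.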
As we can see, the two top-level clusters clearly divide the teams into “west coast” and “east coast” teams. The current conference/division structure of the NFL does not do so (e.g., SEA is in the NFC, while OAK is in the AFC). Focusing on the dendrogram level that gives 8 clusters, i.e., the 8 divisions, we can see that these clusters contain different numbers of teams. However, we need to constrain each cluster to 4 teams if we want to keep the current structure of the league. Therefore, the hierarchical clustering solution can serve as the initialization of an iterative algorithm that searches for 8 clusters of 4 teams each such that the average pairwise distance between the teams within each cluster is minimized. These 8 clusters can then be agglomerated into 2 conferences in a similar manner. Optimizing this objective yields the following division of teams:
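The size-constrained refinement can be sketched as a swap-based local search. Everything here is an assumption rather than the post's implementation: `rebalance` is a hypothetical greedy repair that moves teams out of oversized clusters (the hierarchical cut generally gives unequal sizes), and `balance_by_swaps` then exchanges teams between groups as long as the mean within-group average pairwise distance decreases. Swaps keep every group at 4 teams, and the result is a local optimum that depends on the starting partition.

```python
import itertools


def within_cost(groups, dist):
    """Mean over groups of the average pairwise distance inside each group (groups need >= 2 members)."""
    total = 0.0
    for g in groups:
        pairs = list(itertools.combinations(g, 2))
        total += sum(dist[a][b] for a, b in pairs) / len(pairs)
    return total / len(groups)


def avg_dist_to(team, group, dist):
    """Average distance from `team` to the members of `group` (0 for an empty group)."""
    return sum(dist[team][u] for u in group) / len(group) if group else 0.0


def rebalance(groups, dist, size):
    """Greedy repair: move teams from oversized to undersized groups until all have `size` members.

    Assumes the total number of teams equals size * len(groups).
    """
    groups = [list(g) for g in groups]
    while any(len(g) > size for g in groups):
        big = [i for i, g in enumerate(groups) if len(g) > size]
        small = [j for j, g in enumerate(groups) if len(g) < size]
        # Pick the move that places a team closest (on average) to its new group.
        _, i, team, j = min((avg_dist_to(t, groups[j], dist), i, t, j)
                            for i in big for t in groups[i] for j in small)
        groups[i].remove(team)
        groups[j].append(team)
    return groups


def balance_by_swaps(groups, dist, max_rounds=1000):
    """First-improvement hill climbing over pairwise swaps between groups."""
    groups = [list(g) for g in groups]
    best = within_cost(groups, dist)
    for _ in range(max_rounds):
        improved = False
        for gi, gj in itertools.combinations(range(len(groups)), 2):
            for a in range(len(groups[gi])):
                for b in range(len(groups[gj])):
                    groups[gi][a], groups[gj][b] = groups[gj][b], groups[gi][a]
                    cost = within_cost(groups, dist)
                    if cost < best - 1e-12:
                        best, improved = cost, True  # keep the swap
                    else:  # undo the swap
                        groups[gi][a], groups[gj][b] = groups[gj][b], groups[gi][a]
        if not improved:
            break
    return groups, best


# Toy check on a line: teams a-d sit near 0, teams e-h near 100 (made-up positions).
pos = dict(zip("abcdefgh", [0, 1, 2, 3, 100, 101, 102, 103]))
dist = {s: {t: abs(pos[s] - pos[t]) for t in pos} for s in pos}
start = rebalance([["a", "e", "b"], ["c", "f", "d", "g", "h"]], dist, size=4)
groups, cost = balance_by_swaps(start, dist)
```

For the real problem, `dist` would hold pairwise team distances and the starting groups would come from the 8-cluster cut of the dendrogram. The same functions could, in principle, group the 8 resulting divisions into 2 conferences of 4 if `dist` is replaced by a division-to-division distance (for instance, the average distance between their teams); that choice is an assumption, as is using the relative drop in `within_cost` between two alignments as a percentage reduction.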
This division reduces the intra-division (geographic) distances by 73% compared to the current divisions. Of course, I am not suggesting that these should be the NFL's divisions, since I understand that there are many more factors to consider besides geography. Nevertheless, who would not want to see the Steelers beat the Patriots twice a year!