Which is the best team in the league? How does my team fare compared to the rest of the league? To answer such questions, fans turn to the power rankings regularly published by various sports networks, or to the league standings (i.e., win-loss records). The rankings provided by sports networks are based on a variety of factors. Some of them are objective, but others can be very subjective, based on the opinion of the expert analysts. Hence, Vagelis (also known as the “tensor whisperer”) and I set out to examine objective alternatives to the league standings, based on network science tools, for ranking the teams!
To get a network ranking we first need a network structure. We define a weighted directed network of game outcomes, in which an edge points from node i to node j if team j has beaten team i (that is, the loser points to the winner). The edge weight is equal to the point differential of the corresponding game. The following figure depicts an example of such a network for the NFL 2015-2016 season.
This network captures the who-beat-whom relations but also, and most importantly, it incorporates information about schedule strength (who has played whom). Important teams can then be found using centrality metrics borrowed from network theory. Perhaps the best-known centrality metric is PageRank, which essentially emulates a random walker over the network. In the above visualization the size of each node is proportional to its PageRank score, while the width of each edge is proportional to its weight (only edges with weight greater than 7 are shown, for clarity).
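To make the idea concrete, here is a minimal sketch of weighted PageRank over a network of game outcomes, written in plain Python. The game list, the team names, the function names and the damping value of 0.85 are all illustrative assumptions, not the data or settings behind the figure. Edges run from the loser to the winner, so a random walker drifts toward teams that win, and big wins pull more of the walker's probability.

```python
from collections import defaultdict

def build_network(games):
    """Build loser -> winner edges weighted by point differential.

    `games` is an iterable of (winner, loser, point_diff) tuples.
    Games with a non-positive differential are skipped (assumption).
    """
    edges = defaultdict(float)
    for winner, loser, diff in games:
        if diff > 0:
            edges[(loser, winner)] += diff
    return dict(edges)

def pagerank(edges, damping=0.85, iters=200, tol=1e-12):
    """Weighted PageRank by power iteration; returns {team: score}."""
    nodes = sorted({team for edge in edges for team in edge})
    n = len(nodes)
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {team: 1.0 / n for team in nodes}
    for _ in range(iters):
        # Undefeated teams have no outgoing edges; their mass is
        # redistributed uniformly so the scores keep summing to 1.
        dangling = sum(rank[t] for t in nodes if out_weight[t] == 0)
        new = {t: (1 - damping) / n + damping * dangling / n for t in nodes}
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[t] - rank[t]) for t in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical mini-season: (winner, loser, point differential)
games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 10),
         ("C", "D", 14), ("D", "B", 3)]
ranks = pagerank(build_network(games))
for team in sorted(ranks, key=ranks.get, reverse=True):
    print(team, round(ranks[team], 3))
```

In this toy season team A is unbeaten and ends up on top. The damping factor is one of the parameters that can be tuned: lower values make the walker restart more often, which limits how much of a single strong team's score can flow to whoever happened to beat it.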
Based on this network the worst five teams of last year were: Cleveland, San Diego, Dallas, Tennessee and Baltimore – which sounds about right 🙂 – while the five best-performing teams were: Seattle, Atlanta, Arizona, Carolina and Pittsburgh – which again sounds about right, except for Atlanta! Why is Atlanta so high when their actual record was only 8-8? Because they were the only team to beat the Carolina Panthers, so the centrality of the Panthers in the network was diffused through the “random walker” only to Atlanta. One possible way to overcome such glitches of PageRank is to adjust its parameters appropriately. Nevertheless, in this blog post we just want to present the idea without getting into the nuts and bolts of the approach.
How good is this ranking? Is it better than the win-loss percentages? To answer this question we turned to a prediction task. In particular, using the network up to week k, we used the ranking obtained from PageRank (and the one from the standings) to predict the outcome of the match-ups of week k+1. The prediction rule was simple: if team i was ranked higher than team j, then i was predicted to win! The following table provides the prediction accuracy for the PageRank-based ranking and for the winning percentage.
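The prediction rule described above can be sketched in a few lines of Python. The scores and match-ups below are made up for illustration, and the choice to skip games where the two teams are tied or unranked is an assumption of this sketch rather than something the evaluation above specifies.

```python
def predict_winner(ranking, team_i, team_j):
    """Pick the team with the higher ranking score.

    Returns None when either team is unranked or the scores are tied
    (assumption: such games are left unpredicted).
    """
    if team_i not in ranking or team_j not in ranking:
        return None
    if ranking[team_i] == ranking[team_j]:
        return None
    return team_i if ranking[team_i] > ranking[team_j] else team_j

def prediction_accuracy(ranking, matchups):
    """Fraction of predicted games whose winner was called correctly.

    `matchups` is a list of (team_i, team_j, actual_winner) tuples.
    """
    correct = predicted = 0
    for team_i, team_j, actual in matchups:
        guess = predict_winner(ranking, team_i, team_j)
        if guess is None:
            continue
        predicted += 1
        correct += guess == actual
    return correct / predicted if predicted else float("nan")

# Hypothetical scores computed from games up to week k ...
ranking_week_k = {"A": 0.40, "B": 0.14, "C": 0.26, "D": 0.17}
# ... and hypothetical results of week k+1.
week_k_plus_1 = [("A", "D", "A"), ("C", "B", "B"), ("D", "B", "D")]

accuracy = prediction_accuracy(ranking_week_k, week_k_plus_1)
print(f"accuracy: {accuracy:.3f}")
```

The same function works for the standings baseline: pass a dictionary of winning percentages instead of PageRank scores and compare the two accuracies week by week.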
As we can see, the network-based ranking appears to capture the power of the teams much better than the winning percentage – which is what is actually used to decide which teams advance to the playoffs! To put this in perspective, Microsoft’s far more sophisticated Cortana system had an accuracy of 62.9% with a standard error of 3% for the 2015 season, and 66.4% with a standard error of 3.2% for the 2014 season (the system did not exist earlier)! Take that, Cortana!
What’s next in power ranking? While PageRank performs well – and certainly captures the power of each team better than the winning percentage – there is still room for improvement. An idea that Vagelis and I have put forward is the use of tensors for ranking! A tensor is a multi-dimensional array (a generalization of a matrix) and can capture more complex (network) relations. In brief, we define a tensor that captures the temporal evolution of the win-loss relationships; in particular, an element of the tensor holds the point differential by which one team beat another at a given point in the season. One way to identify a ranking is through tensor factorization, where the tensor is decomposed into a sum of components, as visualized in the following figure.
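For readers who prefer symbols, one way to write this down is sketched below. The index order (winner, loser, week), the symbols, and the choice of a rank-R CP (PARAFAC) decomposition are assumptions made for illustration; the post itself only says that the tensor is decomposed into a sum of components.

```latex
\mathcal{T}(i, j, k) =
\begin{cases}
w & \text{if team } i \text{ beat team } j \text{ by } w \text{ points in week } k, \\
0 & \text{otherwise,}
\end{cases}
\qquad
\mathcal{T} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r
```

Here \circ denotes the outer product, so each component is a rank-one tensor built from a "winner" vector \mathbf{a}_r, a "loser" vector \mathbf{b}_r and a temporal vector \mathbf{c}_r.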
The good thing about tensor factorization is that we can process the tensor together with other side information (e.g., performance statistics for every team, such as yards per game, number of turnovers, number of interceptions, etc.). This information can be captured by a simple 2D matrix, where each row corresponds to a team and the columns correspond to the performance features. The challenging problem is how to combine the resulting factors in order to get a ranking of the teams! We will let you know as soon as we do it 🙂
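One standard way to process a tensor jointly with a side-information matrix is a coupled matrix-tensor factorization, in which the two share the team factor. The objective below is a sketch under that assumption, with illustrative symbols: \mathcal{T} is the win-loss tensor, \mathbf{Y} is the teams-by-features matrix, and \mathbf{a}_r are the columns of the shared factor \mathbf{A}. It is not necessarily the formulation we will end up using.

```latex
\min_{\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}}
\left\| \mathcal{T} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|_F^2
+ \left\| \mathbf{Y} - \mathbf{A} \mathbf{D}^{\top} \right\|_F^2
```

Because \mathbf{A} appears in both terms, each latent component is tied both to a pattern of wins and losses over time and to a profile of performance features (the corresponding column of \mathbf{D}).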
In addition to team ranking, tensor decomposition is a very powerful technique that can help us identify latent groups of similarly performing teams across different dimensions; combining that with the 2D side information, we can correlate those team groups with the performance statistics that best explain their behavior, and use that to further enhance predictions. More on that in the near future! 🙂
It should be clear that the simple standings used by the various leagues for deciding which teams make the playoffs can be significantly improved. Of course, it will be hard to convince a team X with a higher winning percentage but a lower PageRank (or other network ranking) than team Y that it should give its playoff spot to Y. Nevertheless, with the data analytics craze in the business today, everything is possible!