Soccer Ratings: An Optimization Approach

Ranking and rating teams is one of the major tasks in the sports analytics community. One of the most elegant approaches I have seen is the one presented by W. Winston in his book “Mathletics”. In this post I will adopt the general approach presented there and add a few more elements. Our case study will be this year’s Greek SuperLeague.  I know that this is not the best league in the world (maybe not even in the Balkans, for crying out loud), but I chose it out of purely personal interest.  Also, it does not matter what data you use for a demo as long as they are not cherry-picked for “good results”!

Every team will be associated with a pair of ratings: an offensive and a defensive rating.  An average team has a rating of 1.  Hence, a team with an offensive rating of 1.5 scores 50% more goals than an average team in the league. Similarly, a team with a defensive rating of 1.5 concedes 50% more goals than an average team.  Therefore, a rating greater than 1 is good for the offense and bad for the defense. As will become evident in the following, these ratings are adjusted for opponent strength.  Even though each team in the league plays every other team, running up the score against a “bad” team is accounted for accordingly.

Let us assume that we have the offensive rating o_i of team i and the defensive rating d_j of team j.  When team i hosts team j, the expected number of goals scored by team i is m_h \cdot o_i \cdot d_j, where m_h is the (adjusted) average number of goals scored by a home team. So one way to identify the team ratings (and the adjusted average number of goals scored by a home/away team) is to minimize the mean squared error of the forecasts of the number of goals scored by each team in each observed game.  For example, if team i scored 3 goals at home against team j, the squared error for that forecast is (3-m_h \cdot o_i \cdot d_j)^2.
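To make the forecast and its squared error concrete, here is a minimal Python sketch. The function names are my own, and the numbers are the ones quoted later in this post for AEK hosting Olympiacos, read as m_h, o_i and d_j in that order.

```python
def expected_goals(m, offense, defense):
    """Adjusted league average times the attacker's offensive rating
    times the opponent's defensive rating."""
    return m * offense * defense


def squared_error(observed, m, offense, defense):
    """Squared forecast error for one team in one game."""
    return (observed - expected_goals(m, offense, defense)) ** 2


# Values quoted later in the post (AEK at home against Olympiacos).
M_HOME, O_AEK, D_OLYMPIACOS = 1.4478, 1.6083, 0.49953

forecast = expected_goals(M_HOME, O_AEK, D_OLYMPIACOS)
# Hypothetical observation: suppose AEK had scored 3 goals in that game.
error_if_three_goals = squared_error(3, M_HOME, O_AEK, D_OLYMPIACOS)
print(forecast, error_if_three_goals)
```

With these ratings the forecast comes out at roughly 1.16 goals, so a 3-goal game would contribute a fairly large squared error to the objective.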

Formally, let o be the vector of offensive ratings, d the vector of defensive ratings, and m = (m_h, m_a) the vector holding the adjusted average number of goals scored by a home and an away team, respectively.  Also let y_{i}^{ij} be the number of goals scored by team i when i played at home against team j.  We can obtain the team ratings and the adjusted average goals per game by solving the following minimization problem (also constraining the average ratings to be equal to 1):

[Equation image: the minimization problem over the ratings and the adjusted home/away averages]
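Written out, a formulation consistent with the definitions above would look roughly as follows. The notation is partly my assumption: m_a denotes the adjusted average for an away team, G is the set of observed games (i at home, j away), N is the number of teams, and y_{j}^{ij} is the number of goals the visiting team j scored in that game. The objective is written as a sum of squared errors, which differs from the mean squared error only by a constant factor.

```latex
\begin{aligned}
\min_{o,\,d,\,m_h,\,m_a}\quad
  & \sum_{(i,j)\in G}\Big[\big(y_{i}^{ij}-m_h\cdot o_i\cdot d_j\big)^2
    +\big(y_{j}^{ij}-m_a\cdot o_j\cdot d_i\big)^2\Big] \\
\text{subject to}\quad
  & \frac{1}{N}\sum_{k=1}^{N} o_k = 1,
    \qquad \frac{1}{N}\sum_{k=1}^{N} d_k = 1
\end{aligned}
```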

The adjustment here, compared to the original approach presented in “Mathletics”, is the inclusion of the two variables for the adjusted average number of goals scored by the home and the away team.  This essentially captures the home edge of the league (in Winston’s example the data came from the 2006 World Cup, which is played on neutral ground, so there is not really a home edge).  The code for setting up the data (from the Greek league) and solving this optimization problem is on GitHub, and the results are presented in the following table:
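For readers who want to experiment without the original code, below is a small pure-Python sketch of the same least-squares idea. It is not the solver used for the results in this post. It swaps in a simple coordinate-wise least-squares update (each rating has a closed-form optimum when the others are held fixed) and enforces the "average rating equals 1" constraints by rescaling at the end, which leaves every forecast unchanged. The results in `GAMES` are made up purely for illustration, and all names are hypothetical.

```python
from statistics import mean

# Hypothetical toy results: (home, away, home_goals, away_goals). Not real data.
GAMES = [
    ("A", "B", 2, 1), ("A", "C", 3, 0), ("A", "D", 2, 0),
    ("B", "A", 1, 1), ("B", "C", 2, 1), ("B", "D", 1, 0),
    ("C", "A", 0, 2), ("C", "B", 1, 1), ("C", "D", 2, 2),
    ("D", "A", 0, 3), ("D", "B", 1, 2), ("D", "C", 1, 1),
]


def sse(games, o, d, m_h, m_a):
    """Sum of squared forecast errors over both teams in every game."""
    total = 0.0
    for h, a, gh, ga in games:
        total += (gh - m_h * o[h] * d[a]) ** 2
        total += (ga - m_a * o[a] * d[h]) ** 2
    return total


def _ls(pairs, fallback):
    """Scale s minimising sum((y - s * c) ** 2) over (y, c) pairs."""
    den = sum(c * c for _, c in pairs)
    return sum(y * c for y, c in pairs) / den if den > 0 else fallback


def fit_ratings(games, sweeps=200):
    teams = sorted({t for g in games for t in g[:2]})
    o = {t: 1.0 for t in teams}
    d = {t: 1.0 for t in teams}
    m_h = mean(g[2] for g in games)  # start from the raw league averages
    m_a = mean(g[3] for g in games)
    for _ in range(sweeps):
        for t in teams:  # offense: goals t scored at home and away
            pairs = [(gh, m_h * d[a]) for h, a, gh, ga in games if h == t]
            pairs += [(ga, m_a * d[h]) for h, a, gh, ga in games if a == t]
            o[t] = _ls(pairs, o[t])
        for t in teams:  # defense: goals t conceded at home and away
            pairs = [(ga, m_a * o[a]) for h, a, gh, ga in games if h == t]
            pairs += [(gh, m_h * o[h]) for h, a, gh, ga in games if a == t]
            d[t] = _ls(pairs, d[t])
        m_h = _ls([(gh, o[h] * d[a]) for h, a, gh, ga in games], m_h)
        m_a = _ls([(ga, o[a] * d[h]) for h, a, gh, ga in games], m_a)
    # Rescale to mean rating 1; m_h and m_a absorb the factor, forecasts are unchanged.
    co, cd = mean(o.values()), mean(d.values())
    o = {t: v / co for t, v in o.items()}
    d = {t: v / cd for t, v in d.items()}
    return o, d, m_h * co * cd, m_a * co * cd


o, d, m_h, m_a = fit_ratings(GAMES)
print(o, d, m_h, m_a)
```

Because every update is an exact one-dimensional least-squares step, the error never increases from one sweep to the next. A general-purpose constrained optimizer would be the more direct translation of the problem statement.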

[Table: offensive and defensive ratings of the Greek SuperLeague teams]

As we can see, Olympiacos, who won the championship this year, had the best offensive and the best defensive rating. AEK, which was fourth in the regular season but finished second after the playoffs and advanced to the qualifying rounds of the Champions League, had the second-best offensive rating. Panionios had the second-best defensive rating and finished fifth both in the regular season and after the playoffs (possibly because of its below-average offensive rating).

The figure below shows where each team sits in the offensive/defensive rating space.  The red box marks the area that teams would like to be in (i.e., offensive rating > 1 and defensive rating < 1).  As we can see, Olympiacos is “Pareto optimal” in this space, while AEK, PAOK and Panionios (which is not within the “red box”) form the Pareto frontier if we remove Olympiacos.
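The Pareto idea can be checked mechanically: a team is on the frontier if no other team is at least as good on both ratings and strictly better on one. The sketch below is only an illustration. The Olympiacos and AEK ratings are the ones quoted later in this post, while "Team X" and "Team Y" are hypothetical placeholders, not real clubs or real ratings.

```python
def dominates(a, b):
    """a and b are (offense, defense) pairs; higher offense and lower
    defense are better. True if a is at least as good on both and
    strictly better on at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(ratings):
    """Names of the teams that no other team dominates."""
    return sorted(
        team for team, r in ratings.items()
        if not any(dominates(q, r) for other, q in ratings.items() if other != team)
    )


ratings = {
    "Olympiacos": (1.6307, 0.49953),  # quoted in the post
    "AEK": (1.6083, 0.70549),         # quoted in the post
    "Team X": (0.9, 0.8),             # hypothetical
    "Team Y": (1.2, 0.6),             # hypothetical
}

front_all = pareto_front(ratings)
without_leader = {t: r for t, r in ratings.items() if t != "Olympiacos"}
front_rest = pareto_front(without_leader)
print(front_all, front_rest)
```

With these inputs Olympiacos is the only non-dominated team, and once it is removed the frontier contains more than one team, which mirrors the situation described above.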

[Figure: Greek SuperLeague teams in the offensive/defensive rating space]

OK, so now we have the ratings. So what? We can use these ratings to estimate the probability of each outcome for an upcoming matchup. In order to do so, we model the number of goals scored by each team with a Poisson distribution. The mean of the distribution is obtained from the team ratings and the adjusted average goals per game.  For example, when AEK is playing Olympiacos at home, the expected number of goals scored by AEK is 1.4478*1.6083*0.49953. Similarly, for Olympiacos this would be 0.836*1.6307*0.70549. We can then simulate the matchup thousands of times and compare the number of goals scored by each team in each simulation to estimate the probabilities.  For example, the matchup above (AEK-Olympiacos) has a 40% chance of being won by AEK, a 31% chance of being won by Olympiacos and a 29% chance of ending in a draw. If the teams were playing at Olympiacos’ stadium, then Olympiacos would be a 61% favorite, while AEK would have a 15% chance of winning and the draw would have a 24% probability. The MATLAB code for simulating the matchups is also on the GitHub page.
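As a rough illustration of the simulation step (not the MATLAB code mentioned above), here is a standard-library Python sketch. It assumes the two teams' goal counts are independent Poisson draws and uses Knuth's multiplication method to sample them. The function names, the number of simulations and the seed are my own choices. The ratings are the ones quoted in the paragraph above.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_matchup(mean_home, mean_away, n_sims=50_000, seed=0):
    """Estimate (home win, draw, away win) probabilities by simulation."""
    rng = random.Random(seed)
    home = draw = away = 0
    for _ in range(n_sims):
        goals_home = sample_poisson(mean_home, rng)
        goals_away = sample_poisson(mean_away, rng)
        if goals_home > goals_away:
            home += 1
        elif goals_home == goals_away:
            draw += 1
        else:
            away += 1
    return home / n_sims, draw / n_sims, away / n_sims


# Ratings quoted in the post.
M_HOME, M_AWAY = 1.4478, 0.836
O_AEK, D_AEK = 1.6083, 0.70549
O_OLY, D_OLY = 1.6307, 0.49953

# AEK at home against Olympiacos.
p_aek, p_draw, p_oly = simulate_matchup(M_HOME * O_AEK * D_OLY,
                                        M_AWAY * O_OLY * D_AEK)
print(p_aek, p_draw, p_oly)
```

Swapping the roles (Olympiacos gets `M_HOME * O_OLY * D_AEK`, AEK gets `M_AWAY * O_AEK * D_OLY`) gives the matchup at Olympiacos' stadium. The estimates vary slightly from run to run unless the seed is fixed.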

In conclusion, rating the offense and defense of the teams in a league can help us better understand their performance and also obtain probabilities for upcoming matchups.

Champions League 2016-17

Ioannis Christovasilis was kind enough to provide me with the full results of the group stage of the Champions League. Using the above approach, I calculated the team ratings, which are presented in the following figure.

[Figure: offensive/defensive ratings of the Champions League teams after the group stage]

As we can see, Juventus (this season’s finalist) was on the Pareto front of the offensive/defensive ratings, together with Borussia Dortmund (which had a great run in the group stage, with impressive offensive games, including 2 games with 6+ goals, and 2 draws with the eventual champion Real Madrid) and Atletico Madrid (which reached the semi-finals).  As we can also see, Barcelona was very good offensively in the group stage but average defensively, and the same is true for Real.  The following table has the full ratings.

[Table: full Champions League ratings after the group stage]

Based on these ratings, if Real Madrid and Juventus had played at the end of the group stage, Juventus would have been the favorite, with a 69% chance to win in a neutral stadium. However, the teams obviously did not play the final in January, so the final team ratings follow:

[Table: final Champions League team ratings]

As we can see, Juventus had the best defense (not including the final), 77% better than an average team. Furthermore, its offense was 40% better than the average offense. Real Madrid had a very good offense too, 60% above average, and also a good defense (though not as good as Juventus’), 47% better than an average team.  These ratings gave Juventus the advantage for the final, 62% to 38%.  However, as we know by now, Juventus was not able to deliver its superb defense during the final.  The interesting thing is that there are teams whose ratings one might argue are better than Real’s or Juventus’. For example, Dortmund had an excellent offensive rating and a defensive rating 30% better than average.  However, it only reached the quarter-finals. This is a result of the draw: Dortmund played Monaco, another team with a fabulous offensive rating and a good defensive rating (22% better than average). Only one team can advance, and hence we see teams with good ratings that do not go as far in the tournament as one might have expected.

PS: PhaethonPrime has kindly provided Python code for the same optimization problem here.
