Why Attempting to Predict the Winner of Euro 2020 Is a Loser's Game
International soccer has stumped sophisticated model-builders, economists, and NASA astrophysicists. This summer shouldn't be any different.
Back in 2009, Gerald Skinner was working for NASA. He was studying gamma-ray and X-ray astronomy, and he had an email address at the wonderful domain milkyway.gsfc.nasa.gov. He was co-writing papers with titles like “X-ray interferometry with transmissive beam combiners for ultra-high angular resolution astronomy”, “1A 0535+262 is now three times as bright as the Crab”, and “Background Studies for the Black Hole Finder Probe EXIST”. Skinner spent his career trying to figure out how to see the things we cannot see and know the things we cannot know.
But there’s one topic that totally stumped him: international soccer. In 2009, Skinner co-wrote another paper, “Soccer matches as experiments: How often does the 'best' team win?” As you might expect from a prodigious astrophysicist, the paper was elegantly designed. He and his co-author, Guy Freeman, treated individual games -- and the larger World Cup -- as experiments designed to identify the better soccer team. Then they asked: Is this a well-designed experiment?
On a game-to-game basis, they found that, no, it is not a well-designed experiment. Unless the match is a blowout, there isn’t enough information to conclude that the best team did, in fact, win: “For differences less than 3–4 goals the result lacks the 90% confidence which within quantitative disciplines is frequently considered a minimum acceptable level of confidence in the outcome of an experiment”. However, the World Cup -- with its groups of four teams who all play each other -- provides an opportunity for more experiments: “If the result of each match provided a valid comparison of the relative abilities of the two teams, the situation that A beats B beats C beats A should never arise”. At the 2006 World Cup, they found that, of the 355 triplets the tournament presented, 17 percent were what they call an “intransitive triplet” -- A over B over C over A. If matches were decided purely by random chance, you would expect an intransitive-triplet rate of 25 percent. In other words: World Cup results were only slightly better than random.
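The triplet test is easy to reproduce. Below is a minimal sketch in Python -- my illustration, not Skinner and Freeman’s actual code -- that scans every trio of teams whose members all met in decisive matches and counts the cycles. The 25-percent baseline falls out of the combinatorics: orient each of a triangle’s three results at random and exactly two of the eight possible orientations are cyclic.

```python
from itertools import combinations

# Sketch of the Skinner-Freeman triplet test. `results` is a set of
# (winner, loser) pairs from decisive matches; handling draws is left
# out here for simplicity, which is an assumption on my part.
def intransitive_rate(results):
    """Fraction of fully played triplets that form a cycle A > B > C > A."""
    teams = {team for pair in results for team in pair}
    beats = set(results)

    triplets = cyclic = 0
    for a, b, c in combinations(sorted(teams), 3):
        pairs = [(a, b), (b, c), (a, c)]
        # Only count triplets in which every pair met with a decisive result.
        if not all((x, y) in beats or (y, x) in beats for x, y in pairs):
            continue
        triplets += 1
        wins = {t: 0 for t in (a, b, c)}
        for x, y in pairs:
            wins[x if (x, y) in beats else y] += 1
        # Transitive triplets have win counts 2/1/0; a cycle gives 1/1/1.
        if all(w == 1 for w in wins.values()):
            cyclic += 1
    return cyclic / triplets if triplets else float("nan")

# Toy example: A beats B, B beats C, C beats A -- one intransitive triplet.
print(intransitive_rate({("A", "B"), ("B", "C"), ("C", "A")}))  # 1.0
```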
The paper concludes, in a sense, with Skinner and Freeman defeated. They half-heartedly suggest two methods to improve the design of soccer matches: make the goal bigger, or keep playing until the scoreline can “yield a chosen level of confidence”. If you’ll allow me to reverse-engineer a NASA-approved prediction for this summer’s Euros, it would be something like: Who cares, dude? The experiment doesn’t work.
Of course, a lot of people care -- and a lot of people are still trying to predict what will happen at this summer’s Euros. Beyond the most popular prediction method -- betting markets -- there are all kinds of bespoke models attempting to provide some kind of signal amidst all the noise.
Over the past decade or so, Daniele Paserman, an economics professor at Boston University, has held various prediction contests for the Premier League, Champions League, World Cup, Euros, and other tournaments at his blog “Futbolmetrix”. He invites people to submit their predictions, but he also enters a number of models of his own, built on things like betting odds and transfer values. Despite all that, he said he hasn’t “been able to detect any obvious patterns that jump out” in terms of which kinds of models perform best.
“On one hand, using knowledge from past performances, player quality, etc., almost everybody can do better than random guessing”, he said. “On the other hand, in some competitions the favorite's probability of winning the competition is about 1 in 6. Those were the odds for Brazil and Germany before the start of the 2018 World Cup. So predicting the winner is about as easy as predicting what number you are going to roll on a six-sided die”.
Before the last World Cup, I predicted, in a video, that Germany, the defending champ, would get eliminated in the group stage. I figured I would be wrong, but I also figured that Sweden, South Korea, and Mexico made for a tough enough group that one of the co-favorites to win the whole thing could fail to get out of it. I was right, but it was an educated guess -- mostly a guess, and mostly dumb luck.
Given the randomness inherent in a low-scoring sport, soccer is hard enough to predict as it is. According to analysis in the book The Numbers Game, the betting favorites win far less often in the English Premier League than they do in, say, the NBA or NFL. But international soccer, in particular, adds a number of other confounding factors. Whoever wins the Euros this summer will play only seven games, four of them single-elimination. All it takes is one bad day -- of play or luck or both -- for a top team to be eliminated. For example: Manchester City won the Premier League this year, but they won only about 70 percent of their matches.
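Some back-of-the-envelope arithmetic shows how brutally that compounds. Treat each match as an independent 70-percent proposition -- a simplification, since real matches are neither independent nor identically difficult -- and even a City-grade team runs the table over seven games only about 8 percent of the time:

```python
# If each match is an independent 70-percent proposition, winning seven
# straight is roughly a one-in-twelve shot.
p_single_match = 0.70
print(p_single_match ** 7)  # ~0.082
```

A champion doesn’t actually need to win all seven -- a group-stage draw or two can be survivable -- but the direction of the effect is the point.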
The scarcity of matches also applies to the dataset we have for each team.
“Success and performance on the international stage can be difficult to measure, for a number of reasons”, said AJ Swoboda, managing director of the Americas at the consultancy 21st Group. “One of the biggest is the higher degree of variance. The number of international matches is significantly less than at the club level. Over the last five years, England has played about 70 games, compared to about 300 for Manchester United”.
There’s just way less information about each team’s level of performance, and then there’s a question of how useful that information actually is. National teams play uneven schedules, their rosters change for every set of matches, and they’re not always trying to optimize their chances of winning each match. The world no. 1, Belgium, went 10-for-10 in qualifying for Euro 2020, scoring 40 goals and conceding three. They won every match by at least two goals and reached Skinner’s three-goal threshold in eight of them. But those matches were all played in 2019, and the slate of opponents didn’t include any particularly strong teams -- sorry, Russia, Scotland, Cyprus, Kazakhstan, and San Marino.
“These are really challenging issues,” said Jesse Davis, a professor who leads a group of soccer researchers at KU Leuven, a university in Belgium. “The sample sizes are so small, we have tended to accept that there will be inaccuracies. A great quote attributed to the statistician George Box is: ‘All models are wrong but some models are useful’”.
Davis and Pieter Robberechts have given it their shot. They’ve built a model based around an Elo rating system, which gives each team a rating based on their historical performance, and then adjusts that rating with every match, based on the rating of the opponent and the final scoreline. Importantly, their model rates all the teams both defensively and offensively, which helps account for another complicating factor with international soccer: it’s a different game. Given that the players rarely play together, it’s a lot harder for coaches to pull off the kind of complex positional attacking play you’ll see in the Champions League or the kind of coherent and demanding pressing defenses that seemed to be a requirement for all of the best club teams in the world -- before this season.
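For readers who haven’t run into Elo before, the core update fits in a few lines. What follows is a generic, single-rating sketch, not the Davis-Robberechts model itself -- theirs, again, carries separate offensive and defensive ratings, and the K-factor and margin-of-victory scaling below are standard-issue assumptions of mine:

```python
import math

# Generic Elo with a goal-difference multiplier (a common variant).
K = 32  # update speed; a conventional default, not necessarily theirs

def expected_score(rating_a, rating_b):
    """Expected result for team A against team B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, goals_a, goals_b):
    """Return both teams' new ratings after one match."""
    score_a = 1.0 if goals_a > goals_b else 0.0 if goals_a < goals_b else 0.5
    # Log-scaling the margin lets big wins move ratings more without
    # letting blowouts dominate.
    margin = math.log(abs(goals_a - goals_b) + 1) + 1
    delta = K * margin * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Example: a 1500-rated underdog beats a 1600-rated favorite 3-1 and
# gains about 43 points; the favorite loses the same amount.
print(update(1500, 1600, 3, 1))
```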
“Defense is definitely more important in the international game because it seems to be easier to develop a good defense structure in a short amount of time,” Davis said. “There is also much less of an averaging effect in a short tournament. Hence, there is less time for a higher risk, higher reward strategy to pay off -- a high press where you generate good scoring opportunities and perhaps concede few opportunities but those opportunities that you do concede are very dangerous. Giving up an extremely ‘dangerous’ chance can have a much bigger effect in a three-to-seven game sample than a 38-game one”.
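Davis’s averaging-effect point is easy to see in a toy simulation -- my illustration, with made-up numbers, not his. Give two hypothetical profiles the same expected points per match but different volatility, and the seven-game sample’s spread dwarfs the 38-game one’s:

```python
import random

random.seed(1)

def season_points(win_p, draw_p, n_matches):
    """Simulate one sample of n_matches with fixed win/draw probabilities."""
    pts = 0
    for _ in range(n_matches):
        r = random.random()
        pts += 3 if r < win_p else 1 if r < win_p + draw_p else 0
    return pts

def mean_and_spread(win_p, draw_p, n_matches, trials=20_000):
    """Per-match points average and its standard deviation across samples."""
    sims = [season_points(win_p, draw_p, n_matches) for _ in range(trials)]
    mean = sum(sims) / trials
    std = (sum((s - mean) ** 2 for s in sims) / trials) ** 0.5
    return mean / n_matches, std / n_matches

# Both profiles average 1.8 points per match, but over seven games the
# riskier, draw-free profile is more than twice as volatile.
print(mean_and_spread(0.55, 0.15, 38))  # steadier profile, league season
print(mean_and_spread(0.60, 0.00, 7))   # riskier profile, short tournament
```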
Germany fans are nodding their heads right now. Davis said that one of his model’s biggest weaknesses is that it doesn’t account for squad depth, which could be especially important this time around, with so many players playing so many minutes over the past calendar year. At 21st Group, the model combines historical performance and roster quality. They produce ratings for every professional player across a subset of leagues, based on an estimate of how much each player raises his team’s performance level, and those ratings can then be ported over to the players’ national teams.
“Looking ahead at the tournament this summer, there are not too many surprises or clear favourites,” said Aurel Nazmiu, a data scientist at 21st Group. “We believe there are five teams with roughly a 10-percent chance or greater to win: Belgium (14 percent), England (12), France (12), Germany (12), and Spain (11). Back in 2016, we saw Portugal win when the market only gave them about a four-percent chance of winning, so don’t write off nations like Portugal, Italy, and the Netherlands. Denmark is perhaps the only real surprise in our top 10, so maybe a repeat of their 1992 win?”
In 2016, UEFA expanded the field from 16 to 24 teams. It used to be four groups of four, with the top two teams in each group advancing to the quarterfinals. Now, it’s six groups of four, with the top two teams in each group, plus the four best third-place teams, advancing to a Round of 16. If last time ‘round is any indication, the new format has introduced even more randomness into the competition. The shift in structure complicated things for Davis and Robberechts, who built their model by back-fitting it to previous tournaments: seeing what would’ve predicted success in the past and applying it to today. And even that process has its own complications, since it’s so hard to tell whether you’re chasing a true predictor of performance or just random noise.
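The format’s qualification logic itself is simple enough to sketch. Here’s a simplified Python version of the 24-team structure; note that UEFA’s real tiebreakers run considerably longer than the points/goal-difference/goals-scored ordering assumed here:

```python
# Simplified Euro 2020 knockout qualification. Each group table is a list
# of (team, points, goal_diff, goals_scored) tuples, sorted by group finish.
def knockout_qualifiers(groups):
    """Return the 16 teams advancing from six groups of four."""
    advancing, third_place = [], []
    for table in groups.values():
        advancing += [row[0] for row in table[:2]]  # top two always advance
        third_place.append(table[2])                # thirds go into a pool
    # The four best third-place teams advance too (tiebreakers simplified).
    third_place.sort(key=lambda r: (r[1], r[2], r[3]), reverse=True)
    advancing += [row[0] for row in third_place[:4]]
    return advancing

# Toy input: six groups labeled A-F with made-up results.
toy = {
    g: [(g + "1", 9, 6, 8), (g + "2", 6, 2, 5),
        (g + "3", 4, -1, 3), (g + "4", 0, -7, 1)]
    for g in "ABCDEF"
}
print(knockout_qualifiers(toy))  # 12 winners/runners-up plus 4 thirds
```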
“Evaluating single-elimination tournaments is much more difficult than standard match prediction”, Davis said. “An unheralded team (e.g. Croatia in the 2018 World Cup) may have an excellent tournament whereas a strong team may have a poor one. Additionally, soccer is a sport where the better/more dominant team is more likely to lose (e.g. an unlucky bounce or a wondergoal). These ‘mispredictions’ balance out over the course of a club season, but how do you evaluate your predictions if your favorite goes out in the first knockout round? This has a cascading effect as the opponents in subsequent rounds will likely have a more favorable matchup. I do not have a great answer, but I think it is an interesting research question”.
Given its emphasis on team performance, the model Davis and Robberechts built gives Belgium an overwhelming 29-percent chance of winning the whole thing. In competitive matches since the 2018 World Cup, Belgium have won 20, drawn one, and lost two, while scoring 77 goals and conceding 17. The only other teams the model gives a greater-than-10-percent chance of winning it all are Spain and France, at 14 percent apiece. They, too, are high on Denmark, though.
“We were surprised that Denmark (7 percent) had a slightly higher chance than England (6 percent) of winning, for a couple of reasons. England performed well at the last World Cup and has lots of young talent. Moreover, England is ranked higher than Denmark on all three metrics we look at. It seems that Denmark’s potential quarterfinal and semifinal matchups work out a bit more favorably for them than England’s do. However, Denmark surprisingly has better offensive and defensive ratings than France.”
The final confounding factor in these events is the actual luck of the draw. Last World Cup saw powerhouses Argentina, France, Brazil, Uruguay, Belgium, and Portugal end up on the same side of the knockout bracket. That no doubt helped propel Croatia to the final on the other side. In Euro 2016, Portugal finished third in their group, didn’t play any of the favorites until the final, and got matched up with unheralded Wales in the semis. This year, they won’t be so lucky.
“France, Portugal and Germany’s odds are all depressed by being drawn in the same group,” Davis said. “This is compounded by the fact that it is unlikely that three teams will advance out of their group.”
In terms of predicting a winner, the betting markets don’t differ much from either the 21st Group model or the one Davis and Robberechts built. Per the DraftKings sportsbook, France are the favorites at 4.8-to-1 despite their tough group draw, and England, Belgium, Germany, Portugal, Italy, and Spain all sit somewhere between there and 9.5-to-1. No one else is shorter than 15-to-1.
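If you want to line those odds up against the models’ percentages, fractional odds convert to implied probabilities in one line. The conversion below ignores the bookmaker’s margin, so it slightly overstates each team’s true market-implied chance:

```python
# 4.8-to-1 against means a 1-unit stake returns 4.8 units of profit.
def implied_probability(odds_against):
    return 1 / (odds_against + 1)

print(implied_probability(4.8))   # France: ~0.17
print(implied_probability(9.5))   # back of the leading pack: ~0.10
print(implied_probability(15.0))  # everyone else: ~0.06 or less
```

So the market is a touch more bullish on France (roughly 17 percent) than either 21st Group (12 percent) or Davis and Robberechts (14 percent).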
“In the end, most models tend to do pretty well, and the differences between the predictions are not very large”, Paserman said. “Who ends up winning is often due to random factors -- the ball is round and all that. Oh, and one thing that I learned early on: Once I did put a small amount of money on teams that my model predicted to do a lot better than the betting odds. That ended in a complete disaster, and I haven't done it since”.
A word of advice for Euro 2020 in 2021: Don’t let one bad experiment lead you to another.