Judging the Judges: A General Framework for Evaluating the Performance of International Sports Judges
The monitoring of judges and referees in sports has become an important topic due to the increasing media exposure of international sporting events and the large monetary sums involved. In this article, we present a method to assess the accuracy of sports judges and estimate their bias. Our method is broadly applicable to all sports where panels of judges evaluate athletic performances on a finite scale. We analyze judging scores from eight different sports with comparable judging systems: diving, dressage, figure skating, freestyle skiing (aerials), freestyle snowboard (halfpipe, slopestyle), gymnastics, ski jumping and synchronized swimming. With the notable exception of dressage, we identify in each of these sports a general and accurate pattern of the intrinsic judging error as a function of the athlete's performance level. This intrinsic judging inaccuracy is heteroscedastic and can be approximated by a quadratic curve, indicating increased consensus among judges towards the best athletes. Building on this observation, we show that the framework developed to assess the performance of international gymnastics judges applies to all these sports: we can evaluate the performance of judges compared to their peers and distinguish cheating from unintentional misjudging. Our analysis also yields valuable insights into the judging practices of the sports under consideration. In particular, it reveals a systemic judging problem in dressage, where judges disagree on what constitutes a good performance.
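The abstract states that the intrinsic judging error shrinks towards the best athletes and can be approximated by a quadratic curve in the performance level. The sketch below illustrates that idea under loudly stated assumptions that are not taken from the article: the performance level (control score) is approximated by the panel mean, the judging error by the population standard deviation of the panel's marks, and the curve is an ordinary least-squares quadratic fit. The helper names `panel_errors` and `fit_quadratic` and the sample panels are hypothetical; this is not the authors' estimator.

```python
from statistics import mean, pstdev


def panel_errors(panels):
    """Return (control score, judging spread) for each performance.

    Assumption (not from the article): control score = mean of the panel's marks,
    judging spread = population standard deviation of those marks.
    """
    return [(mean(p), pstdev(p)) for p in panels]


def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a + b*x + c*x**2, returned as (a, b, c).

    Solves the 3x3 normal equations with Gauss-Jordan elimination and partial pivoting.
    """
    # Power sums of x (k = 0..4) and moments of y against x (k = 0..2).
    s = [sum(x**k for x in xs) for k in range(5)]
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix [X^T X | X^T y].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):
        # Swap the row with the largest pivot into place for numerical stability.
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        # Eliminate this column from every other row.
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    # The matrix is now diagonal; divide each right-hand side by its pivot.
    return tuple(m[i][3] / m[i][i] for i in range(3))


if __name__ == "__main__":
    # Hypothetical marks from three five-judge panels (illustration only, not real data).
    panels = [
        [6.0, 7.0, 6.5, 7.5, 6.0],
        [8.0, 8.5, 7.5, 8.0, 9.0],
        [9.5, 9.5, 9.0, 9.5, 9.5],
    ]
    points = panel_errors(panels)
    a, b, c = fit_quadratic([x for x, _ in points], [y for _, y in points])
    print(f"spread(x) ~ {a:.3f} + {b:.3f}*x + {c:.3f}*x^2")
```

With real data one would fit many performances per sport, and a fitted curve that decreases towards the top of the scale would reflect the increased consensus on the best athletes described above.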