Elo vs. Glicko-2 vs. TrueSkill: The Math Behind Modern Matchmaking
As competitive gaming shifts from local tournaments to global digital arenas, the algorithms that rank players have evolved from simple single-number systems to complex Bayesian probability curves.
By Factlen Editorial Team
- Statistical Innovators
- Focus on confidence intervals and volatility to track rapid skill changes.
- Multiplayer Engineers
- Require systems that can handle teams, free-for-alls, and massive player bases.
- Traditional Competitors
- Value transparency and simple, zero-sum point exchanges.
What's not represented
- Casual gamers who prefer unranked play
- Game designers balancing queue times versus match quality
Why this matters
Understanding how these algorithms evaluate your performance helps demystify the often-frustrating experience of competitive matchmaking, revealing exactly why your rank moves the way it does after a win or loss.
Key points
- Matchmaking algorithms have evolved from single-number systems to complex probability distributions.
- The classic Elo system is transparent and self-correcting but struggles to quickly rank new players.
- Glicko-2 introduces 'uncertainty' and 'volatility' metrics to rapidly adjust ratings during streaks or after inactivity.
- Microsoft's TrueSkill uses Bayesian inference to handle team-based games and massive free-for-alls.
- Modern systems require significantly fewer games to find a player's true skill level, eliminating the traditional 'grind'.
Matchmaking is the invisible engine of modern competitive play. Whether you are queuing up for a quick game of online chess or dropping into a massive multiplayer arena, an algorithm is quietly calculating your worth. The goal is always the flow state: matching you with opponents just skilled enough to challenge you, but not so dominant that they crush your spirit.[6]
Achieving this balance requires translating human ability into mathematics. Over the last sixty years, the algorithms tasked with this translation have evolved from simple, single-number ledgers to complex probability distributions. Today, three major systems dominate the landscape of competitive ranking: the classic Elo system, the highly responsive Glicko-2, and Microsoft’s team-focused TrueSkill.[6]
The grandfather of them all is the Elo rating system, developed in 1960 by physics professor and chess master Arpad Elo. Designed to replace the flawed Harkness system, Elo introduced a rigorous statistical foundation to competition. It assumes that a player’s performance in any given match is a normally distributed random variable, and their rating represents the mean of that distribution.[1]
The mechanics of Elo are elegantly simple. It is a zero-sum exchange: whatever the winner gains, the loser gives up. If a player with a 1200 rating defeats a player with a 1600 rating, the underdog gains far more points than the 1600-rated favorite would have gained by winning the same game. The system is self-correcting over long periods; if your rating is artificially high, you will eventually lose to lower-rated players and bleed points until you reach your true level.[1]
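As a rough illustration of that exchange, the sketch below uses the logistic expected-score formula found in most modern Elo implementations. The K-factor of 32 and the function names are assumptions chosen for illustration; federations and games pick their own K values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after a decisive game."""
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain  # zero-sum: the loser pays what the winner gains


# Upset: the 1200-rated underdog beats the 1600-rated favorite.
upset_winner, upset_loser = elo_update(1200, 1600)
# Expected result: the 1600-rated favorite beats the 1200-rated underdog.
fav_winner, fav_loser = elo_update(1600, 1200)

print(f"Underdog wins: +{upset_winner - 1200:.1f} points")
print(f"Favorite wins: +{fav_winner - 1600:.1f} points")
```

With these assumed values the underdog gains roughly 29 points for the upset, while the favorite would have gained about 3 for the expected result. In both cases the loser drops by exactly what the winner gains.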

In a side-by-side trade-off analysis, the case for Elo rests heavily on its transparency. The 'for' argument highlights its mathematical simplicity—players can easily calculate their expected point gains or losses before a match even begins. The evidence for its enduring utility is clear: the World Chess Federation (FIDE) still uses Elo to rank the greatest grandmasters on Earth.[1]
However, the 'against' argument for Elo exposes a critical flaw for digital gaming: it treats all ratings as equally precise. The algorithm does not know if a player has logged ten games or ten thousand. Because it lacks a mechanism to measure its own uncertainty, Elo is notoriously slow to adjust when a player rapidly improves or when a highly skilled player opens a brand-new account.[1][2]
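To put numbers on that sluggishness, here is a minimal, deterministic sketch that tracks the expected climb of a strong player on a fresh account. It assumes a logistic expected-score curve, a fixed K-factor of 32, a hidden true strength of 2000, a starting rating of 1200, and opponents always matched to the player's displayed rating. All of these values are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def games_to_converge(start: float, true_skill: float,
                      k: float = 32.0, tolerance: float = 50.0) -> int:
    """Count games until the displayed rating is within `tolerance` of true skill.

    Each game applies the average Elo gain instead of a random result,
    so the figure is a smooth estimate rather than a simulation.
    """
    rating, games = start, 0
    while true_skill - rating > tolerance:
        opponent = rating  # matchmaking pairs the player at their displayed rating
        actual = expected_score(true_skill, opponent)  # how often they really win
        predicted = expected_score(rating, opponent)   # what Elo expects: 0.5
        rating += k * (actual - predicted)
        games += 1
    return games


games_needed = games_to_converge(start=1200, true_skill=2000)
print(f"Games to get within 50 points of true skill: {games_needed}")
```

Under these assumptions the climb takes several dozen games, even though the first handful of results already make the mismatch obvious. The step size never grows, because Elo has no way to say "I am unsure about this player."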
This sluggishness birthed the concept of 'Elo hell,' where players feel trapped in a low rank, requiring hundreds of games to grind their way out. To solve this, Harvard statistics professor Mark Glickman developed the Glicko system in 1995, and later refined it into Glicko-2 in 2000. Glickman realized that a single number was insufficient; the system needed to know how confident it was in that number.[2]
Glicko-2 introduces two new dimensions to a player's profile: Rating Deviation (RD) and Volatility. Rating Deviation measures the system's uncertainty. If you play every day, your RD shrinks, and your rating stabilizes. If you take a six-month break, your RD expands, meaning the system will allow your rating to swing wildly once you return until it re-establishes your baseline.[2]
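The effect of a break on RD can be sketched with the inflation step from Glickman's Glicko-2 description, in which a player who sits out a rating period keeps their rating but has their deviation widened by their volatility. The volatility of 0.06, the starting RD of 50, and the helper name are assumptions for illustration.

```python
import math

GLICKO2_SCALE = 173.7178  # converts between the familiar rating scale and Glicko-2's internal scale


def rd_after_inactivity(rd: float, volatility: float, periods: int) -> float:
    """Widen a rating deviation across `periods` rating periods with no games."""
    phi = rd / GLICKO2_SCALE
    for _ in range(periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * GLICKO2_SCALE


rd_now = 50.0
rd_later = rd_after_inactivity(rd_now, volatility=0.06, periods=6)
print(f"RD before break: {rd_now:.1f}, after six idle periods: {rd_later:.1f}")
```

How fast RD grows in practice depends on the player's volatility and on how long a platform makes each rating period, so the six periods here should not be read as six calendar months on any particular site.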
Volatility, the unique addition of Glicko-2, measures consistency. If a player suddenly starts stringing together massive upset victories—perhaps they spent a month studying strategy, or perhaps they are a highly skilled player on an alternative account—the system detects this erratic performance. It spikes their volatility, which in turn increases their RD, allowing their rating to skyrocket to its proper place in a fraction of the time Elo would require.[2]
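For readers who want to see how rating, RD, and volatility interact, the following is a compact sketch of a single Glicko-2 rating-period update, written from the steps in Glickman's published description [2]. The system constant tau of 0.5, the starting volatility of 0.06, and all function names are assumptions; real platforms tune these values and may deviate from the paper.

```python
import math

SCALE = 173.7178  # conversion factor between the public scale and the internal scale


def g(phi: float) -> float:
    """Dampens the impact of a game against an opponent with uncertain rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """Update one player after a rating period with at least one game.

    results: list of (opponent_rating, opponent_rd, score), score 1, 0.5 or 0.
    Returns (new_rating, new_rd, new_volatility).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    v_inv, surprise = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        surprise += g(phi_j) * (score - e)  # how far results beat expectations
    v = 1.0 / v_inv
    delta = v * surprise

    # Solve for the new volatility with the bracketed iteration from the paper.
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_sigma = math.exp(A / 2.0)

    # Higher volatility widens the deviation before the results are applied.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * surprise
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_sigma


# Worked example: a 1500-rated player (RD 200) beats a 1400 and loses to a 1550 and a 1700.
games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
new_rating, new_rd, new_vol = glicko2_update(1500, 200, 0.06, games)
print(f"rating={new_rating:.2f} rd={new_rd:.2f} volatility={new_vol:.5f}")
```

The inputs mirror the worked example in Glickman's paper, so the output should land close to a rating of 1464, an RD near 152, and a volatility just under 0.06. Results that are far more surprising than these three games would push the volatility up instead, which is the mechanism the paragraph above describes.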

When evaluating Glicko-2, the primary 'for' argument is its speed of convergence. By mathematically quantifying its own ignorance, Glicko-2 can place a new player in their correct skill bracket with remarkable efficiency. The 'against' argument centers on its complexity; the math is opaque to the average player, making the exact point exchanges feel somewhat arbitrary.[2][6]
The evidence supporting Glicko-2 is overwhelming in the realm of one-on-one digital competition. Major platforms like Lichess and various competitive esports have adopted it specifically because it boasts significantly better predictive accuracy than classic Elo. It effectively eliminates the grind, rewarding sudden improvement and punishing inactivity appropriately.[3]
Yet, both Elo and Glicko-2 share a fundamental limitation: they were designed for zero-sum, head-to-head, two-player games. When the video game industry exploded with team-based shooters and massive free-for-alls, these pairwise systems broke down. How do you distribute rating points when a team of four defeats another team of four, especially when one player carried the team while another contributed nothing?[4]
To solve the multiplayer problem, Microsoft Research developed TrueSkill in 2005 for Xbox Live on the Xbox 360, which launched that year. TrueSkill abandons the traditional point-exchange model entirely in favor of Bayesian inference. It models every player's skill as a Gaussian bell curve, defined by a mean (Mu, the perceived skill) and a standard deviation (Sigma, the uncertainty).[4]
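In the simplest case, a two-player game with no draws, this kind of Bayesian update has a closed form and can be sketched in a few lines. The starting values (Mu of 25, Sigma of 25/3) and the performance noise beta of 25/6 are the commonly quoted TrueSkill defaults and are treated as assumptions here. The sketch also leaves out the draw margin and the small per-game uncertainty increase that full implementations include, so it is a simplification rather than Microsoft's production algorithm.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1
BETA = 25.0 / 6.0          # assumed per-game performance noise


def update_1v1(winner, loser, beta=BETA):
    """Update two players after a decisive game.

    winner and loser are (mu, sigma) tuples; returns the two updated tuples.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c                        # how expected the result was
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)    # size of the mean shift
    w = v * (v + t)                              # size of the uncertainty shrink
    new_winner = (mu_w + sig_w ** 2 / c * v,
                  sig_w * math.sqrt(1.0 - sig_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sig_l ** 2 / c * v,
                 sig_l * math.sqrt(1.0 - sig_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


newcomer = (25.0, 25.0 / 3.0)
after_win, after_loss = update_1v1(newcomer, newcomer)
print(f"winner: mu={after_win[0]:.2f}, sigma={after_win[1]:.2f}")
print(f"loser:  mu={after_loss[0]:.2f}, sigma={after_loss[1]:.2f}")
```

Two things are worth noticing. Each player's Mu moves in proportion to their own Sigma squared, so uncertain players move furthest, and both Sigmas shrink after the game regardless of who won, because the system has learned something about each of them.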
When a 4v4 match occurs, TrueSkill does not look at individual duels. Instead, it mathematically adds the four bell curves of Team A together to create one massive team skill distribution, and compares it to the combined bell curve of Team B. After the match, it updates every individual player's Mu and Sigma simultaneously based on the final standings, adjusting for the statistical likelihood of the outcome.[4]
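The "adding bell curves" step can be sketched directly, because the sum of independent Gaussians is itself Gaussian: the means add and the variances add. The function below estimates the chance that one team outperforms another under that model. The beta value of 25/6, the example ratings, and the function name are assumptions for illustration, and draws are ignored.

```python
import math
from statistics import NormalDist

BETA = 25.0 / 6.0  # assumed per-player performance noise


def team_win_probability(team_a, team_b, beta=BETA):
    """Chance that team_a outperforms team_b; each team is a list of (mu, sigma)."""
    mean_diff = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    # Every player contributes skill uncertainty plus per-game performance noise.
    variance = sum(sigma ** 2 + beta ** 2 for _, sigma in team_a + team_b)
    return NormalDist().cdf(mean_diff / math.sqrt(variance))


veterans = [(30.0, 2.0)] * 4           # well-known players, slightly above average
newcomers = [(25.0, 25.0 / 3.0)] * 4   # brand-new accounts with default ratings
print(f"Veterans beat newcomers with probability {team_win_probability(veterans, newcomers):.2f}")
```

Because the newcomers' large Sigmas inflate the combined variance, the model stays well short of certainty here even though every veteran has a higher Mu. The same function handles uneven team sizes, since it only sums over whoever is on each list.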

For TrueSkill, the 'for' column is dominated by its flexibility. It natively handles teams of varying sizes, uneven matches (like 3v4), and multi-team free-for-alls. The 'against' argument is its proprietary nature and computational weight: the algorithm is patented by Microsoft, and the processing power required to run Bayesian updates for millions of players is non-trivial.[4]
The evidence for TrueSkill's efficacy is its widespread adoption across the industry. It powers the matchmaking for massive franchises like Halo and Gears of War, and its underlying Bayesian principles have been adapted by games like World of Warcraft, which transitioned away from Glicko-2 to better handle the complexities of arena team combat. TrueSkill also converges quickly for a team system, requiring only about 46 games to accurately rank a player in an eight-player match split into two teams of four.[4][5]
Ultimately, choosing the right algorithm requires understanding the environment. The classic Elo system fits well when simplicity, legacy, and transparency are paramount, such as in over-the-board chess or local club tournaments. It does not fit well in fast-paced online environments where players expect immediate and accurate placement.[6]
Glicko-2 fits well when the competition is strictly one-on-one and the platform needs to rapidly identify true skill while accounting for player inactivity. It is the gold standard for digital chess, fighting games, and solo competitive ladders. It does not fit well when the core gameplay relies on large, dynamic teams.[6]

TrueSkill fits well when matchmaking involves complex multiplayer dynamics, varying squad sizes, or free-for-all structures. It thrives in the chaos of modern esports, where predicting the outcome of a match requires untangling the combined efforts of multiple players. It does not fit well for simple pairwise comparisons where the computational overhead of Bayesian inference is unnecessary.[6]
The journey from Elo to TrueSkill mirrors the evolution of gaming itself. As competition shifted from quiet tournament halls to global, 24/7 digital arenas, the math had to adapt. By embracing uncertainty and probability, modern matchmaking ensures that the invisible engine keeps running, keeping players challenged, engaged, and always chasing the next fair fight.[6]
How we got here
1960
Arpad Elo develops the Elo rating system to improve upon the Harkness system in chess.
1995
Mark Glickman introduces the Glicko system, adding a 'rating deviation' metric to measure uncertainty.
2000
Glicko-2 is published, introducing a volatility metric to account for sudden changes in player skill.
2005
Microsoft Research develops TrueSkill to handle team-based matchmaking for Xbox Live on the Xbox 360.
Viewpoints in depth
Traditionalists
Advocates for the simplicity and transparency of the Elo system.
Proponents of the classic Elo system, including many over-the-board chess federations like FIDE, argue that a rating system must be easily calculable and transparent to the players. In this view, a player should be able to look at a simple table, see their opponent's rating, and know exactly how many points are at stake before the match begins. They view the hidden variables of modern systems—like uncertainty and volatility—as opaque mechanisms that can make players feel disconnected from their own progression.
Modern Matchmakers
Engineers prioritizing rapid convergence and player retention.
For developers of digital games, the priority is getting a new player into a fair match as quickly as possible. This camp argues that Elo is fundamentally broken for online environments because it requires dozens of games to find a player's true skill, during which time the player is either being crushed or ruining the game for lower-level opponents. They champion Bayesian systems like Glicko-2 and TrueSkill because these algorithms mathematically quantify their own ignorance, allowing them to make massive rating adjustments early on and lock in a fair rank in a fraction of the time.
What we don't know
- How proprietary tweaks to these algorithms by individual game studios affect long-term player retention.
- Whether future algorithms will incorporate in-game biometric data or hardware latency into skill calculations.
Key terms
- Zero-sum
- A system where any points gained by the winner are exactly equal to the points lost by the loser.
- Rating Deviation (RD)
- A measure of how uncertain the system is about a player's true skill; a higher RD means greater uncertainty.
- Volatility
- A metric in Glicko-2 that measures the degree of expected fluctuation in a player's performance over time.
- Bayesian Inference
- A statistical method that updates the probability of a hypothesis as more evidence or information becomes available.
Frequently asked
Why do I lose more points than I gain?
If you lose to a player with a lower rating than yours, the system penalizes you more heavily because you were statistically expected to win.
Why does my rank reset or decay?
Modern systems increase your 'uncertainty' metric when you stop playing. When you return, your rating may fluctuate more wildly until the system is confident in your current skill.
Is 'Elo hell' a real mathematical concept?
In pure Elo systems, it can be. Because Elo lacks an uncertainty variable, a highly skilled player stuck in a low rank must grind through a massive number of games to slowly pull their rating up.
Sources
[1] Wikipedia (Traditional Competitors): Elo rating system
[2] Glicko.net (Statistical Innovators): The Glicko-2 System for Rating Players in Head-to-Head Competition
[3] Lichess (Statistical Innovators): Rating Systems
[4] Microsoft Research (Multiplayer Engineers): TrueSkill Ranking System
[5] TrueSkill.org (Multiplayer Engineers): TrueSkill: The video game rating system
[6] Factlen Editorial Team (Multiplayer Engineers): Synthesis by Factlen editorial team







