About the Ratings - HEMA Ratings

The ratings at a glance

HEMA Ratings uses the Glicko-2 algorithm, created by Harvard statistics professor Mark Glickman, which is a modernization of the Elo system known from chess. You can read more about the algorithm on Glickman's website.

When you look at the site, you will see two numbers: your weighted rating and your rank.

Your rank is simply where you are sorted in comparison to everyone else. If there are 1000 people in a given rating category and you're ranked at #200, all it means is that there are 199 people with a higher weighted rating than you.

Your weighted rating consists of two numbers, your score and your deviation. These two numbers can be summarized as the algorithm saying "Based on your performance, this is how good I think you are (score) and this is how certain I am about this number (deviation)."

We subtract 2x your deviation from your score to create your weighted rating, which is the number we display on the website. This number can be summarized as the algorithm saying "Based on your performance, I am 97.5% certain you are at least this good."
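As a minimal sketch, the subtraction described above looks like this in Python. The function name and the example fighters are made up for illustration and are not taken from the HEMA Ratings code.

```python
def weighted_rating(score: float, deviation: float) -> float:
    """Conservative rating: the score minus two deviations."""
    return score - 2 * deviation


# Two hypothetical fighters with the same score but different certainty.
newcomer = weighted_rating(1600.0, 250.0)  # few fights, large deviation
veteran = weighted_rating(1600.0, 60.0)    # many fights, small deviation

print(newcomer)  # 1100.0
print(veteran)   # 1480.0
```

Both fighters have the same score, but the veteran's smaller deviation gives a much higher weighted rating.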

Since most HEMA competitors have relatively few fights, we use the weighted rating to compensate. Instead of presenting a number that means "we think you're this good, but we are really uncertain" we use a number that means "we are very confident that you are at least this good".

When two people face off, the algorithm takes into account the score and deviation of both fighters and looks at the outcome of the fight. Fighters get rewarded for winning and punished for losing (duh), and the reward or punishment is bigger or smaller depending on the ratings difference between the two fighters. To put it simply: if you beat someone the algorithm didn't expect you to beat, you get a much bigger reward and your opponent gets a much bigger punishment. If you beat someone the algorithm expected you to beat, both the reward and the punishment are much smaller.
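To make the "bigger reward for an upset" idea concrete, here is a rough sketch of a single-match update. It uses the simpler Glicko-1 formulas from Glickman's published description rather than the full Glicko-2 procedure the site uses, so it leaves out Glicko-2's volatility term and will not reproduce the site's numbers. The function names and example ratings are hypothetical.

```python
import math

Q = math.log(10) / 400  # scaling constant from the Glicko-1 description


def g(rd: float) -> float:
    """Dampening factor: the less certain the opponent's rating, the less a result counts."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected result (0..1) against an opponent, used inside the update."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r: float, rd: float, r_opp: float, rd_opp: float, won: bool):
    """Return (new_score, new_deviation) after one match (Glicko-1 style)."""
    e = expected(r, r_opp, rd_opp)
    d2 = 1 / (Q**2 * g(rd_opp) ** 2 * e * (1 - e))
    denom = 1 / rd**2 + 1 / d2
    s = 1.0 if won else 0.0
    new_r = r + (Q / denom) * g(rd_opp) * (s - e)  # surprise (s - e) drives the change
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


# A hypothetical 1500-rated fighter beats a much stronger and a much weaker opponent.
upset_score, _ = update(1500, 100, 1800, 100, won=True)
routine_score, _ = update(1500, 100, 1200, 100, won=True)
print(upset_score - 1500, routine_score - 1500)
```

The gain from the upset win is several times larger than the gain from the routine win, because the change is proportional to how far the result was from what was expected.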

FAQ

How should I read the ratings?

The number you should care about is your weighted rating, rather than your rank. Your weighted rating is the number that says something directly about your performance ("I'm 97.5% certain you're at least this good"), and your rank is simply your weighted rating compared to everyone else's.

If your weighted rating goes up but your rank goes down, all it means is that some people climbed faster and higher than you.

What does the "Confidence" column mean?

The confidence thermometer on the ratings overview is a way of showing the rating deviation. The lower the deviation, the higher the confidence.

Does where I place in a tournament affect my rating?

Not directly, although someone who wins the whole tournament is obviously likely to rise more than someone who gets knocked out in the pools. Your rating is based purely on match results: each match is calculated from your rating and your opponent's rating at the time you fought, together with the outcome of the match. In other words, defeating a given fighter in the pools is just as valuable as defeating them in the final.

Imagine a scenario where two fighters face off in the final, but one of the fighters has fought significantly tougher opponents on their way there. The HEMA Ratings algorithm will reward the fighter with wins against higher-rated opponents more than the fighter with wins against lower-rated opponents.

What is the win chance?

The win chance displayed on the rated matches on your profile is an estimate of the probability that you'd win the fight. This number is based on both your and your opponent's rating and deviation at the time of the fight.

Your first few fights are likely to show 50% or close to 50% unless your opponent's rating is very high or very low, simply because it's hard to estimate how you'll perform when you're new to the ratings. As you compete and the system gets more confident about your rating, more and more of your fights will have a probability that's either significantly higher or lower than 50%.
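The exact formula behind the displayed win chance isn't spelled out here, but Glickman's Glicko description gives an expected-outcome formula that behaves the way the paragraphs above describe. The sketch below assumes that formula; the function name and the example numbers are hypothetical.

```python
import math

Q = math.log(10) / 400  # scaling constant from the Glicko description


def win_chance(r_a: float, rd_a: float, r_b: float, rd_b: float) -> float:
    """Estimated probability that fighter A beats fighter B.

    Uses both scores and both deviations: the larger the combined
    uncertainty, the closer the estimate is pulled toward 50%.
    """
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    g = 1 / math.sqrt(1 + 3 * Q**2 * combined_rd**2 / math.pi**2)
    return 1 / (1 + 10 ** (-g * (r_a - r_b) / 400))


# Same 200-point gap, but very different certainty about fighter A.
uncertain = win_chance(1700, 350, 1500, 80)  # A is new to the ratings
settled = win_chance(1700, 50, 1500, 80)     # A has a long record
print(round(uncertain, 2), round(settled, 2))
```

With the same score difference, the estimate for the uncertain fighter sits closer to 50% than the estimate for the settled one.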

Why did I rise/fall more/less than expected based on my tournament performance?

It's important to know a couple of things about how the ratings are calculated:

The system only cares about your individual fight performance relative to your opponents' rating.
As a result, the system doesn't know or care about your overall tournament placement. Defeating someone in the pools is just as valuable as defeating them in the final.

Beating fighters the system knows have performed well over time is much more valuable than beating fighters who either haven't performed well or who are new to the system. Likewise, losing to fighters who have performed poorly over time is going to affect your rating much more than losing to someone who's performed well or is new to the system.

Intuitively this should make sense. If you're a mid-level fighter you are expected to lose to the highest-rated fencer in the world, and that loss probably says very little about your odds of beating other mid-level fighters. Similarly, the highest-rated fighter is expected to defeat a mid-level fighter and won't get a big point boost for performing as expected. Now, if the mid-level fighter were to beat the top-level fighter, both would see their ratings change much more significantly.

In essence, the system rewards or punishes you much more for performing against expectations.

You can get an idea about what the system expected from you based on the win probability of a given fight, which is shown on your fighter profile. Anything above 50% means the system thinks you "should" win, and the higher the number is the more certain it is about that prediction. Winning a fight you have a 90% chance of winning won't improve your rating much, nor will it punish your opponent much.

Why does my rating go down when I don't compete?

First, it's important to understand the fundamentals of the rating system we use.

As described above, in the Glicko rating system your rating actually consists of two numbers, your score and your rating deviation (RD).

The lower the RD, the less uncertain the system is about your performance. As you compete, your RD will normally go down, unless you perform very unevenly, i.e. losing to lower-rated fighters and defeating higher-rated fighters. When you don't compete, however, your RD will rise slightly every month because the system is becoming increasingly uncertain about where you actually belong.

Since twice the deviation is subtracted from your score to get your weighted rating, the monthly increase in deviation translates to a slight drop in weighted rating for the months you don't compete.
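As an illustration of that drift, the sketch below applies the Glicko-2 rule for a rating period without games, where the deviation grows as the square root of (deviation squared plus volatility squared). It assumes one rating period per month, and the volatility of 0.06 is just the example value from Glickman's paper, not necessarily what the site uses. The function names are hypothetical.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def rd_after_inactivity(rd: float, volatility: float, months: int) -> float:
    """Deviation after a number of monthly rating periods with no fights."""
    phi = rd / SCALE  # convert to the internal Glicko-2 scale
    for _ in range(months):
        phi = math.sqrt(phi**2 + volatility**2)
    return phi * SCALE


def weighted_rating(score: float, rd: float) -> float:
    return score - 2 * rd


# Hypothetical fighter: score 1600, deviation 60, then six months off.
score, rd = 1600.0, 60.0
rd_later = rd_after_inactivity(rd, volatility=0.06, months=6)
print(weighted_rating(score, rd), weighted_rating(score, rd_later))
```

The score itself never changes during the break; only the deviation grows, which is what pulls the weighted rating down.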

Do scores matter?

The system neither knows nor cares about whether a win was 1-0, 10-0 or 10-9. While it would be possible to create an algorithm that takes score into account, the diversity of HEMA rule sets means that it would be an exercise in comparing apples to oranges.

Why is my Rank a "-" rather than a number?

This happens when a fighter is inactive, i.e. hasn't competed in a given rating category for 24 months.

What is "Island Effect"?

"Island Effect" is what happens when you have a rating category with little or no overlap between subgroups.

For example: imagine that there's a large group of active sabreurs in each of Norway, South Africa and Australia. All three scenes organize multiple tournaments over many years, but their fighters never travel abroad to compete with the other two nations. Each scene has a fighter who stands out as the best, beating everyone else in their country and ending up with a weighted rating of 2000.

The question now is, who's better: the Norwegian, the Australian or the South African sabre champion? The truth is that without "cross-pollination" between the scenes it's impossible to know, because the three scenes are essentially "islands" in the sea of sabre with independent ratings. It's possible that they're equally good, but it's just as likely that one scene is way ahead of the others, and you can't know which is which before there's crossover between the islands.

Why is there an island icon next to a fighter's rating?

It means that the fighter is on an island.

An island is defined as a group of fighters that is completely separated from the biggest group of fighters over the past 24 months. We determine this by taking every single fight in that rating category for the preceding 24 months, and constructing a graph with fighters as vertices and fights as edges. You can think of it as a complex spider web where each thread is a fight between two fighters. If a group of fighters form their own little spider web with no connection to the main web, they're considered to be on an island.
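The spider-web description maps onto finding connected components in a graph. The sketch below is one way to do that with a breadth-first search; the function name, the input format (pairs of fighter names already filtered to the last 24 months) and the example fights are assumptions for illustration, not the site's actual implementation.

```python
from collections import defaultdict, deque


def find_islands(fights):
    """Split fighters into the main group and islands.

    fights: iterable of (fighter_a, fighter_b) pairs.
    Returns (main_group, islands), where islands is a list of sets.
    """
    # Build the graph: fighters are vertices, fights are edges.
    graph = defaultdict(set)
    for a, b in fights:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in list(graph):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            fighter = queue.popleft()
            component.add(fighter)
            for opponent in graph[fighter]:
                if opponent not in seen:
                    seen.add(opponent)
                    queue.append(opponent)
        components.append(component)

    if not components:
        return set(), []
    # The biggest component is the main web; everything else is an island.
    components.sort(key=len, reverse=True)
    return components[0], components[1:]


fights = [("Anna", "Bo"), ("Bo", "Cato"), ("Dina", "Eli")]
main_group, islands = find_islands(fights)
print(main_group, islands)
```

Here Anna, Bo and Cato form the main group, while Dina and Eli have only fought each other and therefore end up on an island.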

How can you combat Island Effect?

Travel!

The good thing about the algorithm we use for rating is that not everyone needs to fight everyone in order for the results to have an effect. If, for example, a few of the top rated fighters from an island travel to another island, they will either come back with a reduced (if their island was worse) or an increased (if their island was better) rating, which will in turn affect the fighters from their own scene. If the top three fighters from an island come back home 200 points lower after having taken a solid beating, but still beat everyone else on their island, everyone else on that island will also drop.