What's a rating system?
A rating system is a method for assessing a player's strength in games of skill. We strongly believe historical fencing is a game of skill.
We use the Glicko-2 algorithm, created by Harvard statistics professor Mark Glickman. You can read more about the algorithm on Glickman's website or in the "geek stuff" section below.
What can you use this rating data for?
What you get out of it is up to you, but we've identified these main uses of the ratings:
- Seeding tournaments. By using the ratings it's possible to get fairer pools that offer a more even tournament experience for all competitors.
- Tracking your individual progress. It can be hard to get an objective measure of how well you do in a given tournament. Maybe you always make it to the quarter final before being knocked out by the eventual winner of the whole tournament? By using the ratings you can still track your progress in a relatively objective way.
A fighter's rating is based purely on match results: it is calculated from the opponent's rating at the time of the fight and the outcome of the match. The rating system has no clue about how close the matches were or how well anyone fenced.
Do the ratings say anything about my skill or value as a researcher/coach/person?
I've discovered a mistake in your data
Please get in touch with us through our contact page and we'll try to get it sorted out.
You should add [awesome feature]
Thanks for the suggestion! We're slowly adding new features, but since this is just a hobby project we have limited time to do all the cool stuff we'd like.
I want to help!
Great! We're always looking for people who can submit data from tournaments, help improve the quality of the data we have in the system, or simply reach out to tournament organizers to ensure that their results get added to the ratings. Get in touch with us through our contact page and we'll figure out how we can work together.
If you want to support us you can consider donating to our Patreon.
Where does your data come from?
We gratefully use data provided to us by HEMA tournament software such as HEMA CM, HEMA Scorecard and a few other bespoke systems. The initial dump of a few thousand matches from HEMA CM is what got us started on this project in the first place.
Furthermore, we have had awesome help from many people who have collected results from events, organized events, recorded results from videos, provided us with paper records from old tournaments, and so on.
Privacy and use of data
What do you use this data for?
The main thing we use the data for is what you can see here on the web site. We will also post about new events on our Facebook page, and sometimes we will post other interesting analyses there.
Who do you share data with?
We sometimes share data with people looking to understand the data better to learn something about competitive HEMA, such as Sword STEM. When we do this the data is anonymized in one form or another so that individual fighters can't be identified.
We may also ask people outside of the core team to help us with data analysis, such as looking at the "island effect" (see below), in which case they will get a limited view into the data we have available.
Finally, we may share data with software vendors in order to help improve their and our data. In that case, it will be data points like "Mike Fencerton is the same person as Michael Fencerton" or "These fighters have these IDs in HEMA Ratings".
I don't wish to be on HEMA Ratings
Please get in touch with us through our contact page and we'll get it sorted out as soon as possible.
What tournaments do you accept into the system?
The guiding principle is that it needs to be a real and serious HEMA tournament that's open to all applicable fighters.
We've established some guidelines:
- It needs to be HEMA. No sports fencing, LARPing, reenactment, etc.
- There must be members from at least two different, independent clubs competing. Chapters, sister clubs and similar arrangements are not counted as individual clubs.
- The event should be open to all eligible fighters. We will obviously allow beginner tournaments, certain invitational tournaments, federation-sanctioned national championships and similar events, but we will exclude tournaments that aren't open to fencers of a given club, federation, etc. One goal of HEMA Ratings is to integrate the community, not to create "islands".
- It needs to be judged/reffed. While a judged competition that permits some self-calling is allowed, tournaments that exclusively rely on the "Honour System" as their primary scoring methodology do not meet our criteria.
- The outcome of the individual matches must be distillable into a win/loss/draw format. Double losses or other "non-standard" outcomes can be included, but they won't count towards the fighters' rating.
- Each match needs to be symmetrical and stand on its own. This means that points can't "carry over" between matches, neither fighter can start with more points than the other, one fighter can't be disadvantaged with a different scoring system, etc.
- For rating purposes the weapons need to be symmetrical. No dagger vs. pollaxe or similar. That being said, we can still import mixed weapons tournaments to put them in the fighter's record.
The criteria are a work in progress, but you should be able to get the "spirit of the law" from what's written above. If you're unsure if your tournament can be added, don't hesitate to ask.
If your tournament meets the criteria and you can provide us with the data, we'll be happy to include it in the system. We have a template for data entry that we're happy to send you, so get in touch and we'll send you everything you need.
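As a rough illustration of the win/loss/draw criterion above, an importer could map each recorded outcome to a score and keep non-standard outcomes on record without rating them. This is only a sketch: the outcome labels, the function name and the fighter names are hypothetical and are not taken from the actual HEMA Ratings data template.

```python
from typing import Optional, Tuple

# Hypothetical outcome labels, seen from fighter A's side.
# The real data-entry template may use different ones.
SCORES = {
    "win": (1.0, 0.0),
    "loss": (0.0, 1.0),
    "draw": (0.5, 0.5),
}

def rated_scores(outcome: str) -> Optional[Tuple[float, float]]:
    """Return (score_a, score_b) for a rated match, or None when the outcome
    (for example a double loss) is recorded but does not count towards ratings."""
    return SCORES.get(outcome.strip().lower())

# Made-up matches: (fighter A, fighter B, outcome for fighter A)
matches = [
    ("Alice", "Bob", "Win"),
    ("Bob", "Cara", "double loss"),
    ("Alice", "Cara", "draw"),
]

# Only matches with a standard outcome feed the rating calculation.
rated = [(a, b, rated_scores(o)) for a, b, o in matches if rated_scores(o) is not None]
```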
What happens to tournaments that are submitted with errors or don't meet the criteria?
Most errors we see come from data entry mistakes, spelling mistakes or other simple human error. These are normally easily resolved by speaking with the organizers and figuring out what happened.
On rare occasions we have had to delete or reclassify a tournament due to a misunderstanding of the criteria or the weapon divisions.
If it turns out that submitted events don't meet the criteria we will reach out to the organizer and/or submitter and try to figure out what happened. If we find out that the results were submitted with malicious intent (i.e. cheating) we will delete the results and go public about having done so and why.
Okay. I get what a rating is. How does this work?
The key assumptions at work here are the following:
The performance of each player in each match is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, we assume that the mean value of the performances of any given player changes only slowly over time.
Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.
Suppose two players, both rated 1700, played a tournament game with the first player defeating the second. Suppose that the first player had just returned to tournament play after many years, while the second player plays every weekend. In this situation, the first player’s rating of 1700 is not a very reliable measure of his strength, while the second player’s rating of 1700 is much more trustworthy.
Our intuition tells us that:
- the first player’s rating should increase by a large amount because his rating of 1700 is not believable in the first place, and that defeating a player with a fairly precise rating of 1700 is reasonable evidence that his strength is probably much higher than 1700
- the second player’s rating should decrease by a small amount because his rating is already precisely measured to be near 1700, and that he loses to a player whose rating cannot be trusted, so that very little information about his own playing strength has been learned.
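The two-player example above can be reproduced with a minimal sketch of a single Glicko-2 rating-period update, following the steps in Glickman's published description of the algorithm. The RD values (300 for the returning player, 50 for the regular), the volatility of 0.06 and the system constant tau = 0.5 are assumptions chosen for illustration; the parameters HEMA Ratings actually uses are not stated here.

```python
import math

SCALE = 173.7178  # converts rating points to the Glicko-2 internal scale

def g(phi):
    """Dampening factor: the less certain the opponent's rating, the less a result counts."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)

def expected(mu, mu_j, phi_j):
    """Expected score against an opponent with rating mu_j and deviation phi_j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))

def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """One rating period. results: list of (opp_rating, opp_rd, score), score in {1, 0.5, 0}.
    Returns (new_rating, new_rd, new_volatility). tau=0.5 is an assumed system constant."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:  # no matches this period: only the deviation grows
        return rating, math.sqrt(phi * phi + vol * vol) * SCALE, vol

    v_inv, delta_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e, gj = expected(mu, mu_j, phi_j), g(phi_j)
        v_inv += gj * gj * e * (1.0 - e)
        delta_sum += gj * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve for the new volatility with the Illinois root-finding variant.
    a = math.log(vol * vol)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    phi_star = math.sqrt(phi * phi + new_vol * new_vol)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * delta_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_vol

# Both players are rated 1700; the returning player (assumed RD 300) beats the regular (assumed RD 50).
returning = glicko2_update(1700, 300, 0.06, [(1700, 50, 1.0)])
regular = glicko2_update(1700, 50, 0.06, [(1700, 300, 0.0)])
gain = returning[0] - 1700  # large: the winner's old rating was unreliable
drop = 1700 - regular[0]    # small: little was learned about the loser
```

Under these assumed inputs the winner's gain is many times larger than the loser's drop, which matches the intuition described above.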
What does this mean in practice?
If you win fights, you gain points. If you lose fights, you lose points. If you perform better than expected (win against higher rated opponents), you will gain a lot of points. If you perform worse than expected, you may lose a lot of points.
As you compete your rating deviation gets smaller as the rating system has a more accurate estimate of your rating. If you don't compete your rating deviation will grow.
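To make "better than expected" concrete, the sketch below computes the expected score used by Glicko-style systems and the gap between the actual and expected result. The ratings (1500 and 1700) and the opponent RD of 80 are made-up numbers, and the helper names are hypothetical.

```python
import math

SCALE = 173.7178  # converts rating points to the Glicko-2 internal scale

def g(rd: float) -> float:
    """Dampening factor: results against opponents with uncertain ratings count for less."""
    phi = rd / SCALE
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def expected_score(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score (between 0 and 1) against the given opponent."""
    return 1.0 / (1.0 + math.exp(-g(opp_rd) * (rating - opp_rating) / SCALE))

def surprise(rating: float, opp_rating: float, opp_rd: float, score: float) -> float:
    """Actual minus expected score; the rating change grows with this gap."""
    return score - expected_score(rating, opp_rating, opp_rd)

upset_win = surprise(1500, 1700, 80, 1.0)    # beating a higher-rated fighter
routine_win = surprise(1700, 1500, 80, 1.0)  # beating a lower-rated fighter
```

Here `upset_win` is larger than `routine_win`, so the underdog's win moves their rating more than the favourite's win would.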
Why does my rating go down when I don't compete?
First, it's important to understand the fundamentals of the rating system we use.
In the Glicko rating system your rating actually consists of two numbers:
- Score - This is the system saying "I think you're this good ..."
- Rating Deviation (RD) - This is the system saying "... and this is how confident I am that I'm right". One RD is equivalent to one standard deviation.
The lower the RD, the less uncertain the system is about your performance. As you compete, your RD will normally go down, unless you perform very unevenly, i.e. losing to lower rated fighters and defeating higher rated fighters. When you don't compete, however, your RD will rise slightly every month because the system is becoming increasingly uncertain about where you actually belong.
Since most HEMA practitioners don't compete that much, everyone has a relatively high deviation, meaning that the system is never very sure about your performance. In order to smooth out some of the error that comes with this, we've decided to implement a "weighted rating", which is the score - 2 * rating deviation. This is essentially the same as saying "I'm 97.5% confident that your score should be at least this high".
Since the deviation is part of this weighted rating, the monthly increase in deviation will translate to a slight drop in rating for the months you don't compete.
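The effect of inactivity on the weighted rating can be sketched as follows. The RD growth rule is the standard Glicko-2 step for a period without matches; the volatility of 0.06, the starting score of 1600, the RD of 100 and the six idle months are assumptions for illustration, and the function names are hypothetical. The last line checks the confidence figure under a normal distribution.

```python
import math

SCALE = 173.7178  # converts rating points to the Glicko-2 internal scale

def weighted_rating(score: float, rd: float) -> float:
    """Weighted rating as described above: score minus two rating deviations."""
    return score - 2.0 * rd

def rd_after_idle_periods(rd: float, volatility: float, periods: int) -> float:
    """Grow the RD for a number of rating periods (months) without any matches."""
    phi = rd / SCALE
    for _ in range(periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * SCALE

def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

score, rd = 1600.0, 100.0
before = weighted_rating(score, rd)            # 1400.0
idle_rd = rd_after_idle_periods(rd, 0.06, 6)   # slightly above 100
after = weighted_rating(score, idle_rd)        # slightly below 1400
confidence = normal_cdf(2.0)                   # about 0.977
```

The score itself never changes while idle; only the RD grows, and the weighted rating drops by twice that growth.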
Why do you have separate lists for different weapons? Why do you have separate lists for Men and Women?
The rating system assumes that all fighters in a list could face each other in any given tournament, and that past performance is predictive of future performance. We don't believe past performance in a Rapier tournament is a strong indicator of how well someone would do in a Longsword tournament.
What is "Island Effect"?
"Island Effect" is what happens when you have a division with little or no overlap between subgroups.
For example: imagine that there's a large group of active sabreurs in each of Norway, South Africa and Australia. All three scenes organize multiple tournaments over many years, but their fighters never travel abroad to compete against the other two nations. Each scene has a fighter who sticks out as the best, beating everyone else in their country and ending up with a weighted rating of 2000.
The question now is, who's better, the Norwegian, the Australian or the South African sabre champion? The truth is that without "cross-pollination" between the scenes it's impossible to know because the three scenes are essentially "islands" in the sea of sabre with independent ratings. It's possible that they're equally good, but it's just as likely that one scene is way ahead of the others, and you can't know which is which before there's crossover between the islands.
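One way to picture islands is as the connected components of a graph in which two fighters are linked once they have fought each other. The sketch below covers only the extreme case of no overlap at all; the fighter names and the function name are hypothetical, and this is not a description of how HEMA Ratings analyzes the effect.

```python
from collections import defaultdict

def find_islands(matches):
    """Group fighters into sets that are connected by at least one chain of matches."""
    graph = defaultdict(set)
    for a, b in matches:
        graph[a].add(b)
        graph[b].add(a)

    seen, islands = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first walk over everyone reachable from this fighter.
        stack, island = [start], set()
        while stack:
            fighter = stack.pop()
            if fighter in island:
                continue
            island.add(fighter)
            stack.extend(graph[fighter] - island)
        seen |= island
        islands.append(island)
    return islands

# Made-up matches within three separate scenes.
matches = [("Nora", "Olav"), ("Olav", "Per"), ("Sipho", "Thabo"), ("Ava", "Liam")]
isolated = find_islands(matches)                    # three separate islands
bridged = find_islands(matches + [("Per", "Ava")])  # one crossover match merges two of them
```

A single crossover match is enough to connect two islands in this picture, although in practice more overlap gives the ratings more to go on.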
How can you combat Island Effect?
The good thing about the algorithm we use for rating is that not everyone needs to fight everyone in order for the results to have an effect. If, for example, a few of the top rated fighters from an island travel to another island, they will either come back with a reduced (if their island was worse) or an increased (if their island was better) rating, which will in turn affect the fighters from their own scene. If the top three fighters from an island come back home 200 points lower after having taken a solid beating, but still beat everyone else on their island, everyone else on that island will also drop.