Mountains of a Metagame

How High is Mount Everest?

Anyone with internet access can do a quick Google search and tell you that Mount Everest, the highest mountain on the planet, is somewhere around 8850 meters tall. However, let me ask this another way:

How high is Mount Everest for you?

If you were dropped at the foot of the mountain, how high could you climb? For the vast majority of people, that number isn’t anywhere near 8850 meters.

Official expeditions to Mount Everest started in 1921. At that time, Nepal had closed its borders, so Everest could only be approached from the northern side in Tibet. On the first expedition, mountaineers George Mallory and Guy Bullock got up to 7000 meters before returning.

After three decades of advancements in both technology and technique, in 1953, Sir Edmund Hillary and Tenzing Norgay began the ascent from the Nepalese side of the mountain and reached the top via the so-called South Col route. I would argue that there and then, for the first time in history, Mount Everest was 8850 meters tall for someone.

A Metagame of Mountains

Now, imagine that a metagame in a Magic format is a range of mountains. Next, imagine that your win rate is equal to the altitude you can get to.

In theory, there is always a best deck to play, just like there will always be one mountain that is higher than the rest. Even with a perfect player and a perfect deck, there will always be a cap on how high your win rate can possibly be, just like the highest mountain has a summit from which you can’t climb any further. None of us are perfect players, though, and we probably can’t reach any of the summits.

When you’re preparing for a tournament, your job is to find the highest altitude possible. Naturally, most people take this to mean that they should figure out which one of the mountains is Mount Everest and how to conquer it.

That’s a nice ideal to strive towards. However, I believe that most players go too far in trying to attain it, and more often than not they just end up wandering aimlessly in the mountains. I also believe that figuring out metagames is incredibly hard and that there are multiple traps that can easily ruin your whole testing process if you’re not careful. I have discovered many of these traps the hard way, and I hope that by sharing what I’ve learned I can help you tread through the treacherous terrain of testing with a tad less trouble.

What is a Good Matchup Anyway?

First of all, if you want to know what the best deck in the format is, you have to know what the metagame is going to look like and how all of the matchups play out.

There’s just one big problem: figuring out which deck is favored in a matchup is much more complicated than people give it credit for.

For example, a couple years ago I was testing the Marvel vs. Mardu matchup with a friend of mine for an upcoming GP. In the first 10 matches we played, I crushed him from the Mardu side of the matchup 9-1. Then we switched decks, and in the next 10 matches I crushed him 9-1 from the Marvel side.

What conclusion am I supposed to draw from the results? Which deck is favored in the matchup?

This brings us back to the idea that the heights of mountains in Magic are relative, not absolute. Mount Marvel was simply much higher for me than it was for my friend. Now, the numbers are so imbalanced that I clearly also got luckier on both sides, but that doesn’t change the fact that my sideboarding plan and understanding of the matchup from the Marvel side were much better than his.

During the first set of ten matches, I paid close attention to how the games played out, what the common patterns were and what my friend could’ve done differently to avoid getting blown out by some of my cards. Based on that, I devised a better sideboard plan that had major differences on the play and on the draw. By carefully analyzing what happened in the first half of the playtesting session and reacting accordingly, I gained a significant edge.

The most obvious example of deck strength relativity comes from mirror matches. One of the most misguided notions in the Magic community is that mirrors are supposed to be coin flips. That couldn’t be further from the truth! Do you think Carlos Romao won his World Championship title by flipping coins in the 《Psychatog》 mirror? No, he smashed his way through the tournament by having a better plan for the matchup than everybody else in the room. Instead of wasting counterspells on 《Fact or Fiction》s, he saved them for the opponent’s 《Psychatog》s. His cards were, for the most part, the same as his opponents’ – he just had a better idea for how to use them.

Other times, changing even one card can give you a huge edge. In the first week of Standard Golos Ramp mirrors, Brad Nelson added a 《Fae of Wishes》 to his deck to tutor for a 《Jace, Wielder of Mysteries》 and didn’t drop a single game 1. The addition of that one innocuous card gave him inevitability over his opponents – at least until other players came up with even better techs for the matchup.

Another clear example: whenever I start testing a new Modern deck, it invariably feels like it sucks. I have been playing Dredge for so long that I have achieved a high level of proficiency with it, and my win rate with it is usually somewhere around 65-70% depending on variance and metagame positioning. Whenever I pick up a new Modern deck, my win rate is usually around 60%.

Even if the new deck’s ceiling were higher than Dredge’s, it would still take me months of work to reach the same level of proficiency with it. In the short term, it’s almost always better for me to just play the deck I have so much experience with. There are exceptions, of course – last summer I ditched my 《Narcomoeba》s for Hogaaks, because that deck was just busted. But there’s no reason for me to switch to anything that isn’t potentially broken.

Another thing to note about the Mardu vs. Marvel playtesting session is that drawing conclusions from small sample sizes is dangerous. Playtesting samples will almost always be too small to be reliable, unless you have the time to play hundreds of matches in each matchup. Streaks of good or bad luck on one side or the other can easily make any matchup seem much better or worse than it is in reality. Either one of the 10-match sets alone would have made the matchup look lopsided in favor of one deck, but combining them paints a very different picture.
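Here is a short Python sketch that puts rough numbers on how little ten matches can tell you. It assumes that every match is independent and has a fixed win probability. That assumption ignores exactly the skill and sideboarding differences described above. The sketch uses the standard Wilson score interval and a plain binomial tail. `wilson_interval` and `prob_at_least` are hypothetical helper names for illustration, not part of any tool mentioned here.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half_width, center + half_width

def prob_at_least(wins, games, p=0.5):
    """Binomial tail: chance of at least `wins` wins in `games` matches
    if every match is won independently with probability p."""
    return sum(math.comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

# One 10-match set on its own: Mardu went 9-1.
print(wilson_interval(9, 10))   # about (0.60, 0.98)
# Both sets counted from the Mardu side: 9-1 plus 1-9 is 10-10.
print(wilson_interval(10, 20))  # about (0.30, 0.70)
# How often a true 50/50 matchup produces 9-1 or better by luck alone.
print(prob_at_least(9, 10))     # 11/1024, about 0.011
```

Under these assumptions, a single 9-1 set is still consistent with anything from roughly a 60% to a 98% matchup. The combined 10-10 from the Mardu side only narrows things down to somewhere between roughly 30% and 70%.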

Streaks of bad luck aren’t that dangerous, as most of us are all too adept at identifying spots where we got unlucky. The good ones, however, are much more deceptive. As a wise man once said:

“It’s easy to confuse ‘What Is’ with ‘What Ought To Be’, especially when ‘What Is’ has worked out in your favor.”

— Tyrion Lannister

An important lesson that every Magic player eventually has to learn on the way to becoming a good one is this:

Winning more than half the time doesn’t mean that you have a good matchup.

Choosing to play a bad deck based on positive results is a tale as old as tournament Magic itself. That happens not only because variance can be deceiving, but also because there are differences in play skill, sideboarding tactics and understanding of the matchup. Someone like Luis Scott-Vargas (LSV) can have a positive win rate on the MTG Arena ladder or in Magic Online (MTGO) leagues even with decks that are complete garbage.

Case in point, LSV was recently streaming Azorius Control in Pioneer and said, "I think every matchup is good. I just, I have not been losing with this deck." While I’m not saying that Azorius Control falls into the complete garbage category, claims that it has a good matchup against everything should be taken with a healthy dose of skepticism.

I believe that most matchups in modern Magic are close enough that by playing better than your opponent you will win more than half the time, regardless of which side of the matchup you’re on. In ladder testing, if you keep losing a matchup you can be pretty sure that it’s bad, but winning isn’t necessarily a sign that it’s good.
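Adding log-odds is one simple way to see why a skill edge can hide a bad matchup. The same idea underlies Bradley-Terry and Elo-style rating models. The sketch below is a toy model built on that assumption, and all of its numbers are hypothetical. It takes a 45% matchup between equally skilled players and a pilot who would win 60% of mirror matches against the same opponent. Real matchups do not have to combine this neatly.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

def observed_win_rate(matchup, skill_edge):
    """Toy model: the deck matchup (between equally skilled players) and the
    player edge (e.g. a mirror win rate vs. this opponent) add in log-odds."""
    return sigmoid(logit(matchup) + logit(skill_edge))

print(round(observed_win_rate(0.45, 0.60), 3))  # 0.551
print(round(observed_win_rate(0.45, 0.50), 3))  # 0.45, no skill edge
```

In this toy model, the better player wins about 55% of the time despite being on the wrong side of the matchup. That is exactly the kind of result that looks like a good matchup on the ladder.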

It’s especially easy and common to fall into this trap if you’re a good player and you’re trying to find a way to beat the most popular deck. It’s easy to find evidence for something that you want to be true, even if (or especially when) it’s not optimal. The key concept here is opportunity cost. If you’re fine with having a 55% win rate against the field, then by all means keep playing your pet deck. But if you put the same effort into learning the top deck, you could combine a high win rate in the mirror match with better matchups against the rest of the field. Just like with investments, a portfolio with a positive return is only optimal if you can’t get a better one elsewhere.
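Opportunity cost becomes concrete once you weight each matchup by how much of the field you expect to face it. Every number below is made up for illustration, and the deck names are placeholders. The point is only the calculation: a field-weighted average of matchup win rates.

```python
def field_win_rate(matchups, metagame):
    """Expected win rate against the field: each matchup win rate weighted
    by that deck's share of the field. Shares are assumed to sum to 1."""
    return sum(share * matchups[deck] for deck, share in metagame.items())

# Hypothetical field shares and matchup estimates, for illustration only.
metagame = {"Top Deck": 0.50, "Deck B": 0.30, "Deck C": 0.20}
pet_deck = {"Top Deck": 0.60, "Deck B": 0.50, "Deck C": 0.50}
top_deck = {"Top Deck": 0.58, "Deck B": 0.62, "Deck C": 0.60}  # 0.58 = mirror

print(f"{field_win_rate(pet_deck, metagame):.2f}")  # 0.55
print(f"{field_win_rate(top_deck, metagame):.2f}")  # 0.60
```

With these invented numbers, the pet deck beats the top deck and still ends up at 55% against the field. Learning the top deck well enough to win 58% of mirrors lands around 60%.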

Similarly, the more intelligent you are, the more likely you are to hold on to erroneous beliefs instead of updating them. The problem with being smart is that you often win debates even when you’re wrong. Now, you might be a bit less likely to be wrong in the first place, but in the cases where you are, it’s easy to walk away from conversations feeling that you’re right even when you aren’t, and that you have no need to update your belief system.

Please Disagree with Me

Speaking of debates, one of the most important lessons I have learned over the years is this:

The more you disagree with someone, the more it means you should work with them.

When you have a deck that you want to play, the best way to practice a matchup is to play against someone who is at least as good with their own deck. The easiest way to find someone like that is to find someone who disagrees with you about who is favored in the matchup and how much. If you have trouble finding them, all you need to do is open up your Twitter feed. Regardless of the matchup in question, there’s always someone who has tweeted that the other deck is favored. Always.

Testing with that person will often take your understanding of the matchup to a whole new level. They’ll exploit cracks in your armour that you weren’t even aware of, while simultaneously protecting themselves from the angles of attack that you’re used to winning with. You might find out that the matchup isn’t favorable after all – or at the very least, not as favorable as you thought it was. In most cases, the opposing mountain has the potential to be much higher than you thought.

For example, before MC Barcelona last summer, a friend of mine played a lot of Izzet Phoenix and kept telling me how good 《Ravenous Trap》 was against Hogaak. Presumably my opponents at MC Barcelona had come to the same conclusion he did and thought that Traps and 《Surgical Extraction》s were enough to beat the deck.

However, at the actual event I cast multiple 《Thoughtseize》s against my Phoenix opponents and didn’t even take the Surgicals and Traps from their hands, because I simply didn’t care about them. I sequenced my self-mill cards so that they didn’t have priority until after Hogaak was already on the stack. After that, they could Trap me all they wanted. Maybe their plan worked in MTGO leagues, but it didn’t work against MC-level opposition.

The authenticity of the dissent is crucial here. What often happens in testing is that you want to try your team’s Super Secret Weapon against a Boring Stock Deck, but the friend you’re testing against is also more excited about the Super Secret Weapon and kind of wants that deck to be good. They still oblige and play the Boring Stock Deck side, but they’re never going to give 100% if they haven’t actually worked on the deck and don’t really like playing with it. Thus, they also won’t play well with it, and the matchup will seem to be better for the Super Secret Weapon than it really is.

There are ways to mitigate this to some degree: for example, we have sometimes tested matchups so that whenever you win a match with the Boring Stock Deck, you get to switch sides and pick up the Super Secret Weapon. That provides an incentive for the Boring Stock Deck player to try harder to beat the Super Secret Weapon so that they can get back to having fun with the more exciting deck.

But even though that might alleviate the problem a bit, it won’t solve it. By far the best thing you can do is to find someone who has put the hours in with the Boring Stock Deck, has an up-to-date version of it and genuinely likes playing with it. This is why madmen like Kasper Nielsen, who for some mysterious reason likes playing stock decks, are invaluable teammates even if they might not break formats very often with new, shiny decks.

Note that it’s not only about in-game play; having an up-to-date version is perhaps even more important. When you’re testing on the MTG Arena ladder or MTGO leagues, you’re often playing against opponents who just copied their lists from last week’s tournaments and are still getting familiar with the decks. Many of those players haven’t adapted at all to the next step of the metagame, and beating those players is not the goal if you want to do well at a high-level tournament.

When the tournament you’re preparing for starts, you won’t be the only one who has modified your deck based on the latest results from other tournaments – everybody else will be as well. That’s why it’s important to get in practice against other players who have already taken the next logical steps.

What happens if you don’t? For example, right before the most recent Mythic Championship Qualifier Weekend, a pro player wrote an article about GW Adventures and how it beats the Food decks. And if you tested against the Food lists people played in the previous week’s events, maybe the Green-White deck did have a good matchup. However, in order to beat the Food mirrors, most of the Food players turned to black cards like 《Noxious Grasp》 and 《Massacre Girl》 for that weekend. Once the 《Oko, Thief of Crowns》 decks replaced 《Aether Gust》 with sideboard-quality removal spells in the main deck and had access to the blowout potential of 《Massacre Girl》, the matchup got a whole lot worse.

Also, whenever there’s a fresh format like Pioneer or new Standard, where all team members are starting from a clean slate, I think switching decks between sets of matches should be mandatory. Not only is it important for counterbalancing the disparities in player skill and matchup proficiency, it also leads to a more comprehensive understanding of the matchup for both players. When you’ve seen the matchup from each side, it’s easier to form opinions on what truly matters and how good specific cards are. Fresh perspectives can also result in better sideboard plans or generate ideas on how to improve the decks.

Seeing Through Variance

Even though variance means that you shouldn’t read too much into the numbers themselves, it doesn’t mean that results don’t matter at all. There are a few things you can do to separate the signal from the noise. For the first one, I’m going to repeat myself:

The more you disagree with someone, the more it means you should work with them.

This isn’t only a good habit because of disparities in matchup proficiency, but also because discussions can reveal how differently matches can play out. What you may have found to be common play patterns in your 20 matches might not have occurred at all in the other person’s matches. The key thing to focus on is why a deck would be favored in the matchup. How likely are the play patterns that give one deck the advantage? How about the play patterns that give the other deck an advantage? Are there ways to avoid those situations or play around them?

Maybe you have approached the matchup from completely different directions. Even in the case of Mount Everest, the breakthrough only came in the 1950s when Nepal opened its borders and climbers were granted permission to ascend from the southern side of the mountain.

Sometimes this requires a bit of math. A hypergeometric calculator is a nice tool for figuring out how common it really is for Pioneer Mono-Green to have an Elf on turn 1, or how likely it is to hit an 《Ulamog, the Ceaseless Hunger》 from an 《Aetherworks Marvel》 activation. Frank Karsten’s articles are also a great resource. If you happen to have any coding skills, it’s fairly simple to write simulation scripts that solve more complicated odds, even if you can’t do the math properly.
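For hypergeometric questions like these, Python’s `math.comb` is enough. A few lines of Monte Carlo can double-check the answer or handle cases that are awkward to count by hand. The card counts below are assumptions for illustration, not real decklists. The Elf example assumes 8 one-mana Elves in 60 cards and a 7-card hand on the play. The Marvel example assumes 4 Ulamogs among 50 cards left in the library, with Marvel looking at the top six. The function names are hypothetical.

```python
import math
import random

def p_at_least_one(copies, deck_size, cards_seen):
    """Hypergeometric: chance that at least one of `copies` identical cards
    is among `cards_seen` cards taken without replacement from `deck_size`."""
    miss = math.comb(deck_size - copies, cards_seen) / math.comb(deck_size, cards_seen)
    return 1 - miss

def simulate_at_least_one(copies, deck_size, cards_seen, trials=50_000, seed=1):
    """Monte Carlo estimate of the same probability. Cards are numbered
    0..deck_size-1 and the first `copies` numbers stand for the target card."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seen = rng.sample(range(deck_size), cards_seen)
        if any(card < copies for card in seen):
            hits += 1
    return hits / trials

# Assumed: 8 one-mana Elves in 60 cards, 7-card opening hand on the play.
print(round(p_at_least_one(8, 60, 7), 3))          # 0.654
print(round(simulate_at_least_one(8, 60, 7), 3))   # should land close to 0.654
# Assumed: 4 Ulamogs among the 50 cards left; Marvel looks at the top six.
print(round(p_at_least_one(4, 50, 6), 2))          # 0.41
```

The simulation is overkill for these two questions. It becomes useful once the condition is something messier than "at least one copy", such as needing a specific land and a specific spell together.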

Another framework that I’ve found particularly useful is to try to establish the range of your deck’s draws, from the worst ones to the nut draws, and then analyze the strength of your draws relative to your opponent’s. You don’t have to be able to beat their nut draws, but if you only win when you get one yourself, that’s a pretty bad sign. Or if your bottom 30% draws beat up their top 30% draws, that’s a pretty good one.

I remember a test session with Marvel where I played against Zombies, and I kept churning out turn 4 Ulamogs in way more than half of the games. While it was fun (at least for me), I also knew that the test session would probably be more harmful than useful, as it didn’t provide relevant matchup data but would still skew my feelings towards liking Marvel more than I should. It was almost the same with Hogaak, until we eventually came to the conclusion that the deck was actually just broken and that turn 2 Hogaak wasn’t the nut draw; it was the norm. If I recall correctly, Frank Karsten posted an article showing that the deck gets turn 2 Hogaak about 60% of the time, which is just insane.

Establishing the range of draws was particularly helpful for me during the Hogaak Mythic Championship. On the day decklists were due, I had quite a bit of experience under my belt with Hogaak, but the post-Bridge versions hadn’t actually been particularly good for me. I was still considering whether I should just play Dredge. But after talking to my teammates about the deck, watching their matches and doing some math, I came to the conclusion that even my large-ish sample still wasn’t big enough and that my experiences were skewed towards the lower end of the deck’s range. Thus, I decided to trust my teammates and submitted Hogaak, which in retrospect I’m very glad I did.
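How big is big enough? A common back-of-the-envelope answer uses the normal approximation to the binomial. To pin a win rate near 50% down to within ±m with about 95% confidence, you need roughly 1.96² × 0.25 / m² independent matches. The sketch below assumes independent matches with a fixed win rate, which real testing never quite delivers. `matches_needed` is a hypothetical helper name.

```python
import math

def matches_needed(margin, p=0.5, z=1.96):
    """Normal-approximation sample size: roughly how many independent matches
    give a 95% confidence interval of +/- `margin` around a win rate near p."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

for margin in (0.10, 0.05, 0.03):
    print(f"±{margin:.0%}: {matches_needed(margin)} matches")
# ±10%: 97 matches
# ±5%: 385 matches
# ±3%: 1068 matches
```

Even a sample that feels large-ish for one person's testing is usually in the tens of matches, well short of the hundreds needed for a tight estimate.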

Even if you do all of this, it will still be hard to figure out which decks are favored in matchups, but that’s fine. If the matchup seems complex, it’s probably close enough that it doesn’t really matter. Matchups like that are precisely the ones where the better player will have the advantage, and where it’s important to really understand the play patterns and how to break them. To me, the most important takeaway from testing sessions isn’t the results or who’s favored; it’s figuring out how to get higher on the mountains, either by playing better or by making better versions of the decks. The fact that even small changes in a deck can significantly alter a matchup reduces the importance of results even further.