Using Hypergeometric Distribution to Improve Your Decisions

Foreword

I wrote most of this article prior to the publication of Frank Karsten’s article on the same topic. I’d recommend you start there, as he explains the basics of the hypergeometric distribution much more lucidly and extensively than I have. However, I believe I still have some insight to add to the discussion. While there’s definitely some overlap between our articles, I hope you’ll still find some new perspectives and applications here.

Introduction

In this article, I’m going to explain what the hypergeometric distribution is, show how to use it, and go over some basic applications in Magic. If you haven’t heard of the hypergeometric distribution before, it’s the single most important statistical concept in Magic: The Gathering, as well as every other card game, and this article will mark a significant turning point in your theoretical understanding of the game.

The Hypergeometric Distribution

In technical terms, the hypergeometric distribution describes the probability of getting k successes in n draws without replacement from a population of size N with K objects that constitute successes.
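For readers who like to see the formula, the definition above can be written out as follows. This uses the same four symbols, with X standing for the number of successes drawn.

```latex
P(X = k) = \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}}
```

The numerator counts the ways to pick k of the K successes together with n - k of the N - K failures, and the denominator counts every possible set of n draws.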

If that sounds intimidating, imagine a giant jar with a bunch of balls in it. Some of the balls are red and some of the balls are blue. Let’s say we’re happy if we get a red ball and unhappy if we get a blue one. The hypergeometric distribution tells us the probability we get a certain number of red balls when we take a given number of balls out. We could use it to see how our chances improve when we take more balls out, when we put more red balls in to begin with, and so on.

Now imagine we have a deck of cards, and some of those cards are spells and some of those cards are lands. Or let’s say some of those cards win us the game, and most of them don’t.

Using the Hypergeometric Distribution

Before we dive into applications, I should explain how to look up the hypergeometric distribution. The easiest way is with an online calculator that gives you the numbers for specific k, n, K, and N. My preferred calculator is the one on Stattrek. The interface looks like it’s from the 1990s, but it’s clean and it does the job.

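If you prefer Excel or Google Sheets, the formulas there are HYPGEOM.DIST(k, n, K, N, FALSE) and HYPGEOMDIST(k, n, K, N), respectively. Excel’s version takes a fifth argument: FALSE returns the probability of exactly k successes, and TRUE returns the probability of k or fewer. (I assume the omission of the period gets around some copyright law.)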

A slightly more involved way is to use scipy.stats in Python. To do this by hand, you need to install Python and then install SciPy (‘pip install scipy’). Then you can start a Python instance (type ‘python’ into the command line), import the SciPy stats package (‘import scipy.stats as sps’), and run ‘sps.hypergeom.pmf(k, N, K, n)’ to get the probability of getting exactly k successes and ‘sps.hypergeom.cdf(k, N, K, n)’ to get the probability of k or fewer successes.

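I mention this because Python makes it easy to automate lookups. SciPy also has a ton of other useful functions for working with the hypergeometric distribution, which you can find in the documentation. (Note that the documentation names the arguments differently than I do: it writes the call as pmf(k, M, n, N), where M is the population size, n is the number of successes in the population, and N is the number of draws. The argument order matches the calls I’ve written above, and the concepts are the same.)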
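If you’d rather not install anything, the same lookups can be sketched with Python’s standard library alone. The function names below are my own for illustration, not part of SciPy; the argument order (k, N, K, n) simply mirrors the SciPy calls above.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes) in n draws from N objects, K of which are successes."""
    if k < 0 or k > n:
        return 0.0
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible outcomes come out with probability 0.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_cdf(k, N, K, n):
    """P(k or fewer successes)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, min(k, n) + 1))


def hypergeom_at_least(k, N, K, n):
    """P(k or more successes), the number most Magic questions actually ask for."""
    return 1.0 - hypergeom_cdf(k - 1, N, K, n)


# Example lookup: at least 4 lands among 10 cards from a 40-card deck with 17 lands.
print(f"{hypergeom_at_least(4, 40, 17, 10):.3f}")
```

Most of the questions in this article are "at least k" questions, which is why the sketch wraps the cumulative probability in a third helper.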

Application 1: Lands in Limited

A Limited Deck with 17 Lands

If you have 17 lands in your limited deck, how likely are you to hit your 4th land drop? In this case big N is 40, the number of cards in your deck. Big K is 17, the number of lands. By turn 4, on the play, you’ll have seen 7 + 3 = 10 cards, so little n is 10. Lastly, little k is 4, the number of lands you’d like to draw. If we punch these numbers into Stattrek, we get that the probability that X >= 4 is 0.707 = 70.7%. That means you’ll miss your 4th land drop in limited 3 games out of 10. If you’ve never done this math before, you might find this number surprising.

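What about 3 lands? To get that number, we just need to adjust k to 3. Things are much rosier there: P(X>=3) = 0.904. That means we’re over 90% to have at least 3 lands by turn 4. (Since n is still 10, this is slightly generous for the third land drop itself: with the 9 cards you’ve seen by turn 3, the number is closer to 84%.)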

Now, what happens if we’re on the draw and see an extra card? Then n = 11, and the probability we have at least 3 lands is 0.943 and the probability we hit our fourth land drop is 0.799. The impact of the extra card is dramatic: you’re about 9.2 percentage points more likely to play your 4th land on curve (0.799 versus 0.707).
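To check these numbers without a calculator, here is a minimal sketch in plain Python. The helper name p_at_least is my own; the inputs (40 cards, 17 lands, 10 cards seen on the play, 11 on the draw) are the ones used above.

```python
from math import comb


def p_at_least(k, N, K, n):
    """P(at least k successes) in n draws from N cards, K of which are successes."""
    # Add up the ways to fall short (0 .. k-1 successes), then subtract from 1.
    short = sum(comb(K, i) * comb(N - K, n - i) for i in range(k))
    return 1 - short / comb(N, n)


DECK, LANDS = 40, 17
CARDS_SEEN = {"play": 10, "draw": 11}  # cards seen by turn 4

for lands_wanted in (3, 4):
    play = p_at_least(lands_wanted, DECK, LANDS, CARDS_SEEN["play"])
    draw = p_at_least(lands_wanted, DECK, LANDS, CARDS_SEEN["draw"])
    print(f"at least {lands_wanted} lands: play {play:.3f}, "
          f"draw {draw:.3f}, gap {draw - play:+.3f}")
```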

A Limited Deck with 18-19 Lands

How big is the impact of playing an 18th land? That pushes big K to 18 and increases your chances of hitting your 4th land drop to 0.767 on the play. A solid increase, but not as big as the impact of being on the draw.

These numbers tell us that we should seriously consider choosing to draw in limited, that we can board out lands on the draw, and that we should lean heavily towards 2- and 3-mana plays. Still, don’t freak out about these numbers. Being conservative decreases your probability of a dysfunctional draw, but also decreases the efficacy and frequency of your good draws.

If you play 19 lands to bump your chances of hitting your 4th land drop to 0.819, you’re also greatly increasing the chance you’ll draw more lands than your opponent in a close game. If you take the draw every game, your opponent will hit their curve 10-20% of the time and get a free win. When you hit your curve, your opponent will have a much better chance of stabilizing.
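The land-count comparison is easy to tabulate. The sketch below sweeps 16 to 19 lands and prints the chance of hitting the 4th land drop on the play and on the draw; the range of land counts and the helper name are my own choices for illustration.

```python
from math import comb


def p_at_least(k, N, K, n):
    """P(at least k successes) in n draws from N cards, K of which are successes."""
    short = sum(comb(K, i) * comb(N - K, n - i) for i in range(k))
    return 1 - short / comb(N, n)


DECK = 40
fourth_land = {}
for lands in range(16, 20):
    on_play = p_at_least(4, DECK, lands, 10)  # 7-card hand + 3 draws
    on_draw = p_at_least(4, DECK, lands, 11)  # one extra card on the draw
    fourth_land[lands] = (on_play, on_draw)
    print(f"{lands} lands: play {on_play:.3f}, draw {on_draw:.3f}")
```

Reading the rows side by side shows how much of the gain from an extra land you can also get from the extra card on the draw.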

There are no easy answers here. Still, knowing the numbers lets you make better decisions. Instead of dismissing missing your 4th land drop as bad luck, think seriously about the optimizations you could be making.

Application 2: Keep or Mulligan?

Let’s say you have the following hand in Guilds of Ravnica limited, on the play.

Should you keep or mulligan? Intuition says keep, because if you draw any land in your first two draw steps then your turn 3 《District Guide》 will power out a turn 4 《Rosemane Centaur》. If you miss your third land drop though, there’s a reasonable chance you’ll lose the game without playing a spell. What does the math say?

In this case, assuming we’re a typical 17-land deck, then N is 33 (the cards remaining in our deck), K is 15 (the remaining lands), k is 1 (since we only need 1 land), and n is 2 (the number of draw steps before our 3rd turn). The hypergeometric distribution tells us we’re 71.0% to draw another land in time for our third turn. Since our hand is good when we hit and our chances of hitting are solid, the math agrees with intuition: we should keep.

What if instead of a 《Forest》, we had a 2nd 《Plains》?

That would mean we need to draw exactly a 《Forest》 to be able to cast our 《District Guide》 (or any of our spells, for that matter). If we have 9 《Forest》s in our deck, that reduces K to 9 and our chances of hitting to 47.7%. That’s a pretty steep drop. If we believed we needed to hit a 《Forest》 to have any chance of winning, then we should probably mulligan.
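Both keep-or-mulligan numbers are "at least one hit" questions, which have a convenient shortcut: one minus the chance that every draw misses. A minimal sketch, with a helper name of my own choosing and the N, K, and n values from the two hands above:

```python
from math import comb


def p_at_least_one(N, K, n):
    """P(at least one success) = 1 - P(all n draws are misses)."""
    return 1 - comb(N - K, n) / comb(N, n)


# Forest + Plains hand: any of the 15 remaining lands works.
any_land = p_at_least_one(N=33, K=15, n=2)

# Two-Plains hand: only the 9 Forests work.
forest_only = p_at_least_one(N=33, K=9, n=2)

print(f"any land in 2 draws:  {any_land:.1%}")
print(f"a Forest in 2 draws:  {forest_only:.1%}")
```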

In practice, I believe that this is still a close decision because mulliganing in limited is so costly and because we may still have a chance to win even if we miss on a 《Forest》 for a turn or two. We could also draw some cheap white spells to buy time, and even drawing a 《Plains》 will increase our number of acceptable draws significantly. The decision is still highly contextual and you should use your intuition to determine exactly how important the math is. Again, the numbers should be an important component of your decision-making, but not the only component.

Application 3: Playing to Your Outs

Let’s say you’re late into a game of limited. The board is messy, but you have a couple more creatures than your opponent. You determine that even if your opponent blocks with all their creatures, you’ll deal them exact lethal. However, if your opponent has a removal spell, a bounce spell, or a flash blocker, they’ll survive your attack and be able to win the game on the crack back. Should you attack?

There are numerous factors at play here, most of which are beyond the scope of this article. To make this example tractable, I’m going to make a number of simplifying assumptions.

The first is that if you don’t take this attack, you’ll never get another attack that’s nearly as favorable for you. (Let’s say they have a full grip of cards, but only 3 lands untapped.)

The second is that your opponent will give you exactly 3 draw steps before she wins if you don’t make this attack, and she has no interaction for the 2 cards in your deck that will win the game on the spot. (Let’s say she has a big flier, you have 2 lethal 《Lava Axe》s in your deck, and she isn’t playing any counterspell.)

With these assumptions, you are exactly trading the probability you win on this attack for the probability you hit one of your 2 outs in 3 draw steps.

Then if you have 22 cards left in your deck (N), your 2 outs (K) in 3 draw steps (n) represent a 26.0% chance of winning the game if you pass (since you need to draw at least k=1 of your outs). Attacking wins whenever your opponent has no interaction, so you should attack and try to win the game on the spot as long as you think that’s more than 26% likely, that is, as long as she’s less than 74% to have interaction. If you believe she’s more than 74% to have interaction (perhaps because she has so many cards in hand), then you should wait and try to hit your outs.

Decisions of this nature are some of the most complicated available in Magic, and I could write an entire article just on all the subtleties at play in them. But the hypergeometric distribution can and does guide my thinking in these spots to this day. Any time you risk losing the game, what you give up is exactly your chances of drawing a card or series of cards that will win you the game. If you can’t think of any outs, you should make attacks in spots like these even if you think it’s extremely likely your opponent has interaction because the value of your outs approaches zero. If you can think of a bunch of outs, then you should correspondingly play more conservatively.

The hypergeometric distribution lets you know exactly how valuable your outs are.
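To get a feel for how that value changes, the sketch below tabulates the chance of hitting at least one out for a few combinations of outs and draw steps. The 22-card library comes from the example above; the other rows and columns, and the function name, are hypothetical variations added for illustration.

```python
from math import comb


def p_hit_out(cards_left, outs, draws):
    """P(at least one out among the next `draws` cards of the library)."""
    return 1 - comb(cards_left - outs, draws) / comb(cards_left, draws)


CARDS_LEFT = 22
DRAWS = (1, 2, 3)

print("outs  " + "  ".join(f"{d} draw(s)" for d in DRAWS))
for outs in (1, 2, 3, 4):
    row = "  ".join(f"{p_hit_out(CARDS_LEFT, outs, d):9.1%}" for d in DRAWS)
    print(f"{outs:>4}  {row}")
```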

Application 4: 《Hollow One》 and 《Burning Inquiry》

I’d like to close on a slightly more involved example: How likely are you to cast a 《Hollow One》 when you play a 《Burning Inquiry》 on turn 1 with a 《Hollow One》 in hand?

Figuring this out requires answering a bunch of smaller questions, and we can consult the hypergeometric distribution for all of them. Firstly, after you draw 3 cards, what are the probabilities you end up with 1, 2, 3, or 4 《Hollow One》s in hand? Secondly, given you’ve drawn a certain number of 《Hollow One》s, how likely are you to keep at least 1 after discarding to 《Burning Inquiry》? Or, better, how likely are you to end up with n copies for n = 0, 1, 2, 3, 4?

Doing this by hand would require at least 24 hypergeometric queries, which is the perfect setting for writing a script instead. I’ll leave interpreting the script as an exercise for the reader.
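Here is a minimal sketch of how such a script might look; it is not necessarily the one used to produce the numbers in this article. It assumes a 60-card deck with 4 《Hollow One》s, that a land and 《Burning Inquiry》 have left your hand before you draw 3, that the 3 discards are chosen uniformly at random, and that every 《Hollow One》 still in hand afterwards gets cast. The function and parameter names are my own.

```python
from math import comb


def pmf(k, N, K, n):
    """P(exactly k successes) in n draws from N cards, K of which are successes."""
    if k < 0 or k > n:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hollow_one_distribution(in_hand, on_the_draw=False, deck_size=60, copies=4):
    """Return a list whose entry i is P(i Hollow Ones left after Burning Inquiry)."""
    opening = 8 if on_the_draw else 7   # cards seen before making any play
    library = deck_size - opening
    hand = opening - 2                  # the land and Burning Inquiry are gone
    in_library = copies - in_hand
    result = [0.0] * (copies + 1)
    for drawn in range(0, min(3, in_library) + 1):
        # Step 1: how many extra Hollow Ones do the 3 new cards contain?
        p_drawn = pmf(drawn, library, in_library, 3)
        total = in_hand + drawn
        for kept in range(0, total + 1):
            # Step 2: discarding 3 at random is the same as keeping `hand`
            # cards at random out of hand + 3.
            result[kept] += p_drawn * pmf(kept, hand + 3, total, hand)
    return result


scenarios = [
    ("1 in hand, on the play", hollow_one_distribution(1)),
    ("2 in hand, on the play", hollow_one_distribution(2)),
    ("1 in hand, on the draw", hollow_one_distribution(1, on_the_draw=True)),
]
for label, dist in scenarios:
    print(label, [f"{p:.2%}" for p in dist])
```

Under these assumptions the first scenario comes out at 33.07% for zero copies, 60.87% for one, and 5.95% for two, matching the figures quoted in the sections that follow.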

An Opening Hand with 1 《Hollow One》 on the Play

Starting with 1 《Hollow One》 in hand, we get the following output:

The numbers on the second line are the probabilities we end up with n 《Hollow One》s, where n is the index of the entry. So there’s a 33.07% chance we end up with 0, a 60.87% chance we get to cast 1, and so on.

Clearly, the numbers aren’t great. We’ll pass without a play around a third of the time, and there’s only a 5.95% chance to spike and wind up with 2 《Hollow One》s in play. The probabilities we play more than 2 are close to 0.

An Opening Hand with 2 《Hollow One》s on the Play

What happens if we start with 2 《Hollow One》s?

These numbers are much rosier. We’ll end up with 1 or more 《Hollow One》s over 90% of the time, and we even have a 2.04% chance of playing 3 and essentially ending the game on the spot.

An Opening Hand with 1 《Hollow One》 on the Draw

How does being on the draw change things?

In this setting, the primary benefit of being on the draw is having an extra card in hand to protect our 《Hollow One》s, so it makes sense that the difference isn’t large. Still, the chance we cast at least 1 《Hollow One》 improves by around 3.9%.

Overall, these numbers support the conventional wisdom of playing 《Flameblade Adept》 instead of 《Burning Inquiry》 when you only have 1 《Hollow One》 but not when you have 2. Although 《Flameblade Adept》 is more vulnerable to removal than 《Hollow One》 and you expose your 《Burning Inquiry》 to a discard spell by holding onto it, the 33% chance of doing nothing is an unacceptable risk.

However, when you have 2 《Hollow One》s, the upside of deploying 2 or more of them as well as the significantly reduced probability of playing 0 pushes the decision in the other direction. It’s worth noting that even when you start with 2 《Hollow One》s in hand, you’ll still only cast 1 most of the time.

A Word of Caution

Astute readers will have noted that using the hypergeometric distribution as I’ve described in this article often makes a number of simplifying assumptions. For example, I don’t take into account mulligans or card selection, and I weight draws that are entirely lands the same as draws that contain exactly 4. As always, it’s important to recognize that statistical tools, particularly useful ones, are rarely perfect. Nonetheless, these numbers can and should guide our thinking and decision-making.

Conclusion

This article has only touched on some of the many applications of the hypergeometric distribution in Magic. If you want to apply it to constructed, just change what N and K are. If you want to find your chances of drawing a land or a card that costs two mana or less, just bump K up. If you’re wondering how likely it is that your 《Ancient Stirrings》 finds the card you need, that your Scry 2 gets you a land, or how many discard effects you should play, the hypergeometric distribution is there for you.

If all this is new to you, then you have an exciting period of growth ahead of you. Even if it isn’t, hopefully you found something useful here.

As always, if you have any questions or comments, please reach out on Twitter: @nalkpas.

Allen Wu (@nalkpas)