Diverse Metagames and GP Copenhagen

Wizards loves to talk about the openness and diversity of Modern. Both the GP Charlotte and GP Copenhagen Day 2 Metagame Breakdowns made claims like this about the format, stating that Modern was “packed with a diversity of archetypes” and that the breakdown “shows the immense diversity of the Modern format.” Although we Modern players like to tout our format’s diversity, sometimes the Day 2 metagames don’t look as diverse as we want them to. GP Copenhagen was no exception to this, with a Day 2 field that was 15.8% “Splinter Twin” in its many variants, following a trend set the previous weekend at GP Charlotte where 17.7% of the Day 2 field was on some variant of Twin. Can fields like this really be labeled “diverse” or “open”, even if some rogue Lantern of Insight, Death’s Shadow, and Ad Nauseam decks are showing up?

The purpose of this article is to try and quantify the hazy definition of “format diversity”, especially as it pertains to Grand Prix events. Drawing on data from GPs since July 2014, this article compares different Day 2 metagames across Modern’s recent history. We can then situate GP Copenhagen and Charlotte in this broader picture, seeing if they are really as diverse as Wizards claims. As with any statistics article, I want to take us through not just the results but also the method of reaching those results and why that method makes sense. Hopefully, this approach will help you conduct similar analyses of your own.

Quantifying Diversity

If you’re like most Modern players, you’ve probably characterized a metagame as diverse, stagnant, warped, or some other similar adjective at one point in time. But it’s often unclear what underlies these characterizations, and the terms often come off as arbitrary. Not even Wizards gets it right, especially without the benefit of hindsight. Here’s a real gem from the Day 2 metagame of a recent Grand Prix, quoted from the mothership no less: “All in all it looks like Modern is still a very diverse format. And rumor has it one player is even running a Sliver deck today.” Recognize the GP? That’s from the paragon of diversity itself, GP Milan, a December 2014 event during the height of Treasure Cruise’s and Birthing Pod’s reign. Maybe Wizards was just being optimistic (and to some extent, the people who write those metagame articles aren’t necessarily representing Wizards-wide views), but if that kind of misevaluation can happen during December 2014, it makes us question the reliability of terms like “diverse”.

This gets at the importance of having consistent, transparent, and supported benchmarks for loaded terms like “diverse” and “warped”. Ideally, these benchmarks would be quantitative markers that we could look to in any given metagame, e.g. if a deck is over N% of the event, then that tournament wasn’t very diverse. But there are a few dangers here. First, we can’t just choose a number that “seems” high, like 15% or 20%. This kind of gut-instinct approach gets us in the exact same trouble we were in when we used terms like “diverse” or “open”: it’s important to follow our intuition, but we need to balance that against being arbitrary. Another option for calculating the N% cutoff is using metagame averages. This is by far the most common approach I see in other metagame breakdowns, but it also has the chance to be the most misleading. A hypothetical Metagame 1 with five decks with 23%, 7%, 5%, 5%, and 2.5% shares is totally different from Metagame 2 with 11%, 10%, 8%, 7%, and 6.5% shares, even though both metagames have the same average deck prevalence of 8.5%. But it’s clear that the first is completely warped around some monstrous 23% deck and the other is quite balanced.
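To see why averages alone can mislead, here is a minimal Python sketch using the two hypothetical metagames above. The variable and function names are mine, and using the population standard deviation as the measure of "spread" is an assumption for illustration, not a formula the breakdowns themselves use.

```python
from statistics import mean, pstdev

# Hypothetical Day 2 shares (in percent) from the paragraph above.
metagame_1 = [23.0, 7.0, 5.0, 5.0, 2.5]
metagame_2 = [11.0, 10.0, 8.0, 7.0, 6.5]


def summarize(shares):
    """Return (average share, population standard deviation) for a list of shares."""
    return mean(shares), pstdev(shares)


for name, shares in (("Metagame 1", metagame_1), ("Metagame 2", metagame_2)):
    avg, spread = summarize(shares)
    print(f"{name}: average {avg:.1f}%, spread {spread:.2f}")
# Metagame 1: average 8.5%, spread 7.39
# Metagame 2: average 8.5%, spread 1.73
```

Both metagames report the same 8.5% average, but the spread around that average is more than four times larger in the first one.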

Given these dangers, I don’t want us to think of a firm N% cutoff. Instead, I want us to look at a range of values between a lower N1% and an upper N2% bound of a metagame. That is to say, decks in this metagame tended to fall between N1% and N2% of the Day 2 metagame. How do we define these bounds? By using the variance between all the different deck metagame shares. As an example, let’s look at the hypothetical metagames in the paragraph above. The hypothetical Metagame 1 has extremely high variance, with one deck at 23% and the next highest at 7%. Metagame 2, however, is much more clustered around the average 8.5% value. So instead of looking at a single 8.5% cutoff, we construct a range of values around that average metagame share. For the more balanced Metagame 2, that would be a very reasonable 7% – 10%. For Metagame 1, it’s a much wilder 2% – 15% range. Metagames with narrow ranges tend to be much more balanced, where many decks are viable and nothing is dragging the range up. But if we get a metagame with a large range, that suggests we have some problematic decks polarizing the metagame. A Deathrite Shaman kind of problematic.
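For readers who want to try this themselves, here is a rough Python sketch of one way to build such a range. The exact formula is not spelled out in this article, so treat the following as an assumption: it takes the average share plus or minus 1.96 times the population standard deviation divided by the square root of the number of decks. That choice happens to reproduce the two hypothetical ranges quoted above, but other constructions are possible. The names `share_range` and `Z_95` are mine.

```python
from math import sqrt
from statistics import mean, pstdev

Z_95 = 1.96  # assumed multiplier for a 95% interval; not stated in the article


def share_range(shares, z=Z_95):
    """Return (low, high, margin) around the average metagame share.

    Assumes margin = z * population standard deviation / sqrt(number of decks).
    """
    margin = z * pstdev(shares) / sqrt(len(shares))
    avg = mean(shares)
    return avg - margin, avg + margin, margin


hypothetical = {
    "Skewed (23% top deck)": [23.0, 7.0, 5.0, 5.0, 2.5],
    "Balanced (11% top deck)": [11.0, 10.0, 8.0, 7.0, 6.5],
}

for name, shares in hypothetical.items():
    low, high, margin = share_range(shares)
    print(f"{name}: {low:.1f}% - {high:.1f}% (+/- {margin:.1f})")
# Skewed (23% top deck): 2.0% - 15.0% (+/- 6.5)
# Balanced (11% top deck): 7.0% - 10.0% (+/- 1.5)
```

The skewed metagame gets the wide range and the balanced one gets the narrow range, which is the whole point of looking at a range instead of a single cutoff.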

With this method set up, we can now turn to all the GPs for the past few years and see how GP Copenhagen and GP Charlotte stack up.

Metagame Share Ranges and GP Day 2s

Let’s apply this method to the Day 2s of these past tournaments. As some of you more statistically-inclined readers might recognize, this is another way of using the same confidence intervals that we use on the Top Decks page. The big difference today is that we are applying it to Grand Prix events and not to the general metagame. This distinction is important for three reasons. First, it means we are working with a population of decks and not a sample, which changes both the math itself and also our understanding of the numbers: there’s no uncertainty in what made Day 2 because we know all the decks. Second, it means we have fewer decks and “cases” (i.e. our N) than in the overall metagame. This makes the numbers harder to extrapolate from, but also concentrates the population around the decks that matter most at GPs (the big dogs like Twin, Affinity, etc.). Finally, GP dynamics are very different from those at a local event, which means a lot for things like tiebreakers, random bad matchups, etc. This is one reason I don’t often perform this kind of analysis on Top 8 decks: the difference between 18th and 4th can often just be bad luck.
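To make the population-versus-sample point concrete, here is a small Python sketch. The shares are hypothetical numbers, not real GP data, and the sketch only illustrates one way the math changes: a population standard deviation divides by N, while a sample standard deviation divides by N - 1 and so comes out a bit larger.

```python
from statistics import pstdev, stdev

# Hypothetical Day 2 shares (in percent), for illustration only.
shares = [11.0, 10.0, 8.0, 7.0, 6.5]

population_sd = pstdev(shares)  # treats the list as the whole Day 2 field (divides by N)
sample_sd = stdev(shares)       # treats the list as a sample of a larger field (divides by N - 1)

print(f"population SD: {population_sd:.3f}")  # 1.732
print(f"sample SD:     {sample_sd:.3f}")      # 1.936
```

With only a handful of decks the gap between the two is noticeable, which is why it matters that a Day 2 breakdown covers every deck in the field.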

Using this approach, here are the metagame-share ranges for all GPs since July 2014. I have adjusted and edited Wizards’ breakdowns to both separate archetypes and expand categories. Also note that I exclude Pro Tour Fate Reforged because Modern decks made Day 2 based only partially on their Modern performance. For each event, I give the prevalence low-end, the high-end, and then the +/- margin around the average.

GP Day 2 Metagame Share Confidence Intervals

Event               Low-end Meta %   High-end Meta %   Margin (+/-) Meta %
1. GP Boston        1.7%             3.5%              .9%
2. GP Madrid        2.1%             4.3%              1.1%
3. GP Milan         2.4%             6%                1.8%
4. GP Omaha         2.3%             5.1%              1.4%
5. GP Vancouver     1.8%             5.3%              1.7%
6. GP Charlotte     1.4%             3.3%              .9%
7. GP Copenhagen    2.3%             4%                .8%

If we were to read this table for GP Boston, we would see that the middle range of deck prevalences is between 1.7% and 3.5%, which is a margin of +/- .9% around the average.
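If you want to play with these numbers yourself, the following Python sketch recomputes each event's midpoint and margin from the low-end and high-end values in the table. The dictionary and function names are mine. Because the published endpoints are already rounded, the recomputed margins can differ from the table's margin column by about a tenth of a point, so near-ties between events shouldn't be over-read.

```python
# Low-end and high-end Day 2 share values (in percent), copied from the table above.
day2_ranges = {
    "GP Boston": (1.7, 3.5),
    "GP Madrid": (2.1, 4.3),
    "GP Milan": (2.4, 6.0),
    "GP Omaha": (2.3, 5.1),
    "GP Vancouver": (1.8, 5.3),
    "GP Charlotte": (1.4, 3.3),
    "GP Copenhagen": (2.3, 4.0),
}


def midpoint_and_margin(low, high):
    """Return the center of the range and the +/- half-width around it."""
    return (low + high) / 2, (high - low) / 2


# Sort events from the narrowest range (least polarized) to the widest.
ranked = sorted(day2_ranges, key=lambda event: midpoint_and_margin(*day2_ranges[event])[1])

for event in ranked:
    mid, margin = midpoint_and_margin(*day2_ranges[event])
    print(f"{event}: center {mid:.2f}%, margin +/- {margin:.2f}")
```

Sorting this way puts GP Copenhagen at the narrow end and GP Milan at the wide end.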

Looking over this table, we can quickly identify some themes. Day 2 metagames that were part of less balanced formats have much larger interval margins than the more balanced ones. GP Charlotte, which had dozens of strange decks and tier 2-3 contenders on Day 2, has one of the lowest margins at just .9%. By contrast, GP Milan, which took place at the height of the Pod/Cruise season, has a much higher margin at 1.8%. Higher margins suggest very polarized metagames with lots of upper-end outliers (e.g. Pod and Delver in December 2014). Lower margins suggest much more open metagames where lots of decks are clustered around a central range.

The second indicator I notice is in the relative sizes of the low-end and high-end ranges. The larger the high-end range, the more polarized that event was to the most-played decks. Here, we see GP Milan with its warpage towards Delver and Pod (high-end range of 6%), and GP Vancouver with tons of Abzan and Twin (high-end range of 5.3%). By contrast, more open metagames like GP Boston and GP Charlotte have much smaller high-end ranges, 3.5% and 3.3% respectively. We can also see this in the low-end ranges. When events have really large low-end ranges, like GP Milan’s 2.4%, this suggests there wasn’t a lot of action happening at the bottom of Day 2. Compare this with GP Charlotte, with a low-end range of 1.4%: there were a ton of less-played decks bringing down the range.

GP Copenhagen and Day 2 Diversity

Based on all this, where does GP Copenhagen fall in the mix? Or GP Charlotte, another recent event that was lauded as one of Modern’s most open fields in months?

From the perspective of metagame-range margin, GP Copenhagen is actually the most diverse, followed closely by GP Charlotte. With a .8% and .9% margin respectively, these events were not at all polarized around a few decks. This is in stark contrast to something like GP Vancouver, where a huge subset of the field was on Abzan and that brought up the margin significantly. Looking back to Copenhagen and Charlotte, this quantitative assessment fits our qualitative understanding of the different events. Any event where you have Merfolk, Scapeshift, Ad Nauseam, Griselbrand, etc. as viable decks is a very diverse one. It’s when you are stuck on the top-tier decks like Abzan, Jund, Affinity, etc. that the margin widens and the Day 2 starts to look much less diverse. So in that regard, both GP Copenhagen and GP Charlotte were quite successful.

What about the low-end ranges? Remember that low-end ranges are suggestive of how many less-common decks made Day 2, i.e. decks like Martyr Proc, Mill, Mono U Tron, etc. with only a handful of pilots (or even just 1). Surprising no one, GP Charlotte is the hands-down winner here, with a low-end of 1.4%. This perfectly reflects all the tier 3 or lower decks we saw at the event, and all the buzz around Charlotte as being so diverse. GP Copenhagen, however, has a much larger low-end margin at 2.3%. To me, this indicates a metagame where there weren’t a lot of low-end outlier decks, with most people piloting more established builds in tier 1 or tier 2. The Day 2 metagame breakdown for Copenhagen also indicates this, with a lot of familiar faces and not a lot of decks with only 1-2 pilots. This points to GP Copenhagen being less diverse at the bottom than it otherwise could have been. We don’t see the same crazy decks that we did at Charlotte, although there are some standouts here like Dredgevine and Death and Taxes.

The last indicator of Day 2 diversity is the high-end range value. This is again where we see the influence of polarizing decks: GP Vancouver and GP Milan have the largest high-end values because of their warpage around Abzan and Delver/Pod. GP Copenhagen and GP Charlotte, however, are much better. Again, Charlotte stands out as being the most diverse, with a really small high-end value compared with the rest of the GPs in the table (3.3%). Copenhagen is right behind with a 4% high-end value. Just looking over the events, these assessments make perfect sense. Neither GP was dominated by one particular deck-type, even if they did have archetypes that saw more play than others. In Copenhagen’s case, this does lead to the question of Twin decks and their metagame role (more on this point in a second). But first, the Twin share isn’t nearly as problematic as we have seen in past metagames, even if we do group them all. And second, the rest of the event was much more open around decks like Merfolk, Grixis Control, Naya Company, Tron, and a number of other strategies that haven’t received a lot of press until recently.

Deck Supertypes and Next Steps

When classifying decks, one of the most controversial decisions is whether or not to group decks by supertypes. Should we talk about Splinter Twin decks or keep them separate as UR Twin, Temur Twin, and Grixis Twin? Is BGx one archetype? Or is there something to be said for variation between Jund, Abzan, and BG Rock? These kinds of decisions obviously have a huge impact on how the math works in metagame analysis. A Day 2 might be 15% “Twin” decks, but that also might be split pretty evenly between Temur, Grixis, and UR Twin. Making matters worse, it’s unclear how this factors into Wizards’ assessments of format diversity. Did Kiki Pod’s small share factor into the ultimate Birthing Pod ban? My guess is it didn’t: Wizards probably would have been thrilled to not have Abzan/Melira Pod decks and just have Kiki Pod ones.
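To illustrate how much the grouping decision can move the numbers, here is a short Python sketch. Every share below is invented for illustration (it is not a real Day 2 breakdown), and the supertype mapping is just one hypothetical way of drawing the lines.

```python
from collections import defaultdict

# Hypothetical Day 2 shares (in percent), invented for illustration only.
deck_shares = {
    "UR Twin": 5.0,
    "Grixis Twin": 5.0,
    "Temur Twin": 5.0,
    "Jund": 6.0,
    "Abzan": 5.0,
    "Affinity": 7.0,
}

# Hypothetical mapping from individual decks to supertypes.
SUPERTYPES = {
    "UR Twin": "Twin",
    "Grixis Twin": "Twin",
    "Temur Twin": "Twin",
    "Jund": "BGx",
    "Abzan": "BGx",
}


def group_by_supertype(shares, supertypes):
    """Sum shares by supertype; decks without a supertype keep their own name."""
    grouped = defaultdict(float)
    for deck, share in shares.items():
        grouped[supertypes.get(deck, deck)] += share
    return dict(grouped)


grouped = group_by_supertype(deck_shares, SUPERTYPES)
print(max(deck_shares.values()))  # 7.0  (most-played deck when split)
print(max(grouped.values()))      # 15.0 (most-played supertype when grouped)
```

In this made-up field the most-played "deck" is either a modest 7% Affinity or a 15% Twin supertype, depending entirely on the classification choice.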

In this article, I split up all the supertypes into distinct decks, but I also want to re-run this analysis at the end of the month with supertypes instead. Although there are appreciable differences between individual decks within a supertype, this often suggests deck diversity more than it suggests card diversity. And even there, if all the decks are built around the card in the same way, it might not even suggest deck diversity at all! This gives us a good opportunity to re-run the analysis with a different frame after all the June GPs have wrapped up (Singapore is this weekend).

We’ll be doing more GP Copenhagen review all week long, and this is a great starting point in situating Copenhagen in the broader Modern context. By many counts, GP Copenhagen looked like a diverse and open event, although it was certainly no GP Charlotte. Overall, Modern is looking healthier than it has in a long time, although there are still some lingering questions about how Twin-style decks might be shaping the metagame. We’ll have to amass more data before we can answer that question, and I’m excited to see what the rest of the month holds for the Modern community.

14 thoughts on “Diverse Metagames and GP Copenhagen”

  1. I don’t know about Twin decks, but the Junk (some ancient scrolls mention the word Abzan), Jund and… the BG so-called-“Rock”-for-who-knows-what-reason should definitely not be grouped up, as they play like completely different decks on completely different plans to win the game, utilizing a plethora of tools to get the job done, each one deck-specific. While they do share the common BG core, they can’t and shouldn’t be counted as BGx variations.

    Twin is a little bit of an oddball in that regard, since even the Goyf version transitions into a controlling game post-board, but generally even if the decks share a card pool, those that take a different approach to the game should be classified on their own.

    1. Having played and analysed all Twin versions, I can honestly say the game plan between the three is completely different. Well at least when you aren’t trying to T3 Exarch into T4 Splinter Twin.

      But I don’t think play style should factor into this discussion. Any deck whose core is discard into Goyf into Lili should be considered together for the sake of determining how varied the meta is. Just like most players wouldn’t consider playing the different Twin decks as having played varied games of magic, I doubt someone would say they played a bunch of varied decks if they played against Jund and Junk (nuances of the third color aside).

      You could potentially make an argument to group the CoCo decks, but given the various colours and the way they played out, I think it’s OK to look at them separately.

      1. @justaguy: I think it depends on the purpose of the discussion. If we are talking about banning a card, then I agree we can consider grouping SOME of these decks, depending on the card in question. For instance, Wizards definitely grouped Treasure Cruise decks into one category, because there was no single version of Delver with a large metagame share. It was only the combined Temur/Jeskai/UR supergroup that was dominant. The same was also true of Deathrite Shaman, although in that case, most players were using DRS as part of an Ajundi shell anyway.

        1. I think it’s important to group to understand the defining characteristics of the format.

          It’s equally disingenuous to say a meta is diverse when it is colour variations of the same core.

          Twin is 15% of the meta, no matter how you slice it. And that’s fine, because we understand that within that there are variations.

          Similarly we have 3 (well really 2 because straight BG is non existent at the moment) variations of The Rock which made up 11% of the day 2 meta for Copenhagen.

          The format is healthy because there’s a solid core of decks holding the format together while allowing room for other decks to play with.

    2. @Nickolay: Totally agree with this. Jund and Abzan in particular have very different play styles, even if they share core cards. Path/Rhino/Souls/Hierarch vs. Bolt/Terminate/Confidant/Kolaghan’s totally changes how the deck plays. Post-sideboard, it’s even more noticeable. Although it’s convenient for a lot of the ban-maniacs to group these decks to prove a point, there are a lot of differences in how you play the decks and, more importantly, how you play against the decks. The same is mostly true of Twin too, especially in the differences between Grixis and UR Twin.

  2. Can I just offer a word of thanks for the incredible amount of work that goes into these articles/analysis. I’ve just discovered this site, but it is by far the highest quality I’ve seen for my favorite format.

    Keep up the incredible work.

    1. Thanks! It’s fun to do these kind of in-depth articles, especially for a format as rich and fun as Modern. Let us know if there’s anything we could be doing better, or anything you want to see more/less of!

  3. One thing I’d add is that if we’re talking about a healthy format, we should also want a high low %.

    In other words, a format of 100 decks of 1% each is equally unhealthy in my eyes. In this format you can’t prepare, sideboarding is a nightmare and your performance is largely down to pairings.

    You want stability in the format, as well as the ability for rogue decks to come and attack the meta. The current meta is close to this.

    1. I think as long as you have a larger low-end % AND a reasonably large high-end %, that will mostly account for the situation you are describing. We do want some number of 1% decks in the Day 2, but as you said, we don’t want 100 of them.

  4. As an industrial engineering student with lots of statistics subjects under my belt, and as a Modern lover, I think these kinds of articles are pure gold. Please keep it up. I think a good follow-up would be to use binomial laws to demonstrate with numbers the right number of lands for each deck, how much we can rely on mana dorks, how important (or not) the fifth one-mana cantrip is for Twin… In Modern lots of people just netdeck their lists (me included) and it would be awesome to perfectly tune decks just by looking at numbers (the same way you were able to pull out the best deck in Modern some posts ago).
    Mail me if you need/want help with something like this.

    1. There’s definitely a lot of room to explore in applying stats to decks themselves. I’m particularly interested in the relationship of cantrips and other low-CMC spells to the delve creatures. Lots of decks are going to try and jam in Tasigur/Angler, especially Tas, and it would be interesting to see what kind of support those cards need.

  5. I think that a lot of this diversity comes from the fact that there isn’t a “best deck” anymore, and sideboards cannot be prepared to fight all the combo decks.

    1. I actually believe Twin is still the “best deck” in the format, but in reality, Modern is really a format that rewards deck knowledge and familiarity. So the “best deck” becomes the one you are most experienced with, so long as that deck also fulfills some minimum power requirements. What ends up happening is that a lot of players jump around decks to fit the flavor of the month or keep up with the new hype, which means you have inexperienced pilots playing decks that may or may not be good. But you also have a group of players who stick to their tried-and-tested deck, or metagame against the popular stuff with some tier 2-3 sleeper pick. All of this is at play in the current format diversity.
