Matchups and Win Rates: Top Tier Decks (Part 2)

The Deep Dive dataset is back! In three earlier articles, I analyzed a collection of MTGO dailies to determine the matchups and win rates of different top-tier decks. Unlike the publicly published MTGO dailies used to inform the Top Decks page, our Deep Dive dataset includes all finishes from a sample of dailies, not just the 4-0/3-1 ones. It also includes all the matchups between those different decks, not just their overall standings. Today, we are returning to the Deep Dive to see how different win rates and matchups are doing. This includes both overall deck win rates and individual matchups between decks, all with an even larger sample size than before. And as many of you can guess, one of the best decks from last time is still on top. In fact, it’s more vigorous than ever.


As in the last articles, I’m going to focus on the top-tier Modern decks as defined in our Top Decks page, paying special attention to the MTGO stats because the Deep Dive dataset is MTGO-based. I’ll also include a brief discussion of the dataset itself and all the different pieces that go into it. Then we’ll dive right into the deck win rates and their matchups. All in all, this analysis gives us an important quantitative perspective on which decks are strong in the format, and which are strong against each other. So whether you are thinking of bringing these decks to the major events in June, or are just preparing to face the diverse Modern field, this article will give you a statistical foundation with which to start your testing and decision making.

Special thanks to MTGS users pizzap and Rickster for their work on the dataset. Also to Kim Josefsen, a regular reader who contributed to the work.

Dataset and Methods

The MTGO Deep Dive dataset compiles a semi-random selection of dailies and the different decks and finishes within those dailies. In my last article on the topic, the dataset included 16 dailies. Now, we are up to 28, consisting of just over 5700 matches. For each daily, I analyze deck performance to determine a deck’s collective Match Win Percentage (MWP) across different events. I also calculate matchup win rates between the different decks. This gives us a sense of both the “true” overall MWP of those decks (calculated over hundreds of games), and the “true” matchup win-rates between different decks. Also, note these are MATCH win percentages, not GAME win percentages (GWPs): a 2-1 win is counted the same as a 2-0 win for the purposes of counting an MWP. I focus on MWPs instead of GWPs because GWP numbers don’t distinguish between pre-sideboard and post-sideboard games. MWPs at least capture this over the course of a match.
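To make the match-versus-game distinction concrete, here is a minimal Python sketch. The record layout (deck, opponent, games won, games lost) and the sample rows are hypothetical, invented for illustration. They are not rows from the Deep Dive dataset, and the real dataset's adjustments for byes, drops, and mirrors are not modeled.

```python
from collections import defaultdict

# Hypothetical match records: (deck, opponent, games_won, games_lost).
# These rows are invented for illustration only.
matches = [
    ("UR Twin", "Burn", 2, 1),
    ("UR Twin", "Jund", 2, 1),
    ("UR Twin", "Affinity", 0, 2),
    ("UR Twin", "Abzan", 2, 1),
]

def win_percentages(records):
    """Return {deck: (match_win_pct, game_win_pct)} from per-match game scores."""
    # Per deck: [match wins, matches played, game wins, games played]
    tally = defaultdict(lambda: [0, 0, 0, 0])
    for deck, _opponent, won, lost in records:
        t = tally[deck]
        t[0] += 1 if won > lost else 0  # a 2-1 win counts the same as a 2-0 win
        t[1] += 1
        t[2] += won
        t[3] += won + lost
    return {d: (100 * t[0] / t[1], 100 * t[2] / t[3]) for d, t in tally.items()}

mwp, gwp = win_percentages(matches)["UR Twin"]
print(f"MWP {mwp:.1f}%, GWP {gwp:.1f}%")
```

In this toy sample the deck wins three of four matches but only six of eleven games, so the two percentages can sit quite far apart for the same set of results.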

I adjust all MWPs and win-rates for byes, drops, splits, mirror matches, and other MTGO/statistical oddities that would skew the dataset. In addition, I assess all MWPs for statistical significance relative to the “weighted average MWP” of decks across the dataset. This produces a different P value for each deck’s MWP. The statistical test and the resulting P value check how likely it is that a given deck’s MWP falls within expected variance relative to the average MTGO MWP. A P value greater than .10 would suggest the deck is not truly above or below average relative to the MTGO-wide MWP: it’s just within the expected spread. But a P value of less than .10 (or even better: less than .05) would suggest the deck is a legitimate outlier, and a true under- or over-performer.
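The article does not say which statistical test produces these P values, so the following is only a sketch of one common choice: a two-sided one-proportion z-test (normal approximation) against the 50.1% baseline. The function name and the example record are assumptions, and this may not reproduce the P values reported in the tables below.

```python
import math

def two_sided_p(wins, matches, baseline=0.501):
    """Two-sided P value from a one-proportion z-test (normal approximation)."""
    observed = wins / matches
    std_err = math.sqrt(baseline * (1 - baseline) / matches)
    z = (observed - baseline) / std_err
    # Two-sided standard normal tail probability via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical records: a 60% deck and a 50% deck, each over 250 matches
for wins in (150, 125):
    p = two_sided_p(wins, 250)
    verdict = "outlier" if p < 0.10 else "within expected spread"
    print(f"{wins}/250: p = {p:.3f} ({verdict})")
```

With the thresholds described above, the 60% record falls well under .05 while the 50% record is nowhere near significant. An exact binomial test would be a reasonable alternative for smaller samples.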

Overall Win Rates: Tier 1 and Tier 2 Decks

To get our deep dive started, here are the MWPs for all the tier 1 and tier 2 decks in the format, along with their statistical significance. I also show the number of appearances each deck made throughout the dataset, and the total number of matchups used to calculate the overall MWP. This gives you some sense of the sample size, N, for each of the calculations, and in turn a sense of how accurate those calculations might be. All tier 1 and tier 2 decks are taken from our Top Decks page: visit the page to see how we define which decks belong in which tier (note the page is being updated on Wednesday, 6/3).

Below are the tier 1 decks as defined on the Top Decks page. As a point of reference, the average MTGO-wide MWP for all decks (weighted based on prevalence) is 50.1%. 

Deck        # of Deep Dive   % of Deep Dive   # of Deep Dive   MWP      P value and
            appearances      metagame         matches                   significance
Abzan       97               5.3%             302              52%      .51
UR Twin     142              7.7%             442              52.3%    .26
Burn        190              10.3%            569              50.8%    .38
Affinity    108              5.9%             339              57.5%    .01 (**)
Jund        86               4.7%             263              49.4%    .83
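As a rough illustration of what a prevalence-weighted average MWP looks like, here is a short Python sketch using only the five tier 1 rows above. Weighting by number of matches is an assumption about how "weighted based on prevalence" is computed, and because this covers five decks rather than the full dataset, the result is not expected to equal the 50.1% figure.

```python
# (deck, Deep Dive matches, MWP %) copied from the tier 1 table above
tier1 = [
    ("Abzan", 302, 52.0),
    ("UR Twin", 442, 52.3),
    ("Burn", 569, 50.8),
    ("Affinity", 339, 57.5),
    ("Jund", 263, 49.4),
]

def weighted_mwp(rows):
    """Average MWP with each deck weighted by its number of matches (assumed weighting)."""
    total_matches = sum(n for _deck, n, _mwp in rows)
    return sum(n * mwp for _deck, n, mwp in rows) / total_matches

def simple_mwp(rows):
    """Unweighted mean of the deck MWPs, shown for contrast."""
    return sum(mwp for _deck, _n, mwp in rows) / len(rows)

print(f"weighted: {weighted_mwp(tier1):.1f}%  unweighted: {simple_mwp(tier1):.1f}%")
```

The weighted version lets heavily played decks such as Burn pull the average toward their own MWP, which is why it is a more natural baseline than a simple mean across deck names.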

I don’t want to go into too much detail on these overall MWPs; that’s coming in the next section. For now, it’s enough to say most decks are hovering right around that 50% marker, with just Affinity standing out as an overperformer. With a P value of .01, the deck’s 57% win rate is significantly higher than the MTGO average of 50%. More on that later.

Next, here are the tier 2 decks.

Tier 2: 2/5/16 - 2/16/16

Deck            Overall Metagame %   MTGO %   Paper %   Day 2 %
Merfolk         3.3%                 2.4%     4.1%      2.1%
RG Tron         2.7%                 1.9%     3%        2.9%
Griselbrand     2.3%                 0.5%     2.6%      3.7%
Abzan           1.8%                 1.4%     1.5%      3.7%
Naya Company    1.7%                 1.9%     1.9%      0.8%
Gruul Zoo       1.6%                 0.5%     1.9%      2.5%

Amulet Bloom is clearly knocking the MWP ball out of the park, but I’m going to be discussing that in the next section so let’s ignore it for now. Instead, let’s focus on all the decks below Grixis Delver, decks without sufficient N to include in a matchup analysis section, but with enough overall matches to extrapolate a net MWP. One of the challenges in working with the Deep Dive dataset is always obtaining a large N for any given deck. You are at the mercy of what people are playing for any given daily, so if people stop playing a deck (poor Infect!), we stop seeing matchups for it. This means a lot of these MWPs have a lower appearance and match N than I would like. But we can still make some general observations from what we are seeing here, because most decks still have over 100 matches.

Let’s start with the Collected Company decks: Abzan Company and Elves. The Abzan Company MWP is just terrible right now, whereas Elves is right around the average. I think there are a few elements at play here. First, Elves is a fast, linear, minimally-interactive combo deck. Those kinds of decks tend to be very successful on MTGO, where tournaments are just four rounds and you can gamble on good matchups. Heck, those decks tend to be very successful in Modern, period, for the very same reasons. When you screw up against Elves, you probably lose on the spot. When you screw up against Abzan Company, however, you can still play a fair game of Magic (unless the Company player combo’d, but that’s harder to do now than it was in the days of Pod). This favors Elves in the MWP contest. The second element explaining these differences is in the decklists: it’s much easier to optimize an Elves list than an Abzan Company one. There is substantial consensus about what goes into Elves, but not for Abzan Company and its many variations. This suggests Company players might be bringing suboptimal lists into dailies, which would help account for the lower MWP.

The other deck I want us to think about is UWR Control. We don’t have quite enough matches to determine if this deck’s MWP is actually as high as it’s showing here, but early signs indicate it might be. UWR Control has a lot of tools for this metagame, including ample early removal, lifegain, countermagic to get you through the midgame, and resilient finishers. My guess is UWR Control still suffers from many of the same problems it suffered from in past months (chiefly it’s a reactive deck in a metagame rewarding proactive strategies), but I also think it’s a better deck than people give it credit for. Myself included! I’ve written off UWR Control before, but it seems like it’s better positioned now than in the past. After all, as Bolt becomes better, decks like UWR Control become more viable, particularly with redundant Bolt effects like Helix and Electrolyze. Cryptic Command also becomes much better in slower/fairer formats. With decks like Grixis Delver, Temur/Blue/Grixis Moon, Jund, Abzan Company, and other similar decks rising through the metagame ranks, the format is becoming much friendlier to Command.

Before turning to the in-depth analysis of certain decks, one final word on the MWP tables above: don’t look at the tables and say “UWR Midrange only has a 47% MWP. It’s clearly a bad deck!” Instead, consider those MWPs in relation to their P values and their N. In almost all cases, the decks are right within expected variance around the MTGO-wide average of 50%. This suggests ALL of the decks are actually decent choices, although some (cough Amulet cough) might have more going for them.

In-Depth Win Rate and Matchup Analysis

Some of our decks have hundreds of appearances and matchups, which lets us perform a much deeper analysis on their performances. In this section, I break down some of those key decks to discuss both their overall MWPs and their matchups against each other. Not all top-tier decks are included here! Some decks didn’t have a large enough N to draw results from, either overall or within different matchups. But for those decks I do show, I’ll give a detailed discussion of the results and how I make sense of them.
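To show why a small N limits what a matchup number can tell us, here is a minimal Python sketch of a Wilson score interval for a win rate. The choice of the Wilson interval and the two example records are assumptions for illustration; the article does not state how its own intervals are computed.

```python
import math

def wilson_interval(wins, matches, z=1.96):
    """Approximate 95% Wilson score interval for a win rate, as (low, high) fractions."""
    p_hat = wins / matches
    denom = 1 + z**2 / matches
    center = (p_hat + z**2 / (2 * matches)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / matches + z**2 / (4 * matches**2)) / denom
    return center - half, center + half

# Hypothetical records with the same 60% win rate but very different N
for wins, n in [(9, 15), (150, 250)]:
    low, high = wilson_interval(wins, n)
    print(f"{wins}/{n}: {100 * low:.1f}% to {100 * high:.1f}%")
```

At 15 matches the interval comfortably includes 50%, so a 60% record there says little about the true matchup. At 250 matches the same 60% sits clearly above even.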

Remember: quantitative data is just one datapoint you need to consider when doing any kind of evaluation or data analysis. Make sure you combine the numbers here with your own experiences and the other sources/experience you may know of. I’ll offer a bit of commentary in each section to try and help people make sense of the numbers and put them in the larger Modern context.

Again, for reference, our weighted average MTGO-wide MWP is 50.1% (N=98 different decks with ~5700 matches).

UR Twin

  • Top Decks MTGO prevalence: 5.8%
  • Deep Dive MTGO prevalence: 7.7% (142)
    Deep Dive matches: 442
  • MWP: 52.3% (p=.26)

vs. Abzan: 68.8% (11/16)
vs. Affinity: 55.2% (16/29)
vs. Burn: 46% (23/50)
vs. Jund: 55.6% (10/18)
vs. Amulet Bloom: 36% (9/25)
vs. Grixis Delver: 36.8% (14/38)

Twin’s showing up a lot less in the Top Decks metagame than in the Deep Dive, which suggests a lot of people who play the deck are not consistently making 4-0/3-1. There is an underperformance effect at play here. This is also reflected in the MWP, which is slightly above-average but not significantly so. I found this a bit odd, given how strong Twin has been at events in the past year (the winningest GP deck after Pod). I think underperforming players are pulling down Twin’s online MWP, which is why its MWP is only slightly and insignificantly higher than the MTGO average. In the hands of a good pilot, Twin is still one of the format’s best decks. In the hands of a less experienced one, however, the deck does not necessarily carry the player. With so many people on Twin (remember: it’s the third most-played deck), the MWP is going to take a hit just from player skill differences.

Turning to the individual matchups, the Burn and Affinity matchups make perfect sense. These are effectively 50-50 races, which reflects most of my experiences with the decks and those of players I know. Grixis Delver also makes a ton of sense. Delver decks are excellent against Twin, particularly the hard-removal-packed Grixis variants (sorry 4 toughness Exarch). Grixis Delver has exploded on the scene, and we know its Twin matchup is a big part of that.

Then we get to the Abzan and Amulet Bloom matchups. Abzan is supposed to be great against Twin. Here, however, it can’t seem to win. We saw a similar effect last time we looked at the dataset, and it’s still present even after increasing N. I believe player experience accounts for this, but not on Twin’s side of the table. Abzan’s metagame share has been declining rapidly on MTGO, which suggests to me the BGx deck is not a great choice these days (more on that later). True, players who are still sticking to BGW might be diehard Abzan pros, but they might also be players who are simply behind the metagame times. That second kind of player might have less overall Modern experience and thus be less equipped to battle Twin. The reverse effect is probably driving the Amulet matches. Your average Amulet Bloom player is quite experienced with their deck: Amulet has one of the lowest ratios of unique players to number of matches. Because of their experience and skill, those Amulet players are probably more experienced at navigating the Twin matchup than Twin players are at navigating the Amulet one. Player skill being more equal, we would expect both win percentages to normalize more towards 50%.

Burn

  • Top Decks MTGO prevalence: 10%
  • Deep Dive MTGO prevalence: 10.3% (190)
    Deep Dive matches: 569
  • MWP: 50.8% (p=.38)

vs. Abzan: 55.6% (20/36)
vs. Affinity: 41.9% (18/43)
vs. Jund: 38.5% (10/26)
vs. UR Twin: 54% (27/50)
vs. Amulet Bloom: 22.7% (5/22)
vs. Grixis Delver: 59.6% (31/52)

Burn has been the most-played MTGO deck for a while, and that’s just as true in the Deep Dive dataset as it is in the MTGO metagame numbers. Burn’s paper metagame share has been crashing (it’s currently between 4.5% and 5%), but it remains an MTGO powerhouse going into June. Even so, the deck’s MWP has declined a few percentage points since my last article. This reflects both metagame adaptations to Burn, and the tendency for decks to fall back to 50% as more people play them. We saw a similar effect with Twin, but it’s notable to me that Burn’s MWP is right at the average even though Twin’s is slightly over. This reflects the oops-I-win element of Twin, which is less present in Burn.

Unlike the Twin vs. Abzan/Amulet matchups, the Burn matchups make sense across the board. Affinity and Twin are straight races, with Affinity at a slight edge (it can threaten the turn 3 win and easily wins turn 4 on the play) and Twin at a slight deficit (the only way it wins turn 4 is if it draws the combo or if it can somehow control the damage). Amulet is also a race, but between the lifegain from sources like Radiant Fountain and the relative difficulty of Burn interacting with Bloom’s cards, this is heavily in Amulet’s favor. Burn struggles with Jund due to the less painful BGx manabase and Bolt, and beats Abzan for the opposite reasons (you can read my article on Jund’s strengths for more on these points). Finally, Grixis Delver remains Burn’s best matchup, which is something many Grixis Delver players will admit to. Grixis Delver struggles against Burn because of a painful manabase, a lack of lifegain, cards like Gitaxian Probe which are just terrible in the matchup, and a gameplan that is a bit too slow. Thankfully for Grixis mages, the Burn vs. Delver MWP isn’t nearly as lopsided as it was in the first article, which was a normalization I predicted would happen as we added more data.

Affinity

  • Top Decks MTGO prevalence: 5.8%
  • Deep Dive MTGO prevalence: 5.9% (108)
    Deep Dive matches: 339
  • MWP: 57.5% (p=.01***)

vs. Abzan: 61% (11/18)
vs. Burn: 60.5% (26/43)
vs. Jund: 30% (3/10)
vs. UR Twin: 44.8% (13/29)
vs. Amulet Bloom: 57.1% (8/14)
vs. Grixis Delver: 50% (12/24)

From a metagame perspective, Affinity’s Deep Dive prevalence is very close to the overall Top Decks prevalence, although Affinity’s paper presence has historically (and currently) been higher than its online share. Affinity is in an MTGO metagame share dip these days, but I expect that to reverse in the coming months. Just like you should never bet against BGx, never bet against Affinity.

Speaking of never betting against Affinity, the real takeaway here is not the prevalence — it’s the MWP and its statistical significance. The deck’s MWP is considerably higher than the MTGO-wide average, which reflects Affinity’s longevity in Modern and its biggest events. This deck has been around for as long as the format, and it has always put up results, particularly when people expect it least. With all the focus on Burn, Grixis Delver, Abzan Company, Jund, and other hot Modern decks, players are probably forgetting their Silences and Grudges at home. This is especially true of all the decks not represented in the top-tier echelons. Brewers and tier 2-3 players are preparing for a field of Company/Delver/Burn/BGx/Twin/etc. They are probably forgetting the oldest aggro deck in Modern. This is reflected in the data itself: Affinity has 339 matches, only about 1/3 of which are represented against top-tier decks. This suggests other matchups are strongly driving the significant MWP, which is exactly what we would expect in a format where players might be forgetting Affinity to try and beat decks with more hype.
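The split between top-tier and other matchups can be checked against the matchup lines listed above. The Python sketch below does that arithmetic. One assumption is labeled in the code: Affinity's total wins are estimated from the rounded 57.5% MWP, since the article does not give the raw win count.

```python
# Affinity's listed matchup records from this section: (wins, matches)
top_tier = {
    "Abzan": (11, 18),
    "Burn": (26, 43),
    "Jund": (3, 10),
    "UR Twin": (13, 29),
    "Amulet Bloom": (8, 14),
    "Grixis Delver": (12, 24),
}
total_matches = 339
# Assumption: total wins are estimated from the rounded 57.5% MWP
total_wins = round(0.575 * total_matches)

top_wins = sum(w for w, _n in top_tier.values())
top_matches = sum(n for _w, n in top_tier.values())
rest_wins = total_wins - top_wins
rest_matches = total_matches - top_matches

print(f"listed decks' share of matches: {100 * top_matches / total_matches:.1f}%")
print(f"win rate vs. listed decks: {100 * top_wins / top_matches:.1f}%")
print(f"win rate vs. everything else: {100 * rest_wins / rest_matches:.1f}%")
```

Under these assumptions the six listed decks account for 138 of the 339 matches, and the estimated win rate against the rest of the field comes out higher than against the listed decks, which is consistent with the argument that other matchups are driving the overall MWP.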

Looking at individual matchups, the most interesting results are the Amulet matchup and the Abzan matchup. Against Abzan, I expected this matchup to be more even, but I also still think some of the players who are sticking strong with Abzan might not have the best grasp of the format right now. So it’s possible those less experienced/informed players are bringing the Affinity vs. Abzan rate up. As for Amulet, I think this is mostly a function of Affinity players having the clock advantage against a deck that can’t really interact with them. It’s not like Bloom has any tools short of a Hive Mind to consistently beat giant Inkmoth hits, or a huge Skirge swinging the life totals. Affinity players who know their Ravager/Plating combat math will be rewarded in this matchup.

Abzan

  • Top Decks MTGO prevalence: 5.1%
  • Deep Dive MTGO prevalence: 5.3% (97)
    Deep Dive matches: 302
  • MWP: 52% (p=.51)

vs. Affinity: 38.9% (7/18)
vs. Burn: 44.4% (16/36)
vs. Jund: 71.4% (10/14)
vs. UR Twin: 31.3% (5/16)
vs. Amulet Bloom: 45.5% (5/11)
vs. Grixis Delver: 72.7% (16/22)

Abzan’s MTGO metagame share continues to decline as players switch to other decks (get ’em Jund mages!). Abzan may still be considered the 50-50 deck, but that definition is becoming increasingly uncertain in a metagame where everyone expects Abzan. The deck’s MWP is solidly average, which partially reflects the 50-50 nature of the deck, but also reflects metagame context less friendly to Abzan than it used to be. Looking at the deck’s matchups, this makes a lot of sense. Path, TS, and a painful manabase are just not where you want to be against Burn and Affinity, which is why those win rates are so low. Abzan may have a very strong Jund matchup (which is absolutely reflected in my experience with the matchup, where Abzan easily outvalues Jund), but that’s not enough to shore up those matchups against the linear, less-interactive decks. All of this contributes to Abzan’s falling metagame share and its lackluster MWP.

Grixis Delver is notable here in being one of Abzan’s few remaining strong matchups. The Delver variant is everywhere online, and it really struggles against things like Siege Rhino, Lingering Souls, and Path (especially against bigger Delver decks favoring Angler/Tas). Also, Decay is still just as crazy against Delver as it has always been. Another notable matchup is Twin. Abzan is supposed to have a good Twin matchup but, again, that’s not what the data is tracking here. As I mentioned before, I think this is a function of player skill and experience. A lot of players jumped ship from Abzan in the last month, leaving some combination of high-quality Abzan regulars (who will bring the MWP up) and players who effectively “missed the memo” about Abzan’s declining effectiveness (who will probably bring the MWP down). Perhaps the most important takeaway here is that the data suggests the deck isn’t itself so strong against Twin that you can just rely on card quality to win. Pilots still matter.

Jund

  • Top Decks MTGO prevalence: 4%
  • Deep Dive MTGO prevalence: 4.7% (86)
    Deep Dive matches: 263
  • MWP: 49.3% (p=.83)

vs. Abzan: 28.6% (4/14)
vs. Affinity: 70% (7/10)
vs. Burn: 61.5% (16/26)
vs. UR Twin: 44.4% (8/18)
vs. Amulet Bloom: 71.4% (10/14)
vs. Grixis Delver: 38.9% (7/18)

Jund continues to rise up through the MTGO ranks, and I fully expect it to surpass Abzan by the end of the summer if the rest of the field still looks like it does now. Jund’s metagame share is still lower than decks like Twin, Affinity, and Burn, but we have already seen Jund shoot up to 6.5% of paper: MTGO is likely to follow soon. That said, the deck’s MWP is actually lower than the MTGO-wide average, which seems unexpected of a deck that is supposed to be such a great metagame choice. The difference is by no means significant, so it’s hard to know where Jund’s true MWP falls around the MTGO average, but this is certainly not the MWP we would expect of a rising tier 1 staple.

To understand the potential discrepancies between Jund’s metagame trends and its MWP, we need to look at the matchups. Burn, Affinity, and Amulet Bloom are all at the core of Jund’s successes. As I’ve discussed in the earlier article on Jund’s successes, Bolt and a less painful manabase go a long way towards beating the two aggro decks, Burn and Affinity. As for Amulet, Jund combines Abzan’s disruption with better card advantage engines (Bob is way better than Souls here because Amulet can’t kill him and can’t handle the card advantage) and a faster clock (Bolt is big here). These are important driving factors behind Jund’s success, and I expect this to continue into the summer. That said, Jund has some clear weaknesses bringing down its MWP. Jund is not great against fairer decks. Bolt is terrible against Goyf and Tas, and just as terrible against decks like Grixis Moon and UWR Control/Midrange playing Bolt-resistant strategies. It is also not where you want to be against Exarch. All of that is reflected in the abysmal Jund vs. Abzan MWP, as well as the Jund vs. Twin MWP. Grixis Delver is also an uphill battle, because Jund’s strongest cards are not so great in that matchup (Bob gets killed too easily, Bolt doesn’t stop Tas or Angler, you have no Rhinos to seal the game, etc.).

Amulet Bloom

  • Top Decks prevalence: 4.1%
  • Deep Dive prevalence: 4% (76)
    Deep Dive matches: 250
  • MWP: 60% (p=.002***)

vs. Abzan: 54.5% (6/11)
vs. Affinity: 42.9% (6/14)
vs. Burn: 77.3% (17/22)
vs. Jund: 28.6% (4/14)
vs. UR Twin: 64% (16/25)
vs. Grixis Delver: 61.1% (11/18)

Yeah, Amulet Bloom is still probably the best deck in Modern. We are up to 250 matches and the MWP is only getting crazier. Now it’s 60%, a full 10 percentage points over the MTGO-wide average, with a jaw-dropping statistical significance of P = .002. This means Amulet isn’t just at the upper end of expected variance. It’s a legitimate overperformer in another MWP league relative to the competition. This also aligns with our more qualitative experiences of the deck. Amulet Bloom is perhaps the most difficult combo deck to interact with in Modern, and also one of the most linear. It punishes decks that don’t interact with it, and it is very hard to interact with for decks that try. This matches all other available data on the deck, all of which suggests Amulet is the real deal and the hands-down victor for highest MWP in Modern.

From a metagame perspective, Amulet Bloom sees a solid amount of play but nothing too overwhelming. It’s about as common as Merfolk, RG Tron, and Jund, which feels odd given how crazy its overall MWP is. Why aren’t more people playing this deck? It has positive matchups against most of the field, it has a strong gameplan, and it punishes opponents who either don’t interact with it or screw up an interaction. Why is it underplayed? The big reason is a perceived skill floor. People think this deck is really hard to play, which scares prospective pilots. Is it actually as hard as people think? Yes and no. The deck has a lot of internal nuances to figure out and many play lines you need to consider. But it’s not much harder than Tempo Twin variants or Affinity in that respect, and those decks see a lot more play. That said, most players don’t believe this to be the case, which is why so many of them don’t run Amulet. Those running it online are extremely experienced with the deck: many have been playing it for years, and the deck has the lowest ratio of unique players to matches of any top-tier deck. This is reflected in all the matchups. Those win-rates aren’t just Amulet Bloom showing its power. They are also the players themselves showing their experience. Amulet is a deck that rewards player mastery, and Amulet players on MTGO tend to be very experienced with the deck.

As a final note on this, I think both the Twin and Abzan matchups are closer to 50% than the numbers are indicating here. In both cases, there is a player experience effect at play that increases the Amulet Bloom win rate. These guys know their stuff and have been playing for a long time. But it’s also a feature of the deck itself. When you screw up against most decks, you don’t instantly lose. A misplay against Amulet, however, is often game over, and Amulet gives lots of opportunities for opponent misplays.

Grixis Delver

  • Top Decks prevalence: 8.7%
  • Deep Dive prevalence: 8.9% (165)
    Deep Dive matches: 500
  • MWP: 50.4% (p=.395)

vs. Abzan: 27.3% (6/22)
vs. Affinity: 50% (12/24)
vs. Burn: 40.4% (21/52)
vs. Jund: 61.1% (11/18)
vs. UR Twin: 63.2% (24/38)
vs. Amulet Bloom: 38.9% (7/18)

We end with Grixis Delver, an MTGO staple which exploded on the scene back in March and hasn’t looked back since. Grixis Delver is the second most-played MTGO deck after Burn, which is reflected in both the Top Decks dataset and the Deep Dive. Like Burn, the most-played deck online, Grixis Delver has a very middling MWP, which is expected given how many people are on the deck. With such a deck, you naturally see a mix of experienced pilots, good players who are just picking up the deck, people boarding the MTGO hype train, and outright bad players. This all but ensures an MWP hovering right around the average.

Grixis Delver’s observed matchups align nicely with my own experience of the deck. Affinity is a straight race, although I think this is slightly in Delver’s favor depending on what build the Delver player is using. Burn is a bad matchup and Abzan is much worse, the former because of a painful manabase and a slower effective turn, and the latter because Abzan’s cards generally outclass Delver’s. Getting your turn 2-3 Angler or Tas hit by Path is a disaster. So is trying to burn out a Rhino. Amulet Bloom is probably more in Delver’s favor than the matchup results indicate here, but player experience is a strong matchup determinant on both sides of the table. Amulet players tend to be very experienced with their deck and the format. Grixis Delver players run a huge range.

Next Steps

When I look over this data, my biggest takeaway has to do with player experience and skill. I see lots of instances where a matchup is brought up or down based on the relative skill of pilots. This doesn’t mean the deck isn’t a factor. As with most social science data analysis, it’s a little bit of both. But player experience is an under-appreciated factor in deck performance analysis, and one affecting most Modern players. You can use this kind of analysis to see which decks reward tight, experienced play, and which decks are easier to just pick up and take to town. You can also use it to see which matchups are easy/hard independent of player skill. Again, don’t interpret this as player skill being the only deciding factor in matchups and win rates. Decks play a big part in this too. It’s just to say you need to consider all the factors in deck evaluation.

We’ll keep adding data to the Deep Dive dataset and keep updating you on its progress. June is here which means we are in for three Modern GPs and an SCG Open. Hopefully this article gives you some additional tools to help you pick your decks and improve your matchups. And hopefully those events will give us some more awesome finishes and data to discuss as the month goes on!

Sheridan is the former Editor in Chief of Modern Nexus and a current Staff Author. He comes from a background in social science data analysis, database administration, and academia. He has been playing Magic since 1998 and Modern since 2011.

25 thoughts on “Matchups and Win Rates: Top Tier Decks (Part 2)”

  1. Wow, some really interesting stuff here.

    I always wondered how significant a role player skill takes in Modern. I believe I always underestimated this, and thought that in reality it just goes down to cards drawn. Seems it really plays a much larger role than I thought, and it is reflected by my personal experiences: I am basically playing only Bloom Titan on MTGO. Twin was always “supposed” to be one of Bloom Titan’s worst matchups, because of the small amount of interaction with the Twin combo, Blood Moon from the sideboard, counters, and stuff. All the new players picking Bloom Titan for the first time usually complain about how bad that matchup is. I, however, after playing the matchup a lot, always thought that it wasn’t as bad, especially after I learned what is important in the matchup and what the nuances of it are. The matchup being ultimately somewhere like 50/50 or 45/55 for Twin if both players are skilled and familiar with it should make sense.

    1. Kanister, mind if I quickly ask what are some of the nuances that I should look out for as a Bloom player? I just picked up the deck about a month or two ago and the Twin matchup is still something I struggle with so any tips would be appreciated.

      1. I suggest watching videos of people playing the deck. There are just so many interactions to explain that it’s better to see them played out in practice. Speaking of practice, the more you can play the deck (even just goldfishing!) the better you will be.

        If I had to identify one critical nuance of the deck, it’s using bouncelands to return Tolaria West and Fountain to your hand. Knowing when to do that and how to sequence those turns is critical for Amulet success.

    2. I am also finding that the Twin vs. Bloom matchup is probably not as heavily in favor as Twin players initially believed. Even with a pretty wide confidence interval, it’s still somewhere between about 48% and 85% (based on the data we have currently). That’s not even close to the 30-70 matchup some people were claiming. My guess is that it’s close to 50-50, but with player experience able to increase/decrease that by 10%+.

  2. I completely agree that those swapping decks do not get to see the nuanced levels of play that exist within Modern. I sometimes wish I could answer every /r/modernmagic thread in all caps with “Just play the deck for 6 months playing games every night until you learn the deck backwards.”

    While there is no statistically significant way to determine, what would be your guess as to what MWP a player gains by being very experienced with a deck? 3-8% would be my guess? that would mean there are those newer with the deck that are experiencing say 47% MWP when compared to a very experienced player that can obtain 53% with the deck. I dont mind that split.

    It’s interesting to compare the statistics to my own experience (on June) and then try to extrapolate the reason. For example, I would not have stated that the Jund vs Burn matchup as favourable to Jund. I accept the reasoning that Jund has a better time than abzan, but still thought burn would have considered Jund a good matchup. I will have to work on this matchup and test more.

    Also, the numbers Jund vs Twin was interesting, I thought the Jund answers and threats lined up quite nicely with their deck both pre and post board. It may be that the twin player is not playing optimally. I will have to test this further.

    Thanks for this website. When the duldrums of starcity and channel fireball get me down writing about den protectors and hordeling outbursts (which I do not care about), it is great that you are a shining light. Thankfully modern PPTQ season should provide heaps of data, articles and pro lists will be available.

    1. I’m trying to do some analysis of player experience/skill. My guess is that it accounts for no more than 10 percentage points in an MWP calculation. So a true 50/50 matchup would actually be 40/60 in the hands of a skilled player. But this might cut both ways, so if the opponent was actually bad, it might widen the gap to 30/70.

      Jund vs. Twin is unexpected for me too. The confidence interval around that win rate, however, is pretty wide, so it’s very possible that the true matchup win rate could be anywhere from 30/70 to 60/40. Additional data, both in this dataset and from other sources, is definitely needed here.
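To put a rough number on how much additional data a wide interval like this calls for, here is a minimal sketch using the standard normal-approximation sample-size formula. The function name `required_matches` and the 95% level (z = 1.96) are my own assumptions for illustration; this is not the calculation used for the article.

```python
from math import ceil

def required_matches(margin, p=0.5, z=1.96):
    """Approximate number of matches needed so that a 95% interval
    around a win rate is no wider than +/- `margin`.

    Normal-approximation formula: n = z^2 * p * (1 - p) / margin^2.
    p=0.5 is the worst case (widest interval), assumed here.
    """
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# An interval as wide as 30/70 to 60/40 is about +/- 15 points.
matches_for_15_points = required_matches(0.15)

# Narrowing that to +/- 5 points takes far more matches.
matches_for_5_points = required_matches(0.05)

print(matches_for_15_points, matches_for_5_points)
```

Under these assumptions, cutting the margin to a third takes roughly nine times as many matches, which is why individual matchups stay fuzzy even in a dataset of several thousand matches.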

      1. Hey,

        thank you, great analysis! 🙂 regarding player skill:

        some months ago I looked up the match win % of around 20 famous pro players at 2 Modern Grand Prix events, and despite entering the tournament against 3-0 players (byes) they still had a match win % of ~67% (quite similar at both events). Interestingly the % did not differ much between Day 1 and Day 2 (1-2% difference). I don’t have the exact numbers at the moment. I guess that also indicates that most players who reach 3-0 at a Grand Prix have chosen a solid deck and are solid players.

  3. Just a tiny nitpick here. You wrote “First, Elves is a fast, linear, minimally-interactive combo deck.”
    No, it’s a synergistic Aggro deck the same way Affinity is. Ezuri + a ton of Elves isn’t a combo.
    /rant

    Anyway I really liked this article and I agree that player skill is a part of the equation that is often overlooked. It’s easy to fall into the trap of saying that X beats Y but things are seldom so black and white when player skill and experience play a huge role too.
    There is the saying that the best Modern players are the ones who have stuck with a deck for years and I think that is really true.

    1. You have stated a lot of things about Amulet on reddit and on here that convinced me it’s the best choice right now for the meta and will be building it (parting with liege rhino to pay for it). What is your current decklist and how have you been doing personally with it? Also, sideboarding with Amulet seems weird to me. What are tips for boarding?

      1. This is a pretty solid Amulet list from SCG Worcester (2nd place):
        http://sales.starcitygames.com//deckdatabase/displaydeck.php?DeckID=85210

        That’s basically the list I’ve been using in testing, although I treat Oracle as a flex slot for something like Dragonlord. You can also use Simian Spirit Guide to accelerate into faster threats, but I like this less as people have become better at beating the deck.

        As for boarding and other strategy, I suggest checking out the primer on MTGSalvation. Izzetmage is a very knowledgeable user there who has a great handle on Amulet:
        http://www.mtgsalvation.com/forums/the-game/modern/tier-2-modern/556715-amulet-bloom

        We’ll probably do our own coverage of this deck at some point too!

  4. I switched to Amulet after playing against it twice in a paper event. It has weak points that players with experience can exploit, but it is fast, punishes decks that don’t have hand disruption and/or countermagic, and its late game is brutal because of the mana advantage that Pacts give it on its fundamental turn. It can tutor a Titan, cast it, protect it with PoN, and attack with it for value all on the same turn. It also tends to gain massively from sideboarding, whereas many of the linear decks in the format do not.

    The reasons people aren’t playing it are threefold.

    1. Afraid it’s gonna get axed. It breaks the Turn 4 rule. I jumped in because *if it doesn’t*, I’m going to wreck the PPTQs when Modern season rolls around.
    2. Although powerful, it’s a high-variance deck, and the mulligan decisions can be very hard.
    3. It does have bad matchups: notably against Infect, skilled Twin pilots, and Affinity as noted above. Sometimes you just lose to Blood Moon.

    1. Is point 2 really valid, though?
      http://www.channelfireball.com/articles/can-it-pay-off-to-play-a-high-variance-deck/
      From what I have understood from the linked article, it is just a problem coming from how people perceive the game, not an actual problem lessening your chances to win the tournament.

      However, I believe that there is also a fourth reason. The vast majority of players perceive “fair” decks, fighting for card advantage turn by turn, as more fun to play. Even though the match can only result in a win or a loss (or a draw, but nobody wants that), people in general feel much better when they lose after “putting up a fair fight” and die to a turn 9 Tarmogoyf swing than they do dying to a T5 Tarmogoyf swing, getting to play just a bunch of lands.

    1. Maybe I’m missing something you are responding to, but the MTGO average win rate is about 50%. If you are talking about specific decks, however, then the average deck win-rate (or matchup win-rate) would definitely not always be 50%.

  5. How was Esper Mentor doing? Probably not enough games with it, because it’s unpopular, but maybe an unofficial statement?
    I’d love to see some matchups there.

    Also, I would really like to see Merfolk getting more spotlight. Rumors say their Delver matchup is quite good. Is that true? How are their other matchups?

    thanks, Tyrannon

    1. Esper Midrange is rocking a 58.9% MWP with 16 appearances and 56 matches. The MWP is not significantly different, with a P value of just .21 (but it’s getting there!).

      Merfolk and Delver have seen 11 matches in the dataset so far. Currently, Merfolk is 8/11, which is definitely trending towards a strong matchup. The “true” win-rate is probably closer to 60/40, but Merfolk still seems like a favorite there. The only other Merfolk matchups with a large enough N to extrapolate from (and honestly, it’s still small), are Burn (9/19), and Twin (13/20).
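For anyone who wants to sanity-check small-N numbers like these, here is a minimal sketch of two standard tools: an exact two-sided binomial test against a 50% baseline, and a Wilson score interval. Several things here are assumptions for illustration: the helper names are hypothetical, the 33 Esper wins are inferred from 58.9% of 56 matches, and the test may not be the one behind the P value quoted above, so the result need not match it exactly.

```python
from math import comb, sqrt

def binom_two_sided_p(wins, matches, p0=0.5):
    """Exact two-sided binomial p-value against a baseline win rate p0.

    Sums the probability of every outcome that is no more likely
    than the observed one.
    """
    probs = [comb(matches, k) * p0 ** k * (1 - p0) ** (matches - k)
             for k in range(matches + 1)]
    observed = probs[wins]
    # Small tolerance so ties with the observed probability are counted.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))

def wilson_interval(wins, matches, z=1.96):
    """Wilson score interval (95% by default) for a win rate."""
    p_hat = wins / matches
    denom = 1 + z ** 2 / matches
    center = (p_hat + z ** 2 / (2 * matches)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / matches
                    + z ** 2 / (4 * matches ** 2)) / denom
    return center - half, center + half

# Assumption: 58.9% of 56 matches is taken to mean 33 wins.
esper_p = binom_two_sided_p(33, 56)

# Matchup records quoted in the comment above.
merfolk_vs_delver = wilson_interval(8, 11)
merfolk_vs_burn = wilson_interval(9, 19)
merfolk_vs_twin = wilson_interval(13, 20)

print(round(esper_p, 3))
for name, (lo, hi) in [("Delver", merfolk_vs_delver),
                       ("Burn", merfolk_vs_burn),
                       ("Twin", merfolk_vs_twin)]:
    print(f"Merfolk vs. {name}: {lo:.1%} to {hi:.1%}")
```

With only 11 matches, the interval for the Delver matchup spans well over 40 percentage points, which fits the caution that the “true” rate is probably closer to 60/40 than the raw 8/11 suggests.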

  6. Nice article. Nice to have some hard stats to work with.

    What do you think makes the Bloom Titan – Twin match up so close? It really does seem like it SHOULD be a slam dunk for Twin. They have permission and blood moon to shut down Bloom Titan’s plan, and then Bloom Titan has very few tools to stop them from just comboing out. From a theoretical level, it’s really hard to imagine a way Bloom Titan wins, save for the occasional turn 2 kill and the usual lucky breaks that come with any match-up.

    1. The Twin player is the one under pressure. When the Amulet Bloom player manages to find his Cavern of Souls and gets to six mana, it trumps Twin. And it’s not like Twin’s permission is very hard or its damage clock is very fast. Just a bunch of Remands and 3-mana 2/1s or 1/4s. Twin wins if it manages to combo off early or resolve a Blood Moon and the Bloom player has no interaction with that – and remember that they could always be holding a Slaughter Pact. In the end, Twin has ways to easily kill the opponent, but the Bloom player has the option to go both under it with a god hand or over it if he has drawn a piece of interaction or two.

      1. Hello dear kanister. So, you are stating that the Amulet player has an opener with both Tolaria and a Titan-or-G-pact, or finds one of them in the process.
        And if not?
        If you just have a Titan and no Tolaria/Cavern? Especially if you run Cavern. I am playing 2 Cavern of Souls just for this reason.

    2. For me, this is the MTGO player experience gap at its finest. Amulet players are very experienced with the deck: Amulet has one of the lowest ratios of unique players to appearances (and maybe THE lowest, if I remember correctly). Twin, on the other hand, has so many players with such a huge range of skills. If those experienced Amulet pilots are always at the helm of their deck, and Twin players have a much greater range of experience, then this is going to rapidly tend towards Amulet.

  7. I can’t help but mention 2 odd results I noticed:

    Abzan has a bad Twin match-up? Why? Is Blood Moon that much of a deal? Abzan can pack tons of instant-speed enchantment removal in the form of Abrupt Decay, Dromoka’s Command and Golgari Charm, and its disruption paired with bolt-proof creatures seems more than enough to smash UR mages. I don’t know why the statistics point that way.

    Jund has a good Burn match-up? I’ve seen this elsewhere too and I’ll make a risky assumption to explain this: Burn attracts new pilots more than any other deck due to a combination of effectiveness, simplicity and low price. Any of us who has a friend who is about to start Modern would probably suggest he start with Burn and see how it goes.

    On the other hand, people with full BG and UR shells are most of the time seasoned players. Why? Because for someone close to the average (EU/US) income it takes years to collect all those cards, so he’s got to have learned something in all this time. There are also far fewer young players in these archetypes, as it’s simply hard to convince a parent to give you a 4-digit number of dollars/euros to buy cards.

    Ofc good and bad, old and new players can be found everywhere, and I do not mean to underestimate anyone. But it’s natural that budget choices are much more likely to be piloted by inexperienced players, which will affect these statistics due to their high numbers. On the other hand, Jund ‘newbies’ are an oddity (who would invest $1600+ in a game he just started playing and might not even like in the long run?) and are therefore unlikely to affect the statistics in a significant manner.

    1. Re: Abzan vs. Twin
      I think this is a player skill issue more than anything. Most people aren’t playing Abzan on MTGO, which suggests the remaining Abzan players are either really good with the deck (some of them) or really bad and have made a poor decision (probably more of them). It’s way easier to misplay against Twin, especially the UR versions with Moon and a more control-oriented gameplan, than it is for Twin to misplay against Abzan. I think this accounts for a big part of these results.

      Re: Jund vs. Burn
      This one actually seems more legitimate. At absolute worst, this should be about 50-50. Burn can race Jund, but Jund has a less painful manabase, Bolt, and potentially Huntmaster. Even the more conservative estimates I have seen put it somewhere between 50-50 and 60-40, the latter in Jund’s favor.

  8. your data has a number of errors
    Affinity vs Burn is 26/43 but Burn vs Affinity is 18/43. 18+26=44…
    if Abzan vs Burn is 16/36 then it is 44%, not 36.4%
    and I didn’t check all the numbers
    Also I would say your dataset is not enough for many matchups
    and to dig into Tier 2 decks, which is also very important (because they can win tournaments, can’t they?), the dataset has to be enlarged greatly
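Checks like the two above are easy to automate. Below is a minimal sketch, with hypothetical helper names, that verifies mirrored matchup records add up to the match count and that a reported percentage matches its win/match counts. It assumes no draws are recorded; with draws, the two win counts could sum to less than the total, but never more.

```python
def mirror_is_consistent(wins_a, wins_b, matches):
    """True if the two sides' wins account for every match (no draws assumed)."""
    return wins_a + wins_b == matches

def win_pct(wins, matches):
    """Win percentage rounded to one decimal place."""
    return round(100 * wins / matches, 1)

def pct_matches_record(reported_pct, wins, matches, tolerance=0.1):
    """True if a reported percentage agrees with the win/match counts."""
    return abs(reported_pct - win_pct(wins, matches)) <= tolerance

# Figures quoted in the comment above.
affinity_burn_ok = mirror_is_consistent(26, 18, 43)   # 26 + 18 = 44, not 43
abzan_burn_pct = win_pct(16, 36)                      # 16/36 as a percentage
abzan_burn_ok = pct_matches_record(36.4, 16, 36)

print(affinity_burn_ok, abzan_burn_pct, abzan_burn_ok)
```

Running a pass like this over every cell of a matchup table would flag transcription slips without having to eyeball each number.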

    1. I’d have to check the spreadsheet but those first things are probably just data entry errors in the article itself. There was no easy way to copy and paste the data into the article so this kind of mistake can definitely happen.

      As for the sample size, I made quite a few disclaimers on this point throughout the article. We obviously don’t have enough data for the tier 2 decks, which is a big reason why I didn’t report it. We do, however, have enough for numerous tier 1 matchups. And in instances where the N is a little low and/or the observed win rate doesn’t align with our expectations, I try to note that in the article itself. This crosschecking between the observed rate and our experiences is a strong way to shore up an otherwise smaller-N dataset, and it’s the method employed here in a number of cases.
