
Testing Stoneforge Mystic in Modern: Part One


The banned list is one of the hot Modern topics whenever a new set is released. Everyone is speculating about what, if anything, will get the ax or be unleashed upon the world. Speculation this time is focused on Infect and/or Dredge taking a hit and Bloodbraid Elf coming off the list. I'm not here to add to the speculation but instead to provide hard data on whether an unrelated card should come off.


I have been hinting at (and making excuses for) this article for weeks now. The time has finally come for me to publish my findings. Today I begin presenting the results of my investigation into the viability of unbanning Stoneforge Mystic. It will be quite long, so today will present the setup and methodology and next week I will actually present my data.

The Prelude

Longtime readers may remember that last December Sheridan tested Stoneforge Mystic in an Abzan list against Affinity. What he found was that the option for a turn-three Batterskull did not significantly impact the matchup game 1 and that sideboard cards played a much larger role in giving Abzan a 50% win rate against Affinity. For reference, here's the deck Sheridan used:


I don't doubt his results are accurate, but I don't think they really tell the story. Affinity has plenty of ways to get around Batterskull so I never expected Stoneforge to have much effect there. Affinity is a "fair" deck (I really need to come up with a better term for that kind of deck) and can ignore most of what Abzan is doing. What I was always interested in was the effect it would have on fair decks, and Sheridan never got a chance to test those.

Expanded Scope

Additionally, Sheridan mentioned that he wanted to do more testing with other decks, so I started gathering data for him. Specifically I started testing a TwinBlade deck, which was Jeskai Twin with Stoneforge Mystic and a pair of Batterskulls. I was mostly done with data collection when Splinter Twin got banned, rendering it all moot.

What I can say about TwinBlade is that it was a nightmare to play against. I tested Burn and was working on Jund, and Stoneforge had a noticeable, trending toward significant, impact on both matchups. Burn traditionally had trouble against Twin because it couldn't win quickly enough to beat the combo when Twin had some interaction, while the consensus on Twin vs. Jund was that it was 50/50.

The addition of Mystic definitively pushed Twin over Burn. Repeatable lifegain is unsurprisingly hard for Burn to beat, and trying to do so left them open to being comboed out. Jund was also losing ground, though I was never certain if that was due to Mystic herself or if we were just playing the matchups poorly. Trying to defend against the combo and Batterskull spread Jund pretty thin, but that might have been player error.

In any case, the threat of that deck was going to lead me to recommend that Mystic never be unbanned. With Twin gone, I thought it worth looking into again.

Establish Procedures

Having decided to test out Stoneforge, and that I wanted to provide a definitive answer about its impact, I knew that meant I had to test a lot of decks. The problem was that there isn't as strong a consensus about Abzan's other matchups besides Affinity. I decided to establish a baseline myself. This would involve playing a stock Abzan list against a test gauntlet and then running it again with the Stoneforge list. After some scouring, this is what I came up with:

Keep in mind that I began the process in late June, so the Grim Flayer and Collective Brutality technology didn't exist at the time. At this time I also decided that I wanted to use Sheridan's results in my final analysis since it was an already compiled data point. To make this work I would be using his list for the actual testing, which was not a problem since at the time Abzan hadn't dramatically evolved since December.

The Gauntlet

I wanted a mix of fair and less-fair decks for my gauntlet. I also wanted the results to be applicable to the metagame as it existed when I began. Complaints about linearity and aggro saturation were particularly high at the time, so I settled upon some fair and unfair linear aggro and the most successful truly unfair deck in Modern. The other consideration was that I wanted decks where Mystic could have an impact. I doubt very strongly that Tron cares about an artifact that's smaller than Wurmcoil Engine, and I wanted to improve the chances of results worth reporting.

I also made sure to go as stock as possible with these lists. I wanted the most representative results possible, and the less common builds could have skewed things. This was difficult for Burn and Infect, as everyone has their own take, and I ended up aggregating lists to find the "average" deck. The rest seemed to be pretty close to consensus and were relatively easy. As a bonus, the decks had sideboards that were reasonable in a Mystic-fueled Modern.

If traditional Naya or 5-Color Zoo had any metagame presence at the time I would have gone with those as they're closer to what players think of when we talk about fast aggressive decks. The Burn decks that run Wild Nacatl may have a different result than this more traditional list, but the version above is still widely represented and there is considerable dissent about which is better.

Burn was a good choice for the red side of aggro, but as for the non-red I really had only one choice. I wanted top-tier decks that had proven themselves and when I started, there was only one deck that fit the criteria.

Honestly, even if Merfolk wasn't Tier 1 I would have tested it anyway. It's my deck and I want to know what effect Mystic would have on it. Testing with this deck also reminded me why I play UW Merfolk instead. I ended up missing Path to Exile and Echoing Truth, as well as my sideboard, and being underwhelmed by Harbinger. Still, I'm the only one playing that version, so I played the same deck everyone else does.

And then we have the most complained-about deck (that isn't Dredge).

Infect has the fastest kill in the format, but it's fairly vulnerable to Jund and Abzan's disruption, and like Affinity it can ignore Batterskull. This would really show how powerful a threat it is rather than just acting as a wall and lifegain source.

Ad Naus is the most successful unfair deck in Modern now. Grishoalbrand is more broken but also inconsistent, and rarely appears on our tiering charts. Scapeshift is a fair deck and Titan Breach really wasn't a deck when I started.

Despite what was said during the World Championship I think the matchup of Abzan vs. combo decks is pretty even. When Abzan goes Inquisition, Tarmogoyf, Liliana, it's hard to lose. If it doesn't get the right disruption or a decent clock it will lose. Testing would be focused on whether Mystic improves the clock enough to shift the matchup.

Project Creep

I was proceeding through testing all these decks when I began to notice a trend in the data. This trend was interesting enough to want to confirm the result, despite the exhaustion all this Magic was causing. However, with PPTQ season getting underway I didn't think that was possible. Then I won the first one and suddenly I didn't need to test for real anymore. With my ticket to the RPTQ punched (congratulations to Jordan for doing the same) I had the time to actually test more decks. To confirm the data trend I would need another fair deck and a less fair one. Thus I added two more decks to my gauntlet.

Jund is the poster child for fair decks and I would have gone with it if I could have found a Jund player to test with. I didn't, but a Jeskai player volunteered, and Jeskai will do.

Dredge seemed like a good candidate for the unfair deck. It was the new hotness at the time and while I didn't think turn-three Batterskull would be good, that was actually in line with the phenomenon I wanted to test. The problem was that after the practice matches it was clear that Abzan's win percentage game one was too low and the sideboard matches too swingy for me to consider the data valid. Abandoning that, I looked at the current tiered unfair decks and went with Death's Shadow.

Death's Shadow presents itself as another Zoo deck but with an unfair fast win, coupled with consistency, that pushes aggro decks out of fair territory. On reflection, picking a deck that straddles fair and unfair is the best indication of what Stoneforge will actually do to both. Tracking the fair Zoo style wins versus the Become Immense wins proved enlightening.

Adding all these decks to the gauntlet and finding experienced pilots to work with added several weeks to the project. For anyone looking to perform a similar test, take care to limit yourself and keep your curiosity in check or project creep like this will ruin you. If I wasn't butting up against the next banned announcement I might still be collecting data. Which brings us to my actual methodology.

Methodology

I would be playing the Abzan decks. My project, so I would do the grunt work. I didn't want to switch off piloting decks because I wanted to model how these matchups would actually play out in "real Magic," where players know their decks and know the matchups. This required finding experienced pilots who were as crazy as I am, who specialized in the decks I wanted to test, and who were willing to use these stock lists (I negotiated with a few of them over what actually went into the lists).

This was about as hard as you'd think, especially when I explained the scale of the project. In the end I found online players for Burn, Merfolk, Death's Shadow, and Ad Nauseam. The previously codenamed "Elliot" agreed to pilot Infect and then Jeskai in paper after some begging persuasion. As I'm writing this my online partners have not told me how they want to be credited. If I get responses, I will add them in.

Test Parameters

I ambitiously set the target of 100 matches per deck, 50 with the "normal" configuration and 50 with Stoneforge. This actually isn't a large enough n value for a true statistical study, but it would be reasonably representative. Play/draw was alternated with the initial decision based on a coin flip, ensuring 25 games apiece on the play for each deck. Sideboarding was included, and will be included in the discussion of the data next week. The testing was conducted over a number of sessions due to scheduling concerns and MODO crashes. Testing with "Elliot" was done in person; the rest was online.
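To give a rough sense of why 50 matches per configuration is a modest sample, here is a minimal Python sketch. It computes a 95% Wilson score interval for a single win rate and a pooled two-proportion z statistic for the gap between two configurations. The function names and the win counts are hypothetical, chosen only for illustration; they are not results from this testing, and this is not necessarily the analysis used in Part Two.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate of wins/n."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled z statistic for the difference between two win rates."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: 25 wins in 50 matches without Stoneforge,
# 30 wins in 50 matches with it. Not data from this article.
low, high = wilson_interval(25, 50)
z = two_proportion_z(25, 50, 30, 50)

print(f"95% interval for 25/50: {low:.3f} to {high:.3f}")
print(f"z for 25/50 vs 30/50: {z:.2f}")
```

With these made-up counts the interval spans roughly 13 percentage points on either side of 50%, and a ten-point swing in win rate gives a z of about 1.0, well short of the conventional 1.96 threshold. That is the sense in which 50 matches can be representative without being conclusive.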

During the Ad Naus sessions we made special consideration for how Lightning Storm doesn't really work online. We both knew what was supposed to happen, so if that wasn't reflected by the interface we discussed what would have actually happened in paper and recorded that result. Misclicks were also accounted for, with some matches thrown out and repeated.

Prior to the actual test games a minimum of ten practice games were played against each Abzan deck so that we could get our eyes in and get a feel for the matchup to better mimic Stoneforge actually being legal. It also helped us to get the "correct" sideboarding strategy worked out. Once that was decided upon it was not changed for the duration of testing, even when we later concluded in several cases that there was a better strategy.

Next Stop: Enlightenment

Let me begin concluding by saying that this was not a fun exercise, but it was educational and I am a better player for the effort. Magic should be fun, and this grinding was exhausting and enraging (my distaste for MODO approached a burning hatred many times). It will be a while before I try this again, and probably longer before I find anyone willing to join in my madness.

Next week I will present the sideboarding strategies and win percentages, and explain what it all means. See you then!

Read about David's conclusions in his subsequent article, Testing Stoneforge Mystic in Modern: Part Two.

20 thoughts on “Testing Stoneforge Mystic in Modern: Part One”

  1. Can’t wait to see the results. I’d also be curious to see how SfM would do when slotted in to other decks (such as Jeskai), but obviously that would require another gauntlet. I supposed seeing her impact across so many match-ups even in only one deck should give us a pretty good idea whether or not she’s safe. For the record, my prediction is that she is. We’ll see next week if I’m right!

    1. I can’t answer that, partially because I’ve been sitting on mine since Cawblade, partially because it would be a huge spoiler, and partially because WOTC is a loose cannon and you never know with them.

    2. It is my general recommendation that you should have at least a playset of every card on the Modern Banned List. Every single unban has led to a huge price increase – even if just short term – in the relevant card or cards.

      1. I wouldn’t go that far, but I bought my thopter/swords two banlist updates before they came off, and got my AV right before it got unbanned. You have to think about what’s realistically coming off in the near future. Buying hypergenesis and skullclamps right now is probably not a good use of money, even if they get unbanned 5 years from now.

        I could imagine stoneforge and bloodbraid coming off in the not too distant future (though probably not the kaladesh update) and I could imagine the artifact lands coming off and mox opal going on. The big difference with stoneforge is its a pretty expensive card to invest in if it stays on the list for another couple years.

        1. My opinion on this topic is:

          Things which are realistic to come of the list within a couple of years:

          SfM, Preordain, DTT, Dark Depths (no Tier 1 deck + enables a new strategy and is good as a control/value finisher and bad as a combo finisher)

          Stuff which is likely but will take time:

          BBE, Seething Song, Artifact lands (they actually make Affinity worse XD), DRS

          Greetings,
          Kathal

          1. I hope you’re right except for the artifact lands. They’re still banned due to krack-clan ironworks not affinity (they indeed make it worse). Stoneforge, Preordain, Dig through time and Dark Depths all together sound a little worrying, but it would be nice to see them in a modern tournament.

  2. Awesome idea, really! As a huge advocate of having as little of a ban list as possible (while of course promoting a healthy format) and a lover of White who wishes it had a bigger role in Modern, I look forward to seeing the results. Quick question; why did you decide to test SFM in Abzan? Is it because you were continuing Sheridan’s experiment, or is it because you think that’s the natural first home for the card?

        1. Probably not, but it would initially be the most played thanks to Legacy. Given what I think the format would end up doing (coming next week) Jeskai would be my guess.

          1. Jeskai, Abzan, Death and Taxes have already been highlighted as some possible deck that could improve with a Stoneforge unban. I think that Affinity too would benefit for having 8 possible plating (and a creature that can carry it). It could even be sleaved up in Bant Knightfall and some builds of Zoo. Aren’t those a bit too many decks? Wouldn’t it hurt metagame diversity? I think that this is the major issue against Stoneforge unban. Not the power level.

          2. Mikefon, I’m actually not too worried about that. It’d be no different than Jeksai Control, Grixis Delver, and Burn all using Lightning Bolt but using the card in very different planned ways. The only problem would be if it takes the decks as a whole too far past the power level of the decks around it.

          3. As a reply to your comment mikefon. Sfm will only be played in affinity the first week before people realize that is terrible in the deck since it is slow and the body doesnt matter. If they wanted that effect they would play the one mana sorcery that does the same thing.
            Saying that it hurts diversity because of the fact that it will be played in a lot of decks is like saying goyf and snap huts diversity of those colors. While they in fact enable different decks to be competative. I think we will Stoneforge help a lot more new decks than overpowering the one that would benefit from that card already.

  3. I must say Im surprised you didnt touch on two cards that I think are very influential to a possible unbanning.
    Collective Brutality and Kommand both deal with stoneforge with extreme efficiency. Im honestly hoping Mardu can become a viable deck on the back of a stoneforge unban.

    1. Answers are not a reason to unban cards, as the old saying goes “There are bad answers. There are no bad threats.” Cards like Kommand give decks legs against Stoneforge, but you could easily play answers like Dispel and the threat will still be just as problematic (Brutality just kills Stoneforge like Abrupt Decay). If I’m testing banned cards I need to know if the threat itself is trouble, not whether or not there are answers to it. A sufficiently powerful threat will see play regardless of how easily answered it is.
