Bloodbraid Month, Pt. 4: Quantitative Data

Alright, time for the big reveal. With comment section chatter over possible unbannings increasing, it’s time for the hard data to make its own argument. I had less fun with this test than with previous efforts, mostly because I have a long and difficult history with Bloodbraid Elf. That said, the testing was comparatively easy and painless. The combination of straightforward play and considerable experience paid off with this test.

In this series, I take a card from the Modern Banned and Restricted List and test it against a gauntlet of current Tier 1 decks. I am trying to evaluate its power in the current field and determine if it is plausible to unban. So far I’ve tested Stoneforge Mystic, Jace, the Mind Sculptor, and Preordain. Now it’s time for Bloodbraid Elf. I’ve been discussing its history, how it would fit into current decks, and finally the intangibles of playing the card. Today we’ll see the hard test data.

An All-Encompassing Disclaimer

These are the results from my experiment. It is entirely possible that repetition will yield different results. This project models the effect that the banned card would have on the metagame as it stood when the experiment began. This result does not seek to be definitive, but rather to provide a starting point for discussions on whether the card should remain banned.

Methodology

This test consists of 500 total matches: 250 with the control Jund list, and 250 with the test deck, Bloody Jund. That works out to 100 matches against each of the five gauntlet decks (50 per Jund list), a nice, round number (n) for my analysis. Play/draw alternates, as does which deck is played: the first match with the control list is followed by the first match with the test deck. The purpose is to mitigate the effect that increasing experience and familiarity have on the match results. Sideboarding strategy is decided before testing begins and never changes, even when it is determined to be wrong. Otherwise, the results would be invalid. We also behaved like we didn’t know what the matchup was game one.

Testing was done primarily over Skype with paper cards. We don’t use MTGO because timing-out and misclicking can ruin the data. Accuracy is more important than win percentage. Also, Skype and proxies are free, while buying a deck for testing purposes on MTGO isn’t. We previously used free simulation programs, but they proved too time-consuming for my team’s tastes. When everything is manual, the clicks needed to play become maddening.

Note on Significance

When I refer to statistical significance, I really mean probability. Specifically, the probability that the differences between a set of results are the result of the trial and not normal variance. Statistical tests are used to evaluate whether normal variance is behind the result, or if the experiment caused a noticeable change in result. This is expressed in confidence levels determined by the p-value from the statistical test. In other words, statistical testing determines how confident researchers are that their results came from the test and not from chance.

If a test yields p > .1, the test is not significant, as we are less than 90% certain that the result isn’t variance. If p < .1, then the result is significant at the 90% level. This is considered weakly significant and insufficiently conclusive by most academic standards; however, it can be acceptable when the n-value of the data set is low. While you can get significant results with as few as 30 entries, it takes huge disparities to do so at that size, so sometimes 90% confidence is all that is achievable. p < .05 is the 95% confidence level, and is considered a significant result: the data is almost certainly the result of the experiment. Should p < .01, the result is significant at the 99% level, which is as close to certainty as you can get. When looking at the results, just look at the p-value to see if the data is significant.
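
For readers who prefer code to prose, the tiers above can be sketched as a small lookup. This is only an illustration: the function name and label strings are made up for this example, and treating a p-value of exactly .1 as not significant is an assumption, since the text only covers p > .1 and p < .1.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value onto the significance tiers described in the text.

    The name and the label strings are illustrative, not from the article.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "significant at 99%"
    if p_value < 0.05:
        return "significant at 95%"
    if p_value < 0.1:
        return "weakly significant at 90%"
    # Assumption: p == .1 exactly falls on the "not significant" side.
    return "not significant"


for p in (0.2, 0.08, 0.03, 0.004):
    print(f"p = {p}: {significance_label(p)}")
```

The checks run from the strictest tier down, so each result is reported at the highest level it clears.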

Data

Alright, enough waiting. First, I will report the overall win percentages. Then I will post the results of the z-test to show whether the result is significant. I use the z-test because it’s the more common test. I do other tests to confirm my result, but I won’t report them. I’ll finish this section off with some interesting statistics I kept during the test.

  • Total wins – 269
  • Total win % – 53.8%
  • Total control wins – 122
  • Control win % – 48.8%
  • Total test wins – 147
  • Test win % – 58.8%

Overall, Jund had a favorable record against my gauntlet. The control deck was just under 50% against the field, while Bloody Jund shot up to ~60%. That is quite the result and the z-test result should not be surprising.
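
If you want to check the arithmetic yourself, here is a minimal sketch of a z-test on the totals above. It assumes a pooled two-proportion z-test with a one-tailed alternative (the test deck wins more often than the control deck); the exact variant is not spelled out in this article, and the function and variable names are invented for the example.

```python
import math


def two_proportion_z_test(wins_a: int, n_a: int, wins_b: int, n_b: int):
    """Pooled two-proportion z-test.

    Returns (z, p) where p is the one-tailed p-value for the alternative
    that group B wins more often than group A. The one-tailed choice is
    an assumption made for this sketch.
    """
    rate_a = wins_a / n_a
    rate_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_tailed


# Control Jund: 122 wins in 250 matches. Bloody Jund: 147 wins in 250 matches.
z, p = two_proportion_z_test(122, 250, 147, 250)
print(f"z = {z:.2f}, one-tailed p = {p:.4f}")
```

Under these assumptions z comes out a little above 2.2 and the p-value lands between .01 and .05. A two-tailed version would simply double the p-value.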

As you can see, this result is significant at the 95% level, and very nearly at 99%. Including Bloodbraid Elf strongly affected the match results. This was not surprising to me, as I remember just how powerful the card has always been. Some other interesting results from the test:

  • Average cascade length – 2.31 cards
  • Longest cascade – 8 cards (all lands)
  • Average cascade hit’s mana cost – 2.72
  • Times cascading past other Bloodbraids – 98 (once past all three!)
  • Average turn playing first Bloodbraid – 4.98
  • Times losing to Blood Moon – 15

Anyway, enough justifying my obsessive note-taking; time to actually make sense of the results. This necessitates breaking the total data down by gauntlet deck, but I must restate that the n of these tests is small in comparison. The threshold for significance is much higher.
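
To make the point about small n concrete, here is a rough sketch of how large a gap is needed before a result clears the 95% level at 50 matches per deck versus 250. It assumes a pooled two-proportion z-test with a one-tailed alternative and a hypothetical control deck sitting at exactly 50%; the helper names are invented for the example.

```python
import math


def one_tailed_p(control_wins: int, test_wins: int, n: int) -> float:
    """One-tailed p-value from a pooled two-proportion z-test, n matches per deck."""
    pooled = (control_wins + test_wins) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (test_wins - control_wins) / n / std_err
    return 0.5 * math.erfc(z / math.sqrt(2))


def min_test_wins(control_wins: int, n: int, alpha: float = 0.05):
    """Smallest test-deck win count whose p-value drops below alpha, or None."""
    for test_wins in range(control_wins + 1, n + 1):
        if one_tailed_p(control_wins, test_wins, n) < alpha:
            return test_wins
    return None


for n in (50, 250):
    control = n // 2  # hypothetical 50% baseline, not a measured result
    needed = min_test_wins(control, n)
    gap = 100 * (needed - control) / n
    print(f"n = {n}: {needed} wins needed, a {gap:.1f}-point gap over control")
```

With these assumptions, the per-deck samples need a win-rate gap more than twice as large as the full data set does before they register as significant.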

Quick aside: the metagame today looks a lot like the metagame back in 2012. Tron, Affinity, UWx Control, Storm, and creature toolbox (then Birthing Pod, now Collected Company) are all top decks. No, it’s not exactly the same as the last time Bloodbraid was loose, and Jund is not the same powerhouse either. However, it does indicate that the conditions that let Jund thrive back then are still present now, and leads me to speculate about history repeating itself.

Affinity

The classic matchup of two old rivals. In some senses, Jund and Affinity are Modern. If you don’t know how this matchup normally works, it’s a removal-heavy deck against a small-creature deck. Jund wins through superior attrition while Affinity wins either through blitzkrieg or Etched Champion.

  • Control Deck wins – 28, 56%
  • Test Deck wins – 28, 56%

Dead even. Let’s check out my numbers anyway.

Absolutely not significant. The matchup is determined by factors not related to Bloodbraid Elf. Specifically, whether Jund clunks out and doesn’t kill enough robots to stifle Affinity. The deck is so fast that hand disruption is minimally effective, and if Affinity can stick Etched Champion with any kind of power boost (and maintain protection from colors), they’re in for an easy game. Otherwise it’s a Jund-favoring slog through removal. Extra card advantage and tempo on turn four don’t dramatically alter the odds of either scenario.

Sideboarding

Normal Jund:

-1 Chandra, Torch of Defiance
-3 Thoughtseize
-1 Liliana of the Veil

+2 Ancient Grudge
+2 Kitchen Finks
+1 Damnation

Bloody Jund:

-1 Liliana of the Veil
-4 Thoughtseize

+2 Ancient Grudge
+2 Kitchen Finks
+1 Anger of the Gods

Affinity:

-4 Galvanic Blast

+2 Blood Moon
+1 Bitterblossom
+1 Rest in Peace

GB Tron

And now for the traditional predator to Jund. Gx Tron has always been a hard matchup for Jund, which struggles to keep pace. Thoughtseize is critical so you don’t just lose to Tron’s bombs, but they always have more, and it’s hard to profitably interact. The GB version is said to be better than GR because of Collective Brutality, but I have no opinion.

  • Control Deck wins – 19, 38%
  • Test Deck wins – 26, 52%

That’s a very large spike. The additional maindeck Thoughtseize was a factor, but only incrementally. It’s arguably the best maindeck card in the matchup, but there’s only one more copy so the benefit is small. There’s more to this.

The result is significant at the 90% level but not at 95%. There’s that problem of the small n, as previously mentioned. I would say that these results are probably significant, contingent on additional study.

Bloody Jund had the same problems as Jund against Tron: it just doesn’t measure up in raw power or speed. However, Bloodbraid allowed Jund to make up for that with card advantage and tempo. Even when Jund was behind, playing two spells made catch-up significantly easier. Tron has also cut down on Wurmcoil Engine, and that card was Jund’s worst nightmare. Not a lot killed the initial Wurm, and then you had to expend additional resources to kill the tokens.

Sideboarding

Normal Jund:

-1 Liliana, the Last Hope
-2 Abrupt Decay
-3 Fatal Push

+1 Liliana of the Veil
+2 Surgical Extraction
+3 Fulminator Mage

Bloody Jund:

-2 Abrupt Decay
-2 Fatal Push
-1 Lightning Bolt

+2 Surgical Extraction
+3 Fulminator Mage

GB Tron:

None

Gifts Storm

This matchup is about Jund’s clock. You can have all the disruption in the world (and Jund does), but if you don’t end the game, Storm will eventually find Past in Flames and enough mana to win.

  • Control Deck wins – 26, 52%
  • Test Deck wins – 30, 60%
  • Turn three Storm wins – 5 (3 against control, 2 against test)

That is an interesting jump, but it is not going to be significant. This doesn’t surprise me, as there is a lot of variance associated with Storm. For example, I lost once to a turn-two Blood Moon as the control deck and three times with the test deck. That’s just Storm variance; neither Bloodbraid Elf nor my play had much effect on it.

As I said, not a significant result. There was so much going on with Storm that I never felt that my own play mattered as much. As long as I had some kind of clock and had disruption, I’d done all I could.

Sideboarding

Normal Jund:

-2 Terminate
-2 Fatal Push
-1 Liliana, the Last Hope

+2 Collective Brutality
+2 Grafdigger’s Cage
+1 Liliana of the Veil

Bloody Jund:

-2 Terminate
-2 Fatal Push

+2 Collective Brutality
+2 Grafdigger’s Cage

Gifts Storm:

-3 Remand
-1 Noxious Revival
-1 Gifts Ungiven

+3 Blood Moon
+1 Echoing Truth
+1 Pieces of the Puzzle

Additional Notes

The Jund sideboarding guide said to do things this way. I asked about the Surgical Extractions and was told no. Apparently, Grafdigger’s Cage and Scavenging Ooze are enough. You’re free to disagree, but given Storm’s sideboarding toward Blood Moon and away from the graveyard, I see the point.

Grixis Shadow

The deck that eventually supplanted traditional Jund. I thought this would be a worse matchup than it proved to be, all things considered. Jund has a higher density of relevant cards while Grixis has larger threats and more ways to find them.

  • Control Deck wins – 24, 48%
  • Test Deck wins – 30, 60%

This is almost a significant result. The decks are far more evenly matched than anyone figured. This indicates to me the preference for Grixis over Jund comes from other matchups rather than any advantage over the deck.

The matchup was a weird kind of attrition: the most important spells are the discard and kill spells, which are nearly identical across decks. The blue cantrips made it more likely Grixis would see them, but that deck also had a harder time getting out threats. I also think that I played this matchup wrong, as it became clear during testing that Jund did better when it went wide around the bigger Grixis threats, making patience critical for Bloody Jund. I should have been sideboarding to take advantage of this revelation, but it was too late.

Sideboarding

Normal Jund:

-1 Fatal Push
-2 Lightning Bolt
-2 Abrupt Decay

+1 Damnation
+2 Surgical Extraction
+2 Kitchen Finks

Bloody Jund:

-3 Lightning Bolt
-2 Abrupt Decay

+1 Terminate
+2 Surgical Extraction
+2 Kitchen Finks

Grixis Death’s Shadow:

-2 Stubborn Denial

+1 Liliana, the Last Hope
+1 Kolaghan’s Command

Jeskai Tempo

It’s rather fortuitous that I’m doing the gauntlet alphabetically, as it lets me save the most interesting result for last. Jeskai Tempo has arguably been the best deck over the past few months even if it’s slipping in our rankings. Its combination of removal and hard-to-fight threats is remarkably Jund-like, and I think it even plays like Jund.

  • Control Deck wins – 25, 50%
  • Test Deck wins – 33, 66%

That is a very large jump. The decks are fighting a war of attrition where tempo is a factor, which is just the kind of fight that Bloodbraid Elf wins. The Elf substantially impacted the matchup.

This result is significant at the 90% level and very nearly at 95%. One more win or a control loss was needed. I would say again that this individual result is probably significant, with a high likelihood of confirmation.
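
The "one more win or a control loss" remark can be checked with a quick sketch. As with any reconstruction, treat the setup as an assumption: a pooled two-proportion z-test, one-tailed, 50 matches per deck, with names invented for the example.

```python
import math


def one_tailed_p(control_wins: int, test_wins: int, n: int = 50) -> float:
    """One-tailed p-value (test deck better than control), pooled z-test, n matches each."""
    pooled = (control_wins + test_wins) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (test_wins - control_wins) / n / std_err
    return 0.5 * math.erfc(z / math.sqrt(2))


# The recorded Jeskai result plus the two hypothetical near-misses.
scenarios = {
    "as recorded (25 vs 33)": (25, 33),
    "one more test win (25 vs 34)": (25, 34),
    "one more control loss (24 vs 33)": (24, 33),
}
for label, (control, test) in scenarios.items():
    p = one_tailed_p(control, test)
    verdict = "under .05" if p < 0.05 else "not under .05"
    print(f"{label}: p = {p:.3f} ({verdict})")
```

Under those assumptions the recorded result sits just above the .05 line, and either single-match change pushes it below.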

Bloodbraid Elf let Jund really break things open in this matchup. Jeskai is all about incremental advantage, which is why Dark Confidant is so important to the matchup if Jeskai doesn’t have a way to kill it. Bloodbraid accomplished the same job, but immediately, and with a tempo boost to boot. Jeskai Tempo doesn’t have a similar gamebreaker, and so fell behind against Bloody Jund far more often than the normal version. This isn’t surprising: this is why Jund killed traditional control when it had Bloodbraid previously. This result was just a confirmation.

Sideboarding

Normal Jund:

-3 Fatal Push
-2 Abrupt Decay
-2 Tarmogoyf

+3 Fulminator Mage
+2 Surgical Extraction
+2 Kitchen Finks

Bloody Jund:

-2 Fatal Push
-2 Abrupt Decay
-2 Tarmogoyf

+2 Fulminator Mage
+2 Surgical Extraction
+2 Kitchen Finks

Jeskai Tempo:

-2 Geist of Saint Traft
-2 Electrolyze
-2 Logic Knot

+2 Relic of Progenitus
+2 Pia and Kiran Nalaar
+1 Celestial Purge
+1 Vendilion Clique

Additional Notes

Not having mirror breakers like Ancestral Vision really hurt Jeskai.

What Does It Mean?

Jund was overall improved by the inclusion of Bloodbraid Elf, to the great surprise of nobody on my team. The most significant results were against Tron and Jeskai, respectively the worst matchup and a very even one. What this says about Bloodbraid Elf in the Modern metagame is the subject of next week’s article. See you then!

David began playing Magic during Odyssey block, quit playing Magic when Caw Blade ruled the world, and returned to Modern shortly before Deathrite was banned. He’s made an appearance at the Pro Tour, made money at GP Denver, and is constantly grinding and brewing in Modern.

21 thoughts on “Bloodbraid Month, Pt. 4: Quantitative Data”

  1. Your avg cascade length (2.3) and mana cost (2.7) seem extremely high – is this correct? The entire list had six three drops – how could your avg hit be cmc 2.7 when it can never be more than 3?? And I suck at probability but I would think if a deck was fifty percent lands and fifty percent 3 or less cmc youd be 50/50 to find a land on top at any given point? So avg would be 1.5 in a deck with that kind of land density – but you have 2.3 and you are not a fifty percent land deck. Sooo you almost always found a land on top and sometimes multiple lands – and almost never a cascade hit on top?

    1. About 1/3 of the time there was a target on top. The mode of the cascade data is two by quite a large margin, however I had enough long cascades to pull the average up significantly, including 16 instances of cascade 6.
      And yes, the 2.7 is weird to the point of anomaly. I did hit Liliana and Kommand an inordinate amount of time, but the data set also included sideboard games where I had far more 3-drops and that shifted the average up. Also, between three draw steps minimum and often some Bob triggers, you tend to draw a lot of your 1- and 2- drops naturally before you could play Bloodbraid at all, making it more likely to hit high drops.

      1. I do still think finding land or elf on top two thirds of the time isn’t right when those cards are more like a generous forty percent of the deck than sixty-six.

        Similarly your board had two finks two fulminator and an anger. I don’t see many matches where all five come in, and on game one you have eleven one cmc cards vs six three cmcs anyways. The best scenario would maybe see your count of one’s and threes balanced putting avg cascade cmc squarely at two.

        Basically a ten percent jump in wins seems pretty high and I wonder if you just incidentally ran into an awful lot of Christmas lands with your cascades. I would also expect you lose serious points vs burn or other fast aggro with a set of seize and bobs and bobs flipping elves.

        1. Like I said, it is weird. I definitely saw Santa a lot this time, I’m not denying it. That’s why I never liked Bloodbraid back in the day, it always seemed like that was the norm. Though it was three Fulminators not two.

          As for Burn, it wasn’t that bad a matchup in 2012 and I imagine it wouldn’t be the worst now either. Yes, Bob is a huge liability there but there have been times when Jund ran him alongside Hit // Run, so maybe you don’t care. You do still board him out against Burn, and Thoughtseize actually isn’t bad against Burn since it’s functionally gaining one life per burn spell it takes. Inquisition is miles better though.

  2. While the results are not surprising, I think it should be pointed out that if Bloodbraid Elf was taken off the banlist the metagame would likely adjust. The decks tested represent decks unprepared for a more powerful Jund. If Jund suddenly saw a resurgence due to an elf unban I would think the meta would change enough to tamp down the huge swings in win percentage shown here. Jeskai (AV or a more control list without geist) and Tron especially could make adjustments to make the matchup more favorable.

    I personally would like to see Bloodbraid Elf come off the banlist. Modern is a format of “unfair” decks and bloodbraid is the exact sort of card non-blue decks need to keep up with the nonsense in the format.

  3. At this stage I think there’s some questions I’d ask about the general validity of a test like this.

    Incredible effort by the way. All in the name of learning more, it’s really great to see.

    Let’s examine a couple of things you’ve said though:

    1) tron matchups. You indicate that more wurmcoils would almost invalidate the added bonus from including bloodbraid. I’m inclined to agree. What’s tricky here is what this means in the wider context. Tron players often fluidly shift around on numbers of cards depending on the metagame. You can bet that within a week or so, if bloodbraid was unleashed, the composition of threats would change accordingly, thereby making your results here a bit less valid.

    What’s more, every deck would do this, to varying degrees. It’s possible that your snapshot from which you are taking the sample matches may be giving unfairly weighted results :S

    Although on the other hand, modern (when taken as a whole) is a vast soup of decks and players often deviate from a generally accepted ‘optimal’ build, so the counter argument is that at least this data means something in the abstract, even if it doesn’t apply to a changing fluid metagame.

    I don’t envy your defense of these data. If this was a postgrad thesis i’m sure the prof would have a lot to say 😛

    Either way it was an enjoyable read and I’m thankful for your effort here. I don’t currently know of anyone else doing something like this so it’s not only a novelty but there might be some interesting lessons from it as well.

      1. I do like that you’re splitting the subjective from the objective more clearly with the staggered articles. And I think it’s fair to just say with no adaptation by the meta bbe improves a number of key matches for Jund by around ten percent. I’d have to go back and see how that number compares to your other tests. I feel like it is way higher than the stone forge results.

        I also think it’s tough in the abstract to know what percent increase would be dangerous for the format. It’s relative obviously as moving a twenty percent deck up to forty is fine but moving a forty-five to sixty-five is not. So there’s probably an absolute you don’t want to cross like sixty percent vs the gauntlet?

  4. No one else has asked this so I’m sure I’m just missing something, but even after a re-read I can’t figure it out, how exactly do your win% stats vs individual decks work? You stated that your overall win% was around 50%, yet in the individual matchups I see nothing above 33%. What am I reading wrong, and what (assuming you have the numbers) was Jund’s win% vs each deck (control version & test deck)?

    1. You are reading it wrong. When I did the individual stats I said Control/test deck wins: total wins, win percentage. You are reading the total wins as the win percentage.

      1. So what are the actual win percentages for each then? These percentages obviously don’t make it very clear to us novice statisticians because, to me, the “total wins” would be the deck’s actual “win percentage”? Obviously, this is not the case, so I’m unsure of how to get the actual win percentage from these results.

        1. Exactly what was reported. All the individual results are out of 50. So for the Affinity matchup both the test Jund deck and the control Jund deck won 28 matches and 28/50=56%.

    1. RUG lacks decent hits besides AV. Goyf and Scooze as threats, sure, but then what? Bob, Lili, KC, Decay, Brutality… black just has so many amazing cascade targets. And RUG can’t really interact with noncreature spells because permission clashes with cascade and blue lacks targeted discard (not that your creature removal is even passable in these colors). I’m pessimistic.

      1. Tell you what though, as an extra option in the card pool, RUG players would rejoice at having the option, even if jund overall was a better fit for the card.

        Some sort of aggro/ponza list? I can recall these decks existing back around kamigawa standard and I’m sure stone rain was hit off cascade many times in modern before the elf was confined to a cage :D. Not proposing any sort of top tier viability, but most of modern is brewtopia anyway. The tiered, visible portion of modern is such a small (albeit influential) slice i’m sure RUG would be happy to welcome the elf back to the fold.

        Regards,

        1. The Ponza Elf decks were usually Zoo-style and ran Boom//Bust (do a search on mtgtop8), but that isn’t a hittable target anymore with the CMC rule change on split cards. Not saying Elf wouldn’t improve the RUG wedge generally, just that it wouldn’t suddenly birth a good RUG deck, as it doesn’t solve any of the combination’s problems.

  5. The thing I find most interesting is the drastic increase in winrate against Tron. On paper, that doesn’t make any sense to me… I can see a marginal improvement but there’s nothing about BBE that is inherently useful against Tron. Perhaps the ability to kill a Karn that just -3’d? That can’t come up that often though. Same goes with cascading into Fulminators or at least digging further towards them. Aren’t these super marginal bonuses? Additionally, if there was an insurgence of Bloody Jund, Tron would probably add a couple more Wurmcoils, making Jund’s life way harder. Bottom line: I don’t ever see Jund getting a better than 40% winrate against Tron without heavily warping its sideboard.

    1. You are correct that each of those improvements is marginal, though you put enough marginal buffs together and they add up. The real reason is how the matchup dynamic has changed. Jund wins by doing something slightly more powerful than you at every point of its curve. Tron wins by doing something significantly better on turn three and each turn thereafter. That’s why Tron’s such a bad matchup: it overpowers everything Jund has and requires specific answers, and lots of them. The way to beat Tron is by swarming it. Creature swarms or decks that can just play lots of spells make up for that individual power deficit in bulk so that Tron struggles to keep up (unless they have Ugin). Bloodbraid lets you play two spells a turn and that quantity helps with the power deficit, which greatly improves the matchup. I don’t think it becomes favorable, but even is definitely possible. Remember, Tron preyed on Jund in 2012 but it wasn’t oppressing it.

      1. Makes sense. It’s just that I’ve always viewed Jund vs. Tron as the defining polarizing matchup of the format, the epitome of a matchup you might as well give up since there’s nothing you can reasonably do to improve it. So if BBE actually improves Jund to an even matchup, to me that’s ringing some serious alarm bells. If big mana can’t prey on Jund, what can?

        1. Tron may or may not be able to hold down Jund, but it isn’t the only big mana deck out there. For example, I doubt that Valakut decks suddenly become good matchups because of Bloodbraid.
