Testing the Consistency of Ancient Stirrings

There was much debate a little while back about the Modern bannings. Should Mox Opal get the axe? Is the card selection of Ancient Stirrings too good for Modern? I saw a lot of people making arguments on either side, but I did not see much solid evidence for either case. The arguments were based more on how well KCI was doing in tournaments and on whether or not it is too good to exist in Modern in a theoretical sense. I think it would be useful in this debate to attempt to quantify the power of Ancient Stirrings. Due to the complexities of Magic, precisely quantifying the power of any particular card is an immense problem. For solving this class of problem, I prefer to enlist the aid of computers.

Utilizing Data

I love data. Every time I read about people accumulating large amounts of data about Magic, I get excited. Poring over logs of matchup data that can inform us about current tournament Magic trends is a treat for me. I frequently use tools like hypergeometric calculators to aid me in deck building. If you have never used one, I highly recommend it. The intersection of statistics and gaming is underutilized and can greatly increase our understanding of Magic strategy.
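As a rough illustration of what a hypergeometric calculator computes, the sketch below uses only Python's standard library to find the chance of seeing at least one copy of a four-of in a seven-card opening hand from a 60-card deck. The function name and its parameters are illustrative choices, not taken from any particular calculator.

```python
from math import comb

def at_least_one(copies, deck_size=60, cards_seen=7):
    """Chance of seeing at least one of `copies` cards among `cards_seen` drawn."""
    # Complement of drawing none: every card seen comes from the
    # deck_size - copies cards that are not the one we want.
    misses = comb(deck_size - copies, cards_seen) / comb(deck_size, cards_seen)
    return 1 - misses

# A four-of in the opening seven comes out just under 40%.
print(f"{at_least_one(4):.2%}")
```

Changing `cards_seen` to 9 or 10 gives the chance of having naturally drawn the card by turn three on the play or on the draw, before any card selection is involved.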

Given that I have a background in computer programming, I occasionally put it to good use for Magic. My main use for it is in designing Monte Carlo simulations. When people use the term Monte Carlo, all it means is that randomness is involved. The randomness can be inherent in the events being simulated, in the actions taken during the simulation, or both. In the context of a game like chess, where there is no inherent randomness, the randomness could come from the moves selected. If I wanted to determine which opening move is best, I could randomly play out millions of games for each opening move to determine which move has the highest win percentage. In the context of a card game like Magic, the randomness comes from the cards drawn. For example, I could design a simulation for a combo deck that determines how often it can win on turn four just by goldfishing. The logic of the actions the simulated player takes is preset, but the cards that the player draws are random each time.
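To make the idea concrete, here is a minimal Monte Carlo sketch in Python. It deals many random seven-card hands from a 60-card deck and counts how often a four-of shows up. The function name, the default values, and the trick of representing the four copies as card indices 0 through 3 are all assumptions made for illustration. The exact hypergeometric answer to this question is just under 40%, so the estimate should land close to that.

```python
import random

def estimate_four_of_in_opener(copies=4, deck_size=60, hand_size=7,
                               games=100_000, seed=1):
    """Estimate the chance of at least one copy in the opening hand."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(games):
        # Deal a random hand; indices below `copies` stand in for the four-of.
        hand = rng.sample(range(deck_size), hand_size)
        if any(card < copies for card in hand):
            hits += 1
    return hits / games

print(f"{estimate_four_of_in_opener():.1%}")
```

A question this simple does not need simulation, but the same loop structure carries over once mulligans and card selection make an exact calculation impractical.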

Monte Carlo simulations are incredibly useful for games as complex as Magic. If we want to use data to inform our decisions about Magic, we need a very large sample size, much larger than any one individual can provide by playing out games on their own. Out of necessity, we need to speed up the process. With simulations, we can play out thousands or millions of games in the time it takes to shuffle up a deck.

Simulations have their limits, though. They are good for answering simple questions, like how often you will draw a specific card. Answering a question like who is favored in a matchup is much too difficult. That level of analysis would require revolutionarily complex AI.

What to Test

To see how good Ancient Stirrings is, I wanted to determine how much consistency it adds to a deck. Most decks that play Ancient Stirrings are playing it primarily to dig for specific cards. In decks like Lantern Control or Tron, it is there to find Ensnaring Bridge or Tron lands. The card does have additional utility, like finding lock pieces in the case of Lantern or payoffs in the case of Tron. That utility is a secondary benefit rather than the card's main purpose. If those decks could play one-mana tutors that only grabbed Ensnaring Bridge or one of the Tron lands, they certainly would. I think seeing how close Ancient Stirrings is to a one-mana Demonic Tutor is a reasonable measure of its power level.

For the purposes of testing, I chose to simulate goldfishing Lantern Control trying to find an Ensnaring Bridge. In a lot of matchups, the deck functions as a combo deck trying to find Ensnaring Bridge to lock the opponent out of the game. This is the perfect scenario for gauging the added consistency of Ancient Stirrings. We can treat Ancient Stirrings effectively as additional copies of Bridge—the question becomes exactly how many each Ancient Stirrings is worth.

Assumptions for Goldfishing

In designing simulations, certain assumptions need to be made. Magic is an incredibly complex game, and trying to capture all of that complexity is a difficult task. The beginning assumptions help simplify the problem for testing. It is important to be careful about the assumptions made, though. They need to be made in a way that still allows for drawing meaningful conclusions. If the assumptions are too broad, then the results will not be an accurate reflection of actual games.

I looked at a few different Lantern lists that have been posted lately to get an idea of the common mana bases. All of the lists I looked at play 18 lands and four Mox Opal. Counting the Mox Opals, there are 15 green sources in the mana base; the green lands are four Glimmervoid, four Spire of Industry, and three Botanical Sanctum. For the purposes of my simulations, I assumed that all of the green sources could always tap for green. I ran some simulations with varying numbers of green sources, and it changed the results by less than one percentage point, so I think this assumption is a reasonable approximation.

I used a very basic mulliganing heuristic. A six- or seven-card hand was a mulligan if it contained six or more lands or fewer than two. The simulations kept all five-card hands. This mulliganing heuristic is fairly generous, but any more complexity would require more context than a goldfishing scenario could provide. None of the simulations accounted for scrying after mulligans. This deflates the results slightly, but the comparisons are unaffected.
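The heuristic is simple enough to write down directly. The Python sketch below encodes it and then computes, exactly, how often a fresh hand of a given size would be kept. It assumes that only the 18 lands count as lands for this purpose and that Mox Opal does not; the article does not say how Opal was treated, so that is a guess, and the function and constant names are illustrative.

```python
from math import comb

DECK_SIZE = 60
LANDS_IN_DECK = 18  # assumption: Mox Opal is not counted as a land here

def should_mulligan(hand_size, lands_in_hand):
    """Mulligan six- and seven-card hands with 6+ lands or fewer than 2."""
    if hand_size <= 5:
        return False  # every five-card hand is kept
    return lands_in_hand >= 6 or lands_in_hand < 2

def keep_probability(hand_size):
    """Exact chance that a random hand of this size passes the heuristic."""
    total = comb(DECK_SIZE, hand_size)
    kept = sum(
        comb(LANDS_IN_DECK, lands) * comb(DECK_SIZE - LANDS_IN_DECK, hand_size - lands)
        for lands in range(hand_size + 1)
        if not should_mulligan(hand_size, lands)
    )
    return kept / total

for size in (7, 6, 5):
    print(size, f"{keep_probability(size):.1%}")
```

Under that assumption, roughly 68% of seven-card hands are kept, with most of the mulligans coming from zero- and one-land hands.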

The approach to playing out turns is straightforward. When playing a land, the simulation prioritized green sources over non-green sources. Whenever it had an Ancient Stirrings and an untapped green source, it would cast it. When deciding what card to take from Ancient Stirrings, it would prioritize, in order: an Ensnaring Bridge, a green source, a non-green source. After that, which card it takes does not really matter, as it would have no impact on the simulation. Then, on turn three, it would determine whether or not it had found an Ensnaring Bridge and enough lands to cast it. Each game in which the simulation had a castable Bridge on turn three was counted as a success.
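The turn logic above can be sketched in Python roughly as follows. This is not the original program, and several details are assumptions: Mox Opal is treated as a free green source that is always active (metalcraft and the legend rule are ignored), only real lands count for the mulligan check, every card that is not a Bridge, a Stirrings, or a mana source is generic filler, and Stirrings is cast on turn three only if a held Bridge is not already castable. The numbers it produces should not be expected to match the results table exactly.

```python
import random

BRIDGE, STIRRINGS, GREEN, OPAL, LAND, OTHER = "B", "S", "G", "O", "L", "X"

def build_deck(bridges=4, stirrings=4):
    # 11 green lands, 7 other lands, 4 Mox Opal, then filler up to 60 cards.
    deck = [GREEN] * 11 + [LAND] * 7 + [OPAL] * 4
    deck += [BRIDGE] * bridges + [STIRRINGS] * stirrings
    return deck + [OTHER] * (60 - len(deck))

def opening_hand(deck, rng):
    # Mulligan 7- and 6-card hands with fewer than 2 or more than 5 lands.
    for size in (7, 6, 5):
        rng.shuffle(deck)
        hand, library = deck[:size], deck[size:]
        lands = sum(card in (GREEN, LAND) for card in hand)
        if size == 5 or 2 <= lands <= 5:
            return hand, library

def cast_stirrings(hand, library):
    hand.remove(STIRRINGS)
    top, rest = library[:5], library[5:]
    # Priority: Bridge, then a green source, then a non-green land.
    for wanted in (BRIDGE, GREEN, OPAL, LAND):
        if wanted in top:
            top.remove(wanted)
            hand.append(wanted)
            break
    library[:] = rest + top  # cards not taken go to the bottom

def bridge_on_turn_three(deck, on_play, rng):
    hand, library = opening_hand(deck, rng)
    green = other = 0  # mana sources on the battlefield
    for turn in (1, 2, 3):
        if turn > 1 or not on_play:
            hand.append(library.pop(0))
        land_drop, spent = True, 0
        while True:
            while OPAL in hand:  # assumption: Opal is free, always-on green mana
                hand.remove(OPAL)
                green += 1
            if land_drop and GREEN in hand:
                hand.remove(GREEN)
                green, land_drop = green + 1, False
            elif land_drop and LAND in hand:
                hand.remove(LAND)
                other, land_drop = other + 1, False
            if turn == 3 and BRIDGE in hand and green + other - spent >= 3:
                return True
            if STIRRINGS in hand and spent < green:  # an untapped green source
                spent += 1
                cast_stirrings(hand, library)
            else:
                break
    return False

def success_rate(bridges, stirrings, on_play, games=10_000, seed=0):
    rng = random.Random(seed)
    deck = build_deck(bridges, stirrings)
    return sum(bridge_on_turn_three(deck, on_play, rng) for _ in range(games)) / games

for bridges, stirrings in ((4, 0), (4, 4)):
    print(bridges, stirrings, f"{success_rate(bridges, stirrings, on_play=True):.1%}")
```

Checking for a castable Bridge before each Stirrings on turn three keeps the sketch from spending mana it needs for the Bridge, while still letting a Stirrings dig when the Bridge or the third mana source is missing.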

Conditions for the Simulations

For the simulations, I decided I wanted to compare the impact of adding more than four Ensnaring Bridges to a deck against the impact of four Ancient Stirrings. This will give insight into how close Ancient Stirrings is to a tutor. Tutors function as effective additional copies of a combo piece. The closer Ancient Stirrings is to adding an Ensnaring Bridge to the deck, the closer it is to a tutor.

I ran a total of twelve different simulations. The first was with four Ensnaring Bridges and no Ancient Stirrings, to serve as a baseline. The next had four Bridges and four Stirrings. Finally, I ran four different simulations with no Ancient Stirrings and 5, 6, 7, or 8 Bridges respectively. I did these six simulations for being on the play and for being on the draw to cover all goldfishing scenarios. For each of the twelve scenarios, I ran 100,000 goldfish games to provide a sufficient sample size. The program recorded the number of successful games as defined by casting an Ensnaring Bridge on turn three. Using that data, I determined the percentage of successful games.
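One way to check whether 100,000 games is enough is to estimate the margin of error on a measured percentage. The Python sketch below uses the standard normal approximation to the binomial distribution; the function name is illustrative, and the example rates are simply plausible values between 0 and 1.

```python
from math import sqrt

def margin_of_error(rate, games, z=1.96):
    """Approximate 95% margin of error for an observed success rate."""
    return z * sqrt(rate * (1 - rate) / games)

for rate in (0.35, 0.50, 0.65):
    print(f"rate {rate:.0%}: +/- {margin_of_error(rate, 100_000):.2%}")
```

At 100,000 games the margin is about 0.3 percentage points in the worst case (a 50% success rate), so differences of several points between scenarios are well outside the noise.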

Results

# of Ensnaring Bridge    # of Ancient Stirrings    % of games Bridge was cast (Play)    % of games Bridge was cast (Draw)
4                        -                         36%                                  43%
4                        4                         48%                                  55%
5                        -                         43%                                  50.5%
6                        -                         48%                                  57%
7                        -                         53%                                  62%
8                        -                         58%                                  67%

I find the results of this experiment astonishing. Having four Ancient Stirrings and four Ensnaring Bridges in the deck is very close to having six copies of Ensnaring Bridge. Each copy of Ancient Stirrings added to the deck is effectively half of a Demonic Tutor. The massive impact on consistency that Ancient Stirrings brings is something that I would not have intuitively picked up on while playing the deck. It only looks at the top five cards. That is only one twelfth of the deck. That is nowhere close to searching the entire library.
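The "half of a Demonic Tutor" figure can be reproduced from the results table with a little arithmetic. The Python sketch below linearly interpolates between the Bridge-only rows to find how many Bridges would match the success rate of four Bridges plus four Stirrings. Linear interpolation between rows is an assumption made for illustration, and the names are hypothetical.

```python
# Success rates from the results table, in percent, for decks with no Stirrings.
PLAY = {4: 36.0, 5: 43.0, 6: 48.0, 7: 53.0, 8: 58.0}
DRAW = {4: 43.0, 5: 50.5, 6: 57.0, 7: 62.0, 8: 67.0}

def equivalent_bridges(rate, curve):
    """Number of Bridges (possibly fractional) that would give this rate."""
    counts = sorted(curve)
    for low, high in zip(counts, counts[1:]):
        if curve[low] <= rate <= curve[high]:
            fraction = (rate - curve[low]) / (curve[high] - curve[low])
            return low + fraction
    raise ValueError("rate is outside the measured range")

# Four Bridges plus four Stirrings: 48% on the play, 55% on the draw.
for label, curve, rate in (("play", PLAY, 48.0), ("draw", DRAW, 55.0)):
    copies = equivalent_bridges(rate, curve)
    per_stirrings = (copies - 4) / 4  # extra Bridges contributed by each Stirrings
    print(f"{label}: {copies:.2f} Bridges, {per_stirrings:.2f} per Stirrings")
# play: 6.00 Bridges, 0.50 per Stirrings
# draw: 5.69 Bridges, 0.42 per Stirrings
```

On the play the match with six Bridges is exact; on the draw each Stirrings is worth a bit less than half a Bridge.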

Seeing the impact of Ancient Stirrings in this context makes me want to look at some of the other card selection spells in Modern. Perhaps some of the two-mana ones that dig five cards deep are more playable than currently believed. Maybe cards like Peer Through Depths, Grisly Salvage, or Commune with the Gods are secretly great. The last two even have the side benefit of filling the graveyard.

It is entirely possible, however, that two mana is just the breaking point. One mana is very strong and possibly too good, and two mana might simply not be good enough. It is difficult to tell on its face, but that is what testing is for.

Should Ancient Stirrings be Banned?

Before picking up the ban hammer, it is important to keep in mind the deck-building constraints Ancient Stirrings imposes. Loading a deck with 40+ colorless cards to ensure that it always gets a card is a big ask. This leads to a lot of inflexibility in card choices. The card selection spells currently on the Modern banlist—Ponder and Preordain—only ask the deck to have lands that tap for blue. That is a much looser constraint. Personally, I like that Ancient Stirrings exists as a reward for building a colorless deck.

Ultimately, I do not think Ancient Stirrings deserves a ban. It has the best rate of any card selection spell in Modern, but the drawbacks in deckbuilding are too significant. It does function as a half tutor, but only for colorless cards. If a deck using Ancient Stirrings ever becomes too problematic for Modern, I expect the problem will lie with the card it is helping to find and not with Ancient Stirrings itself.

6 thoughts on “Testing the Consistency of Ancient Stirrings”

  1. Interesting to think of it as half a Demonic Tutor, but only when it's finding a card with four copies. I don't think you can play 3 Bridge 2 Stirrings and get the same odds as 4 Bridges – right?

    Change Ancient Stirrings to “nonland colourless card” or “land card” and we now have something hard to play. The average deck is 1/3 lands – this thing is never whiffing even if all it did was find Urza lands and Inkmoth Nexuses. Finding artifacts, Eldrazi, and stuff like Ugin plus any land in the game makes the drawback something of a joke. Finding a good deck that leverages Stirrings isn't materially harder than finding a good deck that leverages Serum Visions. Playing Ravagers or Karns vs playing Islands.

    1. Yeah, the effectiveness of Ancient Stirrings as half a tutor is reliant on there being four copies of the card you find. The percentages will go down by a lot for each Bridge you remove. The card is very much unlike a tutor when you have a package of one-ofs to search for.

      Out of curiosity, I ran my simulation with 1 Bridge and 4 Stirrings and that came out to 16%, while 2 Bridge 0 Stirrings is 20%. So, the comparison does not scale linearly.

  2. Okay, now calculate the probabilities of finding and casting your *one* copy of Ensnaring Bridge with and without four Stirrings. I doubt it’s anywhere close to “half a Demonic Tutor” in that case.

    Impulse and Commune with Nature effects are powerful consistency tools, but comparing them to actual tutors is overblown.

  3. The only rationale I could see for a banning of Ancient Stirrings would be Wizards saying something like: “We are fine with the effective power level of Ancient Stirrings decks (i.e. they’re fine with the power level of tron just going land, land, land, Karn), however, Ancient Stirrings enables those decks to be more consistent than we would like. Specifically in the case of tron, its role in both assembling tron by finding the missing lands, and finding a finisher all in the same 1 mana package is a bit too much.”

  4. Mox Opal is the dangerous card. If not for fear of people losing their minds, it would’ve already been banned. I’ve seen many bannings, over the last 20+ years; fast mana is dangerous. To be clear: Opal is dangerous—not broken

  5. I like the analysis as a whole, but saying “it adds deckbuilding constraints” I don’t think is particularly true. As an example, before Justin Cohen (and Sam Black) picked up Amulet, people were playing the deck with Serum Visions. Then, Cohen added Stirrings and the deck got better. It ran just 32 colourless cards (27 lands, 4 Amulet, 1EE). So, it’s not taxing them enough to force them to play 40+, and it’s not like they built a deck around Stirrings. A deck was built, then they got the perk of playing Stirrings. Again, Tron players actively want colourless cards in their deck, because their lands add colourless. They GET to play a tutor for their lands and threats, rather than “I better make sure my cards are colourless because I’m playing Stirrings”.

    I don’t actually want a stirrings ban, but I think wrapping up and saying people have to make sacrifices to play it is completely incorrect. Just my 2 cents
