Data analysis approaches behind MTGCardTech

Several years ago, I started a side project with several different aims (see the Projects section on the home page). My approach to this project has been one of discovery and praxis, and I wanted to document some of my journey.

The domain of this project is the trading card game Magic: The Gathering. If you are not familiar with Magic, a little background on the game: there are roughly 20,000 game pieces (cards), and players construct their own decks of cards to play against each other in tournaments. Decks most often have a minimum size of 60 cards, plus 15 utility cards (called the “sideboard”), and tournament play is divided into different “formats” in which certain cards are allowed to be played and others are excluded. The game is similar in some ways to Spades, Hearts, or Bridge, where you are essentially trying to play cards that beat (or are better than) your opponents’ cards. Each card has roughly a dozen core attributes and carries a statement of the rules that affect that particular card, other cards in play, or the play of the entire game.

One of the main goals of the project was to analyze the tournament performance of decks to try to find out which cards are best, and why. With a practically infinite number of possibilities in a game that is both skill-based and has some level of randomness, what combination of attributes makes some cards more valuable than others?

After building data ingestion pipelines to gather all of the cards in the game, as well as the decks used in competitive tournament play, I had to structure the data in a way that enabled analysis.

Several card attributes are counting numbers (typically 0 through ~20).

Equally important is the rules text that a card may have. For example, “Lightning Bolt deals 3 damage to any target”, or “Flying. At the beginning of each combat, any opponent may sacrifice a creature. If a player does, tap Desecration Demon and put a +1/+1 counter on it.”

As a means to get started quickly, I transformed all of the card attributes into terms, including the numeric attributes. For some key attributes that had a finite number of possibilities, I also constructed terms that represented their converse or complement (for example, a card with a “color” attribute of only “white” is also “notblack”). I also replaced self-referencing text with a common term. Once I had a “document” for each card, I used some simple off-the-shelf tools (initially Mahout on Hadoop, then Solr, and currently Elasticsearch) to do things like find differences and similarities between cards using TF/IDF. The results of this are used on the website today in the “Cards similar to this card” feature (e.g., http://mtgcardtech.jcrickmer.com/cards/106642-lord-of-atlantis/#similar-block).
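To make the card-as-document idea concrete, here is a minimal sketch of the same approach using scikit-learn’s TF/IDF rather than the search engines above; the attribute names, tokens, and tiny card list are illustrative, not the project’s actual schema.

```python
# A minimal sketch of the card-as-document idea with scikit-learn TF/IDF.
# Field names and tokens are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def card_to_document(card):
    """Flatten a card's attributes and rules text into one term 'document'."""
    terms = [f"type_{t}" for t in card["types"]]
    terms += [f"color_{c}" for c in card["colors"]]
    # complement terms for finite attributes, e.g. a mono-red card is also "notblack"
    terms += [f"not{c}" for c in {"white", "blue", "black", "red", "green"} - set(card["colors"])]
    terms.append(f"cmc{card['cmc']}")
    # replace self-referencing text with a common term
    rules = card["rules_text"].replace(card["name"], "CARDNAME")
    return " ".join(terms) + " " + rules

cards = [
    {"name": "Lightning Bolt", "types": ["instant"], "colors": ["red"], "cmc": 1,
     "rules_text": "Lightning Bolt deals 3 damage to any target."},
    {"name": "Shock", "types": ["instant"], "colors": ["red"], "cmc": 1,
     "rules_text": "Shock deals 2 damage to any target."},
    {"name": "Grizzly Bears", "types": ["creature"], "colors": ["green"], "cmc": 2,
     "rules_text": ""},
]

docs = [card_to_document(c) for c in cards]
tfidf = TfidfVectorizer().fit_transform(docs)
# Bolt and Shock score far closer to each other than either does to Grizzly Bears
print(cosine_similarity(tfidf))
```

In Solr and Elasticsearch, the equivalent comes from their built-in “more like this” style queries over the indexed card documents, but the underlying vector-space idea is the same.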

I then used this to find similarities between tournament decks of cards. I put each card document of a deck into a new deck document, and then looked for similar deck documents. What I found as I experimented was that TF/IDF did not actually create spaces that classified or grouped decks together very well. With a limited vector space, and so much of the valuable signal carried by very frequent terms, inverse document frequency was not the right solution for segmentation and classification to reach my goal. I experimented with several different approaches, including relying on bi-grams and tri-grams for rules text grouping, and moving the numeric attributes out of term-equivalent “text” and into actual attribute vectors (thus, the “power” attribute was no longer a term like “power3”, and instead was the value 3 on the “power” attribute vector). This, of course, yielded different (and better) results, but still did not provide valuable insight into deck segmentation or classification.
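As a sketch of that second representation (hypothetical fields, toy data), the rules text stays in the term space while the numeric attributes become real-valued columns alongside it:

```python
# Rough sketch: rules text as TF/IDF n-grams, numeric attributes as real-valued columns.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

rules_texts = [
    "CARDNAME deals 3 damage to any target.",
    "Flying. At the beginning of each combat, any opponent may sacrifice a creature.",
]
numeric_attrs = np.array([[0.0, 0.0, 1.0],    # power, toughness, converted mana cost
                          [6.0, 6.0, 4.0]])

# bi-grams and tri-grams of rules text instead of only single terms
text_features = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(rules_texts)
numeric_features = StandardScaler().fit_transform(numeric_attrs)

# one combined feature matrix per card: "power" is now a number on its own axis,
# not a term like "power3"
X = hstack([text_features, numeric_features])
print(X.shape)
```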

I next used a number of different packages to expand upon that basic TF/IDF analysis, specifically Python’s scikit-learn (and some other tools bundled in Anaconda), ELKI, and Orange. With some of those packages, I used the same algorithms and approaches and was really just learning the tool set. I feel like I had the most success with sklearn and Orange.

With sklearn, I performed clustering of both cards and decks using K-Means, Ward hierarchical clustering, and Agglomerative clustering.
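Roughly, the scikit-learn side of those experiments looked like the sketch below; the random matrix stands in for the real card and deck features, and the cluster count is arbitrary.

```python
# Hedged sketch of the clustering step with scikit-learn. X stands in for the deck
# (or card) feature matrix; random data is used here only so the snippet runs.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.random((200, 50))          # 200 decks, 50 features

kmeans_labels = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(X)

# Ward linkage is one flavor of agglomerative clustering; other linkages give the
# more general agglomerative variants.
ward_labels = AgglomerativeClustering(n_clusters=16, linkage="ward").fit_predict(X)
avg_labels = AgglomerativeClustering(n_clusters=16, linkage="average").fit_predict(X)
```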

None of these produced the exact level of “rightness” that I was looking for. Although the clusters had organizations an expert human could perceive, I felt that they did not match the current human consensus of deck organization. As I explored these clusters, I felt that not enough value was being added to the work that was already out there (one example of some known groupings is from Patrick Chapin and his Sixteen Archetypes). I played with changing what information was included in each document, using single words and n-grams of various lengths, and increasing the number of stop words. I measured aspects of my resulting data set to try to reduce the total number of dimensions and create a “tighter” space, ultimately eliminating game-inconsequential terms whose frequencies fell too far outside of the average frequency.
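One simple way to express that kind of pruning, as a sketch with illustrative thresholds and made-up extra stop words, is to let the vectorizer itself drop terms whose document frequency falls outside a chosen band:

```python
# Sketch of frequency-based pruning and extra stop words; thresholds and the
# added stop words are illustrative, not the values I actually settled on.
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS

game_stop_words = list(ENGLISH_STOP_WORDS) + ["card", "player", "turn"]  # hypothetical additions

vectorizer = TfidfVectorizer(
    ngram_range=(1, 3),        # single words plus bi-grams and tri-grams
    stop_words=game_stop_words,
    min_df=5,                  # drop terms appearing in fewer than 5 documents
    max_df=0.8,                # drop terms appearing in more than 80% of documents
)
# vectorizer.fit_transform(...) over the card/deck documents then yields the "tighter" space
```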

The next set of approaches I tried was with Orange, starting with a more supervised approach. I used its Tree model implementation and trained it with a handful of deck documents from the categories that I wanted. I then played with having it classify the documents it was trained on, documents from the same tournament format (i.e., decks that were played against each other), and then documents from different tournament formats.
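In code, that supervised Orange experiment looked roughly like the sketch below; the feature names, archetype labels, and tiny training set are made up for illustration, and the real deck documents have far more dimensions.

```python
# Small sketch of a supervised Tree model in Orange (Orange3); data is illustrative.
import numpy as np
import Orange

features = [Orange.data.ContinuousVariable(name)
            for name in ("creature_count", "instant_count", "avg_cmc")]
archetype = Orange.data.DiscreteVariable("archetype", values=["aggro", "control"])
domain = Orange.data.Domain(features, archetype)

X = np.array([[28, 4, 1.8], [26, 6, 2.0], [8, 18, 3.4], [10, 16, 3.6]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)     # hand-labeled training decks
train = Orange.data.Table.from_numpy(domain, X, y)

tree = Orange.classification.TreeLearner()(train)

# classify the documents it was trained on (the first experiment described above)
preds = tree(train)
print([archetype.values[int(p)] for p in preds])
```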

With Orange, I also tried using Random Forest, hierarchical clustering, and k-means clustering.

Most of these experiments had a similar progression: first try just a concatenation of the card documents into a deck document, then add vectors to the deck document that captured computed features (for example, the count of each of the different card types). These models did get closer to classifying in a manner that approximated human classification.
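A sketch of that second step, with a stand-in deck list, might compute deck-level features like this:

```python
# Sketch of computed deck-level features (counts per card type); deck data is a stand-in.
from collections import Counter

def deck_features(deck):
    """deck is a list of (card, copies) pairs; each card dict has a 'types' list."""
    type_counts = Counter()
    for card, copies in deck:
        for t in card["types"]:
            type_counts[t] += copies
    features = {f"count_{t}": n for t, n in type_counts.items()}
    features["deck_size"] = sum(copies for _, copies in deck)
    return features

burn_deck = [({"name": "Lightning Bolt", "types": ["instant"]}, 4),
             ({"name": "Goblin Guide", "types": ["creature"]}, 4),
             ({"name": "Mountain", "types": ["land"]}, 20)]
print(deck_features(burn_deck))
# {'count_instant': 4, 'count_creature': 4, 'count_land': 20, 'deck_size': 28}
```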

Also, somewhere in the middle of this, I attempted a Naive Bayes implementation, but quickly confirmed my expectation that the number of important dimensions in the data would require an inhuman amount of supervised learning.

Ultimately, the results from Orange were the most meaningful to me. In part, I think it was Orange’s visual interface that allowed me to experiment and test quickly and visually. However, that interface became a drawback as I wanted to make this more of a continuous, automated process. Orange does provide a programmatic interface, but I found its learning curve to be more than I wanted to invest in at the time.

Next, I really wanted to get back to comparing the actual cards (and not the decks). Clustering and classification were good tools to find similarities, but they did not help me compare cards that were substantially different when it came to the question “what card should I include in my deck?” Given all of the dimensions that make a card valuable, I figured that expert humans would be a faster path than the machine. So, I created a head-to-head “game” based on Microsoft Research’s TrueSkill algorithm.

The web application shows a player two cards and asks the player to choose the better of the two using whatever measure or test that human player thinks is most valuable – http://mtgcardtech.jcrickmer.com/cards/battle/modern/. Each battle is recorded, and the wins and losses are calculated into a score using TrueSkill. The higher the score of a card, the more valuable the card.
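For the curious, the scoring mechanics can be sketched with the trueskill Python package; the mu-minus-three-sigma score shown is a common conservative convention, not necessarily the exact formula used on the site.

```python
# Sketch of card-battle scoring with the `trueskill` package; card names are examples.
from trueskill import Rating, rate_1vs1

ratings = {"Lightning Bolt": Rating(), "Lava Spike": Rating()}

def record_battle(winner, loser):
    """Update both cards' ratings after a player picks `winner` over `loser`."""
    ratings[winner], ratings[loser] = rate_1vs1(ratings[winner], ratings[loser])

record_battle("Lightning Bolt", "Lava Spike")
record_battle("Lightning Bolt", "Lava Spike")

for name, r in ratings.items():
    # mu - 3*sigma is a common conservative "leaderboard" score: higher means more valuable
    print(name, round(r.mu - 3 * r.sigma, 2))
```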

This approach allowed me to quickly gain human “supervision”, and once I was able to capture enough input from players of these card battles, the scores did start to align with the frequency and popularity of those cards in decks. For example, this trending analysis shows the cards that are most frequent in the “Modern” format, with their Card Battle scores.

Most recently, I wanted to create new tools to help players discover cards that may work with the strategies or cards that they know they want to play. Leveraging the TF/IDF analysis of cards, the inclusion of cards in competitively-played decks, and the card battle scores, I created a recommendation engine. (I played with a couple of “off the shelf” recommenders that used some of the same algorithms I had been experimenting with, but found that I had too much invested in the analysis and pipelines I had already built to want to scrap my home-grown approach.) You can see an example of this in action at http://mtgcardtech.jcrickmer.com/decks/crafter/ – try selecting “Modern” for the format and adding the two cards “Lightning Bolt” and “Island” as seed cards.

There are two sets of recommendations. The first set is based on the frequency of other cards in decks that include the “seed” cards. Thus, if you put in a card that is not played at all, you won’t get any recommendations. The second set (the “Spicy” set) takes the card documents of the seed cards, as well as the first set of recommendations, and pulls together other cards that are similar. There is an inversion of the terms between the seed cards and the first set of recommendations so that the Spicy set has higher diversity.
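As a simplified sketch of the first set (toy deck lists, not real data), the recommendation reduces to counting co-occurrence with the seed cards:

```python
# Simplified sketch of the first recommendation set: rank cards by how often they
# appear in decks that also play every seed card. Deck lists here are illustrative.
from collections import Counter

decks = [
    {"Lightning Bolt", "Island", "Snapcaster Mage", "Steam Vents"},
    {"Lightning Bolt", "Island", "Delver of Secrets", "Steam Vents"},
    {"Lightning Bolt", "Mountain", "Goblin Guide"},
]

def recommend(seeds, decks, top_n=5):
    seeds = set(seeds)
    counts = Counter()
    for deck in decks:
        if seeds <= deck:                  # this deck plays every seed card
            counts.update(deck - seeds)    # credit everything played alongside them
    return counts.most_common(top_n)

# Steam Vents co-occurs with both seeds twice, so it tops the list
print(recommend({"Lightning Bolt", "Island"}, decks))
```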

I continue to tinker with these concepts, and my current curiosity is around mana bases. And, I welcome any thoughts, feedback, or opinions on any of this work or the underlying ideas.

Words and pictures

This is part of a series on “I believe in…”

Early on in my work as a software developer, it became clear to me that communication was one of the biggest barriers to a team creating something valuable. Communication comes in so many different forms (many of which I believe have been unfairly maligned), and as a developer, I wanted to understand each of them and leverage them to their greatest potential. I read The Mythical Man-Month by Frederick P. Brooks, Jr. and was flabbergasted at the amount of communication that went into the OS/360 project he references throughout the book. Looking at the small team of developers I worked with, the level of communication Brooks wrote about seemed crazy. Yet, it resonated with me.

I believe in writing things down. I believe in drawing and diagramming as a tool to develop a better understanding.

So, I tried to get my team on board with more communication. We had tools for communicating, but they weren’t really used. For the most part, we took a stack of ideas written in a Word doc from our boss (who played the role of a product manager) and then huddled together to divvy up the work, only coming together again to discuss the interfaces between one person’s work and another’s, and mostly after the work was already “done.” Bugzilla was a joke, the fancy tools we had from Microsoft and Rational Rose were haphazardly used, if at all, and testing, let alone test plans, was almost non-existent.

I was determined to “lead by example,” creating documentation to spur my colleagues into communicating more. My reasoning was: if I created these artifacts, shared them, and then brought us together in meetings to discuss them, we would improve communication. Right? I was going to validate my specs and designs with peers before I built the wrong thing, and I was going to have us work more as a team.

I wrote specs. I made Use Case Diagrams. And Sequence Diagrams. And Entity Relationship Diagrams. I set up a list-serv for the team. I tweaked VSS to require commit comments of a certain length. I added fields to Bugzilla to capture more detail. I even tried to make Gantt charts to show the dependencies we had between each other over time.

And maybe we worked better as a team. Maybe not. This was early in my career, and it’s hard to say if the communication I tried to foster made a difference to others, or to our output. And, this was 20 years ago, so my memory may be a little hazy.

And, learning that “communication is important for teams” isn’t really a ground-breaking discovery.

Books: The Unified Software Development Process, UML Distilled, The Mythical Man-Month, and Drawing on the Right Side of the Brain

What I did learn was that writing it down, drawing it, diagramming it – those activities were more valuable than I ever thought. They brought awareness to problems I had not yet perceived and forced me to move from my abstract mental visualization of a solution into something more concrete. Spending 15 minutes trying to translate my ideas into something constrained by a whiteboard’s two-dimensionality and only four marker colors actually helped me solve problems better and faster.

And it didn’t matter if anyone ever saw those diagrams or not. The value was in practicing the translation of an abstract idea into something concrete, with all of the physical constraints that that concrete form imposes.

Only when one writes do the gaps appear and the inconsistencies protrude. The act of writing turns out to require hundreds of mini-decisions, and it is the existence of these that distinguishes clear, exact [ideas] from fuzzy ones.

The Mythical Man-Month – Frederick P. Brooks, Jr.

Neuroscience is not something I really know anything about. But, maybe it is similar to what Betty Edwards found and shared with us in Drawing on the Right Side of the Brain. Maybe that simple act of forcing both hemispheres of your brain to try to solve a problem helps software developers and engineers build the right thing faster.

Lastly, my belief in writing and drawing does not apply just to software development. Any process, any system, any organization, any abstract concept can be better known to you by trying to turn it into language or an image.

I believe in writing things down. I believe in drawing and diagramming as a tool to develop a better understanding. An audience is not required to make this act valuable. The more abstract the “thing” is (like software architecture – it is much more abstract than, say, the framing of a house), the more value there is in diagramming it.

Software Development Life Cycle and Value

This is part of a series on “I believe in…”

The value of the software we create is defined by both time and usefulness. Almost every marketplace is full of big and small competitors looking to bring something more useful to prospects and customers every day. When the application is integral to the value proposition of the entire organization (i.e., part of the product sold to customers), it must be able to grow and change in response to the needs of its constituents and stakeholders, and to maintain or accelerate differentiation from competitors. You need a process to effectively manage this life cycle, and that process, and how you execute it, are key to the product’s success.

I believe in Agile and Scrum as a tool to create a more valuable asset that can meet changing demands more quickly. The application or platform is never “done”. We have releases of the software to satisfy current need and position something new and compelling to customers and stakeholders. And then we learn, respond, and release again.

I strongly believe in verifiable and measurable results. Code can be tested to make sure it functions how the developer wrote it (and to challenge assertions), and an analyst can validate that what the developer wrote is what the feature spec said. But to produce high-quality applications, we must also test that the specifications meet the requirements, that the requirements meet the customer expectations, and that the software meets the needs of the market. Just as I expect each developer to test her or his code, and to do so in small, fast iterations, I also expect the same of the Product Owner function (which is connected to Product Management). Through identifying users, actors, and personas, then developing user journeys and validating those journeys with mock-ups and prototypes, we can become much more certain that we are developing the right solution before committing huge amounts of effort and money in the wrong direction, or even in a direction that is just a degree or two off of optimal.

DevOps is an integral part of recognizing the value of Agile. DevOps provides the framework for quality and consistency (in part through Continuous Integration testing), production readiness and deployment, and monitoring and optimization.

Thus, I believe that the Software Development Life Cycle is equally important to, and inextricable from, the actual code and product itself.