Mahjong AI: The Next Level

Within the past few decades, we have created machines and artificial intelligence that can play games, and play them well. Chess is a prime example of this, with quite a remarkable history against some of the best human grandmasters. For practice, entertainment, analysis of games, training and research in artificial intelligence, we’ve created many machines and programs that have pushed the boundaries of both technology and, in a way, their respective games. I want to explore a particular game which is seeing recent progress in this department and highlight the unique challenges it presents. Let’s explore artificial intelligence in Mahjong.

What is Mahjong?

Mahjong is a multi-round, tile-based game that originated in China during the Qing Dynasty and has spread around the world since the early 20th century. It’s usually a four-player game, although there are a few three-player variations in Japan, South Korea and Southeast Asia. For this article, I will mainly be talking about Japanese mahjong (or Riichi mahjong), since a lot of the artificial intelligence research seems to be based on this particular variation, which has a large community on- and offline.

Riichi mahjong is played with 136 tiles. Mahjong, in general, uses 144 tiles based on Chinese characters and symbols, but a lot of the variations either add or omit certain tiles. Each player starts with 13 tiles and a certain number of points (usually 25,000), and players draw and discard the tiles in their hand to try and be the first to create a winning hand with (usually) 14 tiles. Since players always draw one tile and discard another, they need to either draw the 14th tile they need or wait for someone else to discard it. These hands earn the player points and, once someone makes a hand, the round is over and another starts. These points accumulate and the player with the most points at the end wins. You don’t see the other players’ hands, but each tile you discard is placed and ordered in front of you for everyone to see.

Just like general Mahjong, there are three numbered suits of tiles and unranked honour tiles, and there are four of each tile (like a deck of cards, but with three suits instead of four). There are the pin tiles, which are numbered 1-9 with little circles or pins; the man tiles, which are numbered 1-9 with their respective kanji; and the sou tiles, which are numbered 1-9 with bamboo symbols (except for the 1-sou, which is a peacock). Then there are the wind tiles for each of the cardinal directions (North, East, South and West) and finally, the dragon tiles (White, Green and Red). Other variations also include bonus tiles, like the season and flower tiles, but these are usually omitted in Riichi mahjong.
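To make the tile counts concrete, here is a minimal Python sketch of the tile set described above. The `(kind, value)` tuple representation and the names `SUITS`, `WINDS`, `DRAGONS` and `build_tile_set` are my own assumptions for illustration, not taken from any particular Mahjong library.

```python
from collections import Counter

# Hypothetical representation: every tile is a (kind, value) tuple.
SUITS = ("man", "pin", "sou")               # the three numbered suits
WINDS = ("east", "south", "west", "north")  # honour tiles: winds
DRAGONS = ("white", "green", "red")         # honour tiles: dragons
COPIES = 4                                  # there are four of each tile


def build_tile_set():
    """Return the full Riichi tile set as a list of (kind, value) tuples."""
    kinds = [(suit, number) for suit in SUITS for number in range(1, 10)]
    kinds += [("wind", wind) for wind in WINDS]
    kinds += [("dragon", dragon) for dragon in DRAGONS]
    return [kind for kind in kinds for _ in range(COPIES)]


tiles = build_tile_set()
print(len(set(tiles)), len(tiles))  # 34 136
```

So the 136 tiles are really 34 distinct kinds (27 numbered tiles, 4 winds and 3 dragons), with four copies of each.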

A general winning hand consists of four mentsu (sets) and a pair, a pair being two identical tiles. A mentsu can be either a three-of-a-kind, called a koutsu, or three number tiles of the same suit in consecutive sequence, called a shuntsu. A four-of-a-kind, or quad, can also be formed. Tiles can also be stolen from other players to complete these sets, and the resulting sets are called melds. If any player discards a tile that would complete a koutsu in your hand, you can “steal” that tile by calling pon and use it to complete that meld. Similarly, you can call chii to steal a tile that would complete a shuntsu, but only from the discards of the player to your left (the player whose turn is right before yours). You can also call kan to complete a quad, but you must draw a supplementary tile afterwards (so your hand will still consist of four mentsu and a pair; this makes up for the extra tile used to make the quad). A consequence of these moves (aside from the scoring, which I’ll get to in the next section) is that these melds become “open”: once you’ve called chii, pon or kan, the meld you made must be placed face-up on the table so that everyone can see it. An example of a winning hand would be a koutsu of the white dragon, a koutsu of the North wind, a 1-pin, 2-pin and 3-pin forming a shuntsu, a 4-sou, 5-sou and 6-sou forming another shuntsu, and a pair of 9-man tiles.

An example of a winning hand consisting of two koutsu, two shuntsu and a pair
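The “four mentsu and a pair” shape can be checked mechanically. The sketch below is one simple way to do it, assuming tiles are written as `(kind, value)` tuples (my own convention). It deliberately ignores quads, yaku and any special hand shapes, so treat it as an illustration of the basic structure rather than a full rules engine; the function names are hypothetical.

```python
from collections import Counter

SUITS = ("man", "pin", "sou")  # only suited tiles can form a shuntsu


def _can_form_sets(counts, sets_needed):
    """Recursively split the remaining tiles into koutsu and shuntsu."""
    if sets_needed == 0:
        return sum(counts.values()) == 0
    remaining = [tile for tile, n in counts.items() if n > 0]
    if not remaining:
        return False
    # The smallest remaining tile must be in a koutsu or start a shuntsu.
    tile = min(remaining)
    kind, value = tile
    if counts[tile] >= 3:
        counts[tile] -= 3
        ok = _can_form_sets(counts, sets_needed - 1)
        counts[tile] += 3  # undo the trial before trying anything else
        if ok:
            return True
    if kind in SUITS and value <= 7:
        run = [(kind, value), (kind, value + 1), (kind, value + 2)]
        if all(counts[t] > 0 for t in run):
            for t in run:
                counts[t] -= 1
            ok = _can_form_sets(counts, sets_needed - 1)
            for t in run:
                counts[t] += 1
            if ok:
                return True
    return False


def is_standard_win(hand):
    """True if the 14 tiles split into four sets and a pair."""
    if len(hand) != 14:
        return False
    counts = Counter(hand)
    for tile, n in list(counts.items()):
        if n >= 2:  # try this tile as the pair
            counts[tile] -= 2
            ok = _can_form_sets(counts, 4)
            counts[tile] += 2
            if ok:
                return True
    return False


# The example hand from the text above.
example = (
    [("dragon", "white")] * 3
    + [("wind", "north")] * 3
    + [("pin", 1), ("pin", 2), ("pin", 3)]
    + [("sou", 4), ("sou", 5), ("sou", 6)]
    + [("man", 9)] * 2
)
print(is_standard_win(example))  # True
```

Trying every possible pair first and then peeling sets off the smallest remaining tile keeps the search small, because the smallest tile can only ever be part of a koutsu or the start of a shuntsu.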

Once a player forms a winning hand, they earn points from other players. Because you start with 13 tiles, you need to either draw your 14th tile or wait for someone to discard it to win a round. A hand that is waiting on just one tile to win is in tenpai, also referred to as a “ready hand”. If a player discards someone’s winning tile, ron is declared and, once the hand’s value is calculated, the full amount is deducted from the discarding player and given to the winner. Using the previous example, if you had everything except the second 9-man, you would wait for someone to discard it so you can take it. If you draw your winning tile yourself instead, tsumo is declared and the points are split between, and taken from, all the other players at the table. This carries on for several rounds, and whoever has the most points at the end is the winner.

Mahjong is a game of skill, calculation, deduction, bluffing and chance. When it comes to making an AI play it optimally, a few unique challenges arise that aren’t present when creating programs and AI for games like Chess or (Texas Hold’em) Poker.

Why is this any different to Chess, Go or Poker?

Let’s start simple. A lot of the games for which we’ve created AI that plays at an incredibly high level, such as Chess, Go and Shogi, are two-player games. Mahjong is generally played by four players, so the amount of information to consider is considerably larger for both humans and AI.

Looking at some game theory, Chess and Go are what’s called games with perfect information. In games like this, all players are always perfectly informed of the game and previous events that have occurred. Chess and Go are games played on a board where everyone can see the pieces; all the information you would need to play and play “optimally” is in front of you. In Mahjong, the only information you have is what’s currently in your hand and the discards and melds of other players. Everything else is unknown, so players have to use what little information they have to try and figure out what the best move to make is. Mahjong is an example of a game with imperfect information.

Texas Hold’em Poker is another game with imperfect information and is also played over several rounds. You initially don’t know the five cards in the middle and, even once they are all revealed, you don’t know the hands of other players. Now, research has gone into games with imperfect information like this; Pluribus has beaten professional Texas Hold’em Poker players in six-player games. So, Mahjong being a game of imperfect information doesn’t make it unique.

Riichi mahjong just has more missing information. In Texas Hold’em, each player is dealt two cards, called hole or pocket cards. With a normal deck of playing cards, there are 52×51/2 = 1,326 possible two-card hands for the first person. In Mahjong, however, three players are each holding 13 tiles that you can’t see (or 14, counting the tile they draw), and there’s more missing information beyond this. When setting up the game, 14 tiles are taken out of play to form the “dead wall”. This is where you draw the supplementary tiles when you perform a kan. The dead wall has another property, with rules unique to Riichi mahjong: the dora. One tile in the dead wall is shown face up, and the tile that comes next after it in sequence is the dora. For example, if a 3-man is shown face-up on the wall, then the dora is the 4-man. Any winning hand that contains this tile is worth more (like a bonus tile), because each dora adds han to the hand. Han is one of the measures used to calculate a hand’s value. Generally, the more han a hand has, the more points it is worth (it’s a bit more complicated than that, but this is a good rule of thumb). With 14 tiles out of play in the dead wall and 13 tiles in each opponent’s hand that you don’t know about, the scale of imperfect information in Mahjong is far greater than in Texas Hold’em.
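A rough back-of-the-envelope comparison gives a feel for the difference in scale. The sketch below counts unordered two-card hands in Texas Hold’em (multiplying 52 by 51 would count each pair twice, once per order) and then the tiles you cannot see at the start of a Riichi round. The variable names are my own, and the final figure treats the four copies of each tile as distinct, so it overcounts genuinely different hands and is only meant as a loose upper bound.

```python
from math import comb

# Texas Hold'em: unordered two-card starting hands from a 52-card deck.
hole_card_hands = comb(52, 2)
print(hole_card_hands)  # 1326

# Riichi mahjong at the start of a round, from one player's point of view.
TOTAL_TILES = 136
OWN_HAND = 13
DORA_INDICATOR = 1  # the single dead-wall tile that is shown face up
unseen_tiles = TOTAL_TILES - OWN_HAND - DORA_INDICATOR
print(unseen_tiles)  # 122

# Loose upper bound on one opponent's 13-tile hand, treating every
# physical tile as distinct (an assumption made purely for scale).
opponent_hand_bound = comb(unseen_tiles, 13)
print(opponent_hand_bound > hole_card_hands * 10**9)  # True
```

Even for a single opponent, the space of possible hidden hands is many orders of magnitude larger than in Texas Hold’em, and there are three opponents plus the rest of the wall to account for.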

Now, while I mentioned that a winning hand in Riichi mahjong usually has four mentsu and a pair, that’s actually not enough. A winning hand must have that form and at least one yaku. Yaku are specific conditions or combinations of tiles, each with its own value (they’re similar to the hands in Texas Hold’em, like a Straight or Royal Flush). All Simples, or tan’yao, is one example of a yaku. To satisfy it, your hand must have no honour tiles (winds and dragons) and no “terminal” tiles (tiles numbered one or nine). Now, while Texas Hold’em has 10 hands, Riichi mahjong has a lot more yaku. And unlike the hands in Texas Hold’em, yaku don’t outclass one another; they stack. A hand with a koutsu of one of the three dragons has a yaku, so a hand with koutsu of two different dragons has two. A more complicated yaku that’s worth a lot more points is a full flush, which is a hand composed only of numbered tiles from one suit. Each yaku is worth some han; easier ones like All Simples are worth one, while more complicated ones like a full flush are worth six (or five if the hand is open). I mention this important point after highlighting the scale of imperfect information because it just adds to it. Because there are so many yaku that a hand could satisfy, there are lots of different possibilities that players have to consider. Players have to decide which yaku (and how many) to go for, which depends on the tiles they start with, the tiles they draw, how many points they want or need to win, and also what other people are trying to do (since, if you recall, you can “steal” others’ discards). It is estimated that there are over 903,760,799,976 possible hands that satisfy the All Simples yaku (this counts the ways to make a hand that satisfies the yaku but ignores the arrangements of those tiles, which is why the true figure is estimated to be above this number). And that’s just one of the many different types of yaku.

Some yaku are also either restricted to, or worth more in, a hand that is menzenchin, meaning a closed hand with no open melds. I previously mentioned that you can “steal” tiles from other players to form melds. If your hand is in tenpai and you haven’t called any melds, you can wager 1,000 of your points to declare riichi. This means that you will earn extra points if you manage to complete that hand, but you’re committing to it, so you can’t change your mind or form any melds. However, by not making any melds, you’re relying solely on your draws, which could make it harder to complete hands if you aren’t drawing the right tiles. All in all, the sheer size of Mahjong means there’s so much more to consider than in other games such as Texas Hold’em. So, an AI that we make would have to consider many more possibilities.
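As a small illustration of what a yaku check looks like, here is a sketch of the All Simples condition and of han stacking. It assumes tiles are `(kind, value)` tuples, and the `HAN_VALUES` table and function names are hypothetical: the table covers just two one-han yaku mentioned above and is nowhere near a complete scoring table.

```python
SUITS = ("man", "pin", "sou")

# Hypothetical, deliberately tiny han table for illustration only.
HAN_VALUES = {
    "all_simples": 1,
    "dragon_koutsu": 1,
}


def is_all_simples(hand):
    """True if the hand has no honour tiles and no terminals (1 or 9)."""
    return all(kind in SUITS and 2 <= value <= 8 for kind, value in hand)


def total_han(yaku_list):
    """Yaku stack: the hand's han is the sum over every yaku it satisfies."""
    return sum(HAN_VALUES[yaku] for yaku in yaku_list)


simple_tiles = [("pin", 2), ("pin", 3), ("pin", 4), ("sou", 6), ("sou", 6)]
print(is_all_simples(simple_tiles))                     # True
print(is_all_simples(simple_tiles + [("man", 9)]))      # False (terminal)
print(is_all_simples(simple_tiles + [("wind", "east")]))  # False (honour)

# Two different dragon koutsu count as two separate yaku.
print(total_han(["dragon_koutsu", "dragon_koutsu"]))  # 2
```

A real implementation would need one such check per yaku, plus the rules about which yaku require a closed hand, which is part of why the decision space is so large.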

The last challenge I’ll talk about is that Mahjong is a multi-round game. As you might be able to tell, scoring in Mahjong is a bit complex, and the multi-round structure adds to that. It’s entirely possible to win all but one round and lose the game. Losing a round doesn’t necessarily mean you made a mistake either; it could’ve been tactical. Because each hand is worth a different amount, players have to strategise about which hand to go for, factoring in how difficult it would be to make, how likely they are to be successful and how many points they would get. For example, let’s say you’re in last place in the last round of a game. The player in first place has 12,000 points more than you. Since this is the last round, you have a few options to win the game. You could make a hand worth more than 6,000 points and take it from the player in first by ron, since every point you win that way is also a point they lose. Alternatively, you could make a hand worth more than 12,000 points and take that from anyone else (or everyone else). Hands worth that much are hard to make. So the hand you aim for, the tiles you keep and whether you form any melds all depend on what you think gives you the best chance of success. On the other side, first place is comfortably ahead, so they might go for a really easy hand to finish the round quickly, so that no one has a chance to make a comeback. A player needs to carefully consider the trade-offs of going for something difficult that’s worth a lot versus something easy that’s not worth a lot. On top of that, since all you can see is other people’s discards, players need to use this limited information to figure out which tiles are dangerous to discard.
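The arithmetic in that last-round example can be written down in a few lines. This is a simplified sketch under my own assumptions: it only models a win by ron, it ignores how tsumo payments are split, riichi wagers and tie-breaks, and the function name `overtakes` is hypothetical.

```python
def overtakes(gap, hand_value, ron_from_leader):
    """Return True if winning `hand_value` by ron strictly closes `gap`.

    `gap` is how many points the leader has over you. If the leader pays,
    you gain the points and they lose them, so the gap closes twice as fast.
    """
    swing = 2 * hand_value if ron_from_leader else hand_value
    return swing > gap


GAP = 12_000  # the leader is 12,000 points ahead, as in the example

print(overtakes(GAP, 6_000, ron_from_leader=True))    # False (only ties)
print(overtakes(GAP, 8_000, ron_from_leader=True))    # True
print(overtakes(GAP, 8_000, ron_from_leader=False))   # False
print(overtakes(GAP, 16_000, ron_from_leader=False))  # True
```

This is why who discards the winning tile matters as much as how big the hand is: the same 8,000-point hand wins the game if the leader pays for it and falls short if anyone else does.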

Mahjong is huge. There are lots of rules, the possible hands and ways to earn points are numerous and, because there are multiple rounds, there can be many different paths to victory. I’ve overlooked some details, but that’s a brief overview of how the game is played and how it compares to others. There are a lot of things to consider, yet human players take all of this into account. The challenge, then, is how to make an AI consider all of this and still play optimally.

Current State of Mahjong AI

As you can imagine, these challenges made it particularly difficult to make a strong Mahjong AI. Up until 2019, while we certainly had strong AIs, human players were still quite far ahead. However, the development of new techniques that aid reinforcement learning has allowed us to make great progress. Currently, we have Super Phoenix (or Suphx for short), developed by Microsoft Research Asia. The Suphx paper addresses how its authors dealt with these challenges in detail beyond the scope of this article.

One technique I want to briefly talk about is how they dealt with imperfect information. They used a technique called “oracle guiding”. Rather than training the AI by making it play games of Mahjong normally, the AI started out as an “oracle” with perfect information. It would initially start playing with the ability to see everyone’s hands and the dead wall. As you can imagine, this AI would be unfairly powerful. As the AI progressed and got better, the amount of information it had access to was reduced, until it became a “normal agent”, with the normal amount of information allowed. It turns out that, through experimentation, this led the AI to improve faster than if it was just trained as a normal agent initially. Other techniques were used to deal with issues like making sure the AI considers winning the game rather than winning each individual round.
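To give a feel for the idea, here is a toy sketch of gradually taking the oracle’s extra information away. It is not the Suphx implementation: the linear decay schedule, the per-feature random masking and the feature lists are all assumptions of mine, chosen only to show what “reducing the amount of information” could look like in code.

```python
import random


def oracle_keep_probability(step, total_steps):
    """Assumed schedule: decay linearly from 1.0 (oracle) to 0.0 (normal)."""
    return max(0.0, 1.0 - step / total_steps)


def mask_oracle_features(normal, oracle, keep_prob, rng):
    """Join visible features with hidden ones, zeroing hidden ones at random.

    `normal` holds what a real player can see; `oracle` holds hidden
    information (other hands, the dead wall). Each hidden feature survives
    with probability `keep_prob`.
    """
    masked = [x if rng.random() < keep_prob else 0.0 for x in oracle]
    return list(normal) + masked


rng = random.Random(0)
visible = [1.0, 0.0, 1.0]      # hypothetical encoding of your own hand
hidden = [1.0, 1.0, 1.0, 1.0]  # hypothetical encoding of hidden tiles

for step in (0, 50, 100):
    keep = oracle_keep_probability(step, total_steps=100)
    features = mask_oracle_features(visible, hidden, keep, rng)
    print(step, keep, features)
```

At step 0 the agent sees everything, and by the final step every hidden feature has been zeroed out, so the network ends up being trained on exactly the inputs a normal player would have.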

The data used to train the Suphx AI was from tenhou.net, which is one of the most popular online platforms to play Riichi mahjong. It’s also where this AI was evaluated to see how well it could play. After development and offline training, Suphx played against both human and AI players. A quick note about Tenhou: players move up or down in rank as they gain or lose points from their game results. Tenhou uses a ranking system in the style of Japanese martial arts, which goes from rookie to 9 kyu, counts down to 1 kyu, and then runs from 1 dan all the way up to 10 dan (rookie being the lowest rank and 10 dan the highest). Suphx is about 2 dan better than Bakuuchi and NAGA, the two best Mahjong AIs before Suphx. Suphx also reached the 10 dan rank playing against human players, which only a little over 100 people have achieved in Tenhou’s history. For perspective, this means Suphx is above 99.99% of human players on Tenhou.

What was interesting to me reading the paper was that Suphx seemed to develop its own playing style, a more defensive style of play that liked to keep safe tiles and had a tendency to go for half flushes often. Through reinforcement learning and thousands of games, the AI has taught itself what it deems to be the best way to play a game as huge as Mahjong.

Now, the making of an AI this good against human players is a significant moment in any game’s history. Suphx could join the ranks of the AIs that have beaten us at our own games, amongst AIs such as AlphaZero for Chess, AlphaGo for Go and Pluribus for Texas Hold’em. But what this research also hints at is how the techniques used here could be applied to real-life scenarios. In some ways, Mahjong is more similar to real-world scenarios than Chess, Go or Poker are: think of areas like financial market prediction, where there’s a lot of missing information and complex ways to “win”, so to speak. Any scenario which has complicated rules for both operating and “scoring”, lots of imperfect information and properties similar to those that we’ve explored in Mahjong could potentially use artificial intelligence and these techniques to overcome those challenges and reach the next level.
