AlphaGo pulled off a triumphant win. The first of many wins
Written by: Rich Martin
AlphaGo, from Google’s DeepMind, made history with its 4–1 defeat of the Go world champion Lee Sedol. Prior to the five-game match, many Go experts, including Mr. Lee himself, predicted that the machine would be defeated 5–0 or, if it did well, 4–1. The result was quite the opposite, with Mr. Lee managing just one win.
This is both a shock for Go players and an achievement in AI that many had not expected for [at least another decade](http://www.wired.com/2014/05/the-world-of-computer-go/). Games, and in particular abstract strategy games like Go, chess and draughts, have been a staple of AI research since its inception.
IBM’s Deep Blue made history in 1996 by becoming the first chess computer to win a game against the reigning world champion, Garry Kasparov. AlphaGo’s victory today, however, is much more profound than Deep Blue’s victory two decades ago. Indeed, this event may symbolise the most significant advance in the history of computing, for one key reason: whereas Deep Blue was a chess engine designed for the single purpose of playing chess, AlphaGo demonstrates a new general-purpose AI technique that is going to change the world.
To understand why this is so important, we need to take a step back and look at how computers solve games like chess and Go.
Any computer program designed to play an abstract strategy game such as chess, Go, draughts or noughts-and-crosses has the same basic problem: how do you search the tree of possible moves and counter-moves for the move that gives the best chance of winning? In noughts-and-crosses this is easy, since the number of possible moves is very small: there are nine possible initial moves, eight possible replies, seven subsequent replies, and so on, giving a total of just 362,880 possible move sequences that fill the board. Once the terminal conditions (where somebody has won before the board is full) and the symmetry of the board are taken into account, depending on how you count there are only [26,830 possible games of noughts-and-crosses](http://www.mathrec.org/old/2002jan/solutions.html). That’s peanuts for a computer, and indeed the [first computer program that could play noughts-and-crosses](https://en.wikipedia.org/wiki/OXO), and play it perfectly, was written back in 1952 on one of the very first computers.
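The tractability of noughts-and-crosses is easy to verify by brute force. The sketch below (my own illustration, not from the article’s sources) enumerates every legal game, stopping as soon as someone wins or the board fills; counting distinct move sequences this way, before folding in board symmetry, gives 255,168 games.

```python
# Enumerate every legal game of noughts-and-crosses by brute force.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for i, j, k in LINES:
        if board[i] is not None and board[i] == board[j] == board[k]:
            return board[i]
    return None

def count_games(board, player):
    """Count complete games reachable from this position."""
    if winner(board) or all(cell is not None for cell in board):
        return 1  # terminal: a win, or a full board
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player
            total += count_games(board, 'O' if player == 'X' else 'X')
            board[i] = None
    return total

print(count_games([None] * 9, 'X'))  # 255168 distinct move sequences
```

Folding in wins that end the game early already cuts the 362,880 raw fillings to 255,168 sequences; factoring out the board’s symmetries reduces the count further, to the 26,830 figure above.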
Chess is much harder. The average number of legal moves at any point in a game of chess is around 35, and a game consists, on average, of around 70 ply. (A ply, in game-theory terminology, is a turn taken by one player; in chess a move consists of two ply, a turn by white and a turn by black.) This means that to evaluate just 6 ply (or 3 moves) ahead, a naive chess program would have to evaluate 35⁶, or 1,838,265,625, different board positions. This rapid growth in the number of possibilities with each ply, governed by the number of legal moves available in each position (the branching factor), means that looking very far ahead in the game soon becomes impossible.
To solve this, Deep Blue, and other chess engines, prune the search tree by using programmed rules and heuristics to evaluate each position and decide which moves are worth deeper searching than others. These rules have been developed, accumulated, tweaked and adjusted over many years by the humans working in the field of computer chess. The power of a modern chess engine is a result of this rich accumulation of chess-specific knowledge.
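Deep Blue’s evaluation function encoded a great deal of hand-tuned chess knowledge, but the pruning machinery underneath it is the classic alpha-beta algorithm. The sketch below is my own minimal illustration of that idea, using noughts-and-crosses rather than chess so it stays self-contained: once one branch is provably no better than an alternative already examined, the rest of that branch is skipped.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for i, j, k in LINES:
        if board[i] is not None and board[i] == board[j] == board[k]:
            return board[i]
    return None

def alphabeta(board, player, alpha=-2, beta=2):
    """Score a position for 'X' (+1 win, 0 draw, -1 loss), pruning
    branches that cannot change the final choice."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if all(cell is not None for cell in board):
        return 0  # draw
    best = -2 if player == 'X' else 2
    for i in range(9):
        if board[i] is None:
            board[i] = player
            score = alphabeta(board, 'O' if player == 'X' else 'X', alpha, beta)
            board[i] = None
            if player == 'X':
                best = max(best, score)
                alpha = max(alpha, best)
            else:
                best = min(best, score)
                beta = min(beta, best)
            if alpha >= beta:
                break  # prune: the opponent will never allow this line
    return best

print(alphabeta([None] * 9, 'X'))  # 0 -- perfect play is a draw
```

In a real chess engine the terminal cases are replaced by a depth limit plus a heuristic evaluation of the position; that evaluation function is where the accumulated chess knowledge lives.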
Go is much harder still. The branching factor is even higher, at around 250 possible moves per position, so to look just 6 ply ahead a naive Go program would need to evaluate around 250⁶, or 244,140,625,000,000, different board positions. Even worse, because a Go game consists of, on average, five times more moves than a chess game, the program needs to look significantly further ahead to gain an equivalent understanding of the position. Because of this, Go has resisted the heuristic-based tree-pruning methods used in the computer chess world.
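The numbers above follow directly from the branching factor: a naive search looking d ply ahead must evaluate roughly b^d positions, where b is the branching factor.

```python
def naive_search_size(branching_factor, ply):
    """Board positions a brute-force search must evaluate to look `ply` ahead."""
    return branching_factor ** ply

print(f"chess, 6 ply: {naive_search_size(35, 6):,}")   # 1,838,265,625
print(f"go,    6 ply: {naive_search_size(250, 6):,}")  # 244,140,625,000,000
```

The same exponent, a branching factor just seven times larger, and the search is over a hundred thousand times bigger; each additional ply widens the gap further.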
AlphaGo’s approach is very different. Instead of using human-crafted rules to prune the search tree, AlphaGo uses a pair of deep neural networks, one trained to recognise good moves, the other trained to recognise strong board positions.
Neural networks are an AI technique inspired by the way animal brains operate. A network consists of layers of nodes: data is presented to the first layer and flows to subsequent layers through connections of varying “strengths”. As the network is trained, each layer comes to represent more abstract features of the layer before it. The final (output) layer produces the result that the network has been trained to evaluate.
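As a rough sketch of the mechanics (illustrative only; real networks such as AlphaGo’s are vastly larger and use convolutional layers), each layer takes a weighted sum of its inputs, adds a bias, and applies a non-linearity:

```python
def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: for each output node, a weighted sum
    of the inputs plus a bias, passed through an activation function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

relu = lambda v: max(0.0, v)   # a common hidden-layer non-linearity
identity = lambda v: v

# Toy two-layer network with hand-picked weights (purely illustrative).
x = [1.0, 2.0]
hidden = dense_layer(x, [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], relu)
output = dense_layer(hidden, [[1.0, 1.0]], [0.5], identity)
print(hidden, output)  # [0.0, 3.0] [3.5]
```

Training consists of nudging those weights and biases so the output layer gets closer to the desired answer; stacking many such layers is what lets later layers represent more abstract features of earlier ones.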
Neural networks have been used for many years, but over the last few years advances in both the availability of large amounts of parallel processing power and in the techniques of how the networks operate have led to a huge growth in both our theoretical understanding and the practical application of neural networks.
In the case of AlphaGo, the two networks that it uses both have an input layer consisting of the 19x19 grid of a Go game position. One network (called the value network) outputs a single numerical value that represents how good that position is for black or white. The other network (called the policy network) outputs a grid the same size as the Go board where each point contains a value that estimates the probability that a move would be played at that point next.
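To make the two outputs concrete, here is a schematic of their shapes. This is a sketch of the interface only, with placeholder numbers; the real networks are deep convolutional networks. The policy head yields a probability for each of the 361 board points, and the value head a single score for the position.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a Go board

def softmax(logits):
    """Turn raw per-point scores into a probability distribution over moves."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Placeholder raw scores, as if produced by the policy network's final layer.
policy_logits = [0.0] * BOARD_POINTS
move_probs = softmax(policy_logits)   # uniform here: 1/361 for every point

# Placeholder value-network output, squashed into (-1, 1).
position_score = math.tanh(0.3)

print(len(move_probs), sum(move_probs))  # 361 probabilities summing to 1
```

The key point is the division of labour: one network answers “where would a strong player move next?”, the other “who is winning from here?”.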
These two neural networks guide AlphaGo in its search through the tree of possible moves. The networks were initially trained on a database of 30 million moves from games played by strong human players; AlphaGo was then left playing millions of games against subtle variations of itself, using the results of each game to further train the pair of networks.
The rules that allow AlphaGo to select the best move from among the many possible moves are not known to any programmer on the AlphaGo team. They have not been devised or programmed by humans and are not even accessible to the AlphaGo programmers. Just as a human player learns by watching others play and coming to recognise patterns in games and moves, so AlphaGo has learnt to play.
The success of this approach is apparent not only in the huge increase in Go playing strength that AlphaGo represents, but in the efficiency with which it achieves its results.
During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did in its chess match against Kasparov, compensating by selecting those positions more intelligently, using the policy network, and evaluating them more precisely, using the value network — an approach that is perhaps closer to how humans play. Furthermore, while Deep Blue relied on a handcrafted evaluation function, AlphaGo’s neural networks are trained directly from game-play purely through general-purpose supervised and reinforcement learning methods.
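The published AlphaGo paper describes how this selection works inside the search tree: each candidate move’s estimated value is balanced against an exploration bonus weighted by the policy network’s prior. Below is a minimal sketch of that selection rule (variable names are mine; `c_puct` is a tunable exploration constant):

```python
import math

def selection_score(q, prior, parent_visits, child_visits, c_puct=1.0):
    """AlphaGo-style in-tree move selection.

    q      -- mean value of this move from simulations so far
    prior  -- the policy network's probability for this move
    The bonus shrinks as a move accumulates visits, steering the search
    toward moves the policy network likes but that remain unexplored."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration

# A well-visited move with a modest prior vs a fresh move the policy favours:
established = selection_score(q=0.52, prior=0.10, parent_visits=100, child_visits=40)
fresh = selection_score(q=0.00, prior=0.60, parent_visits=100, child_visits=0)
print(established < fresh)  # True: the unexplored move is tried next
```

This is why AlphaGo can afford to evaluate so few positions: the prior concentrates the search on a handful of plausible moves, and the value estimate decides how deeply each is pursued.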
The result of the AlphaGo:Lee match has sent a shockwave through the Go community. Much as the chess community had to adapt to the arrival of programs that could defeat the best human players, so will the Go community. This is a good thing. For chess, apart from the problem of cheating, computers have had an extremely positive effect on the game. Acting as coaches and analysis aids, they have helped chess masters vastly expand their understanding of the game. For Go, the effect is likely to be similar.
Indeed, in the press conference after game 3 of the AlphaGo:Lee match, commentator and 9-dan professional player Michael Redmond spoke about the possibility of AlphaGo and future Go computers ushering in a “third revolution” in the way the game is played.
Reactions to AlphaGo’s victory have ranged from vague dismissal to sadness to paranoid fear-mongering. The sadness seems to come from the mistaken notion that humanity has been “beaten” by AlphaGo. The fear-mongering echoes the existing concerns around AI as voiced by people like Nick Bostrom, Stephen Hawking, Elon Musk, Bill Gates, and others. These are legitimate concerns that need to be taken seriously, but they are only indirectly connected to AlphaGo.
AlphaGo is a task-specific AI. It is exceptionally good at doing exactly one thing. It has no understanding or awareness of the pressure and rivalry of the professional Go world, the media interest that has surrounded its victory, what type of wood a Go board is made from, or even that its opponent in these games was a human. It was built to do exactly one thing exceptionally well: understand and play Go.
Task-Specific vs General Intelligence
When people discuss the fear around AI they usually talk about something very different: an artificial general intelligence (AGI). An AGI would be a machine that is equally capable of learning Go or learning chess, understanding politics, the rules of cricket (including the LBW rule), and any other aspect of human life. It would be able to grow and adapt to new challenges rather than just being limited to a specific task. We don’t really know what an AGI would look like since we don’t yet know how to build one.
But in our very limited experience of general intelligence — the natural world of humans and other animals — we see it accompanied by troubling traits such as desire, pride and ambition. We don’t yet know if those are necessary traits for an AGI or if they are coincidental side-effects of the way in which we acquired intelligence. It’s obviously wise to be prudent and gain some understanding of this before we give birth to powerful, connected AGIs.
We are still a long way from building an AGI, so we don’t yet know if these fears are justified. What we do know, however — what AlphaGo has so beautifully demonstrated — is that before we get to AGIs, we are going to have a wealth of highly capable task-specific AIs at our disposal. The deep learning techniques behind AlphaGo are general techniques that can be applied to a wide range of problems. The big deal isn’t that we made a machine that can beat us at Go, but that we made a machine that can learn to beat us at Go.
The deep learning techniques behind AlphaGo can be applied to medical diagnosis, medical research, scientific discovery, mathematics, and all kinds of other areas that require pattern detection and analysis. They are already being applied to image and speech recognition, translation, as well as predicting the behavior of and designing new pharmaceuticals.
These are areas where human intuition and intelligence has always been the most powerful tool at our disposal. We are now moving into an age in which mankind is no longer alone in these endeavors. An age in which we will be accompanied by powerful teachers, mentors and advisers in the form of task-specific AIs. Working alone or as aids to human practitioners, these AIs will take us places and unlock wonders we could never have achieved on our own.
AlphaGo’s victory over Lee Sedol was not a victory of machine over man. As when mankind first sharpened a rock, tied it to a stick and used it to plough the hard earth, it was a victory of mankind over his own innate limitations.