So what’s the latest in AI? We’ve heard of computers beating trivia masters at Jeopardy! and defeating world-class opponents at chess, but can AI conquer the game of poker, with all of its complexity, its roughly 10^160 possible situations, and its overwhelming human element?
Carnegie Mellon researcher Noam Brown has done just that. He developed Libratus, a poker AI that beat the world’s best expert poker players at Heads-Up No-Limit Hold’em in an early 2017 competition. Jeff Meyerson of Software Engineering Daily recently chatted with Brown (transcript here).
We recapped their conversation to bring you 13 things you need to know about how AI totally crushes poker and eventually the world.
1. Poker can be automated – but it’s far from easy.
Even though poker is a game of strategy, hidden information, predicting human play, and bluffing – in other words, an “imperfect information” game – it’s actually a very mathematical process that can be optimized with complex computer science algorithms. This is not easy, though, and it’s much harder than applying AI to “perfect information” games like chess or Go. “It’s far more difficult for an AI to emulate human behavior when it comes to poker,” says Brown, “but it’s certainly within the realm of possibility for an AI.”
2. Poker AI’s approach is way different from the AI approach for chess and Go.
In a game like chess or Go, there’s no hidden information. Both computers and humans can see the game pieces on the board and have a good idea as to what’s going on at all times. Because of this, artificial intelligence systems follow a process of searching through a set of formulas and possible scenarios to arrive at the best possible strategy. In a game like poker, you can’t see your opponent’s cards, and you never know exactly what state of the game you’re in. “You really have to start from scratch with an entirely new approach,” says Brown.
3. Libratus learned how to play on its own, 100% from scratch.
Libratus starts out playing completely at random, then plays itself – not other humans – over trillions of hands of poker. According to Brown, “When it finishes a hand, it goes back and reviews all of the decisions that it made and it asks, ‘If I had done this other action instead, would I have gotten more money?’” Over those trillions of hands, the AI gradually improves the same way a human does: by making mistakes and learning from them.
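Brown’s “would I have gotten more money?” review can be sketched in a few lines of Python (a hypothetical illustration – the action names and dollar amounts here are invented):

```python
def review_hand(would_have_won, taken):
    """After a hand, compare the money the taken action actually won
    against what each alternative would have won. A positive number
    ("regret") means that alternative looks better in hindsight."""
    return {action: won - would_have_won[taken]
            for action, won in would_have_won.items()}

# Suppose we folded, but calling would have won $50 and raising lost $100.
regret = review_hand({"fold": 0, "call": 50, "raise": -100}, taken="fold")
# "call" earns positive regret, so similar spots will see more calls later.
```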
4. Poker AI uses a reward function.
In a game of poker, the reward function is the same for both humans and computers: making money. The Libratus team started by giving the machine the rules of the game and telling it to win as much money as possible.
It works to optimize its reward function – how much money it makes – by trying something randomly in a given situation and looking back at how much more profitable other actions that it didn’t take would have been. If it would have made more money by taking a different action, it will employ that action more often in the future. The AI effectively updates and optimizes its strategy every time it plays, with its metaphorical eye on the prize.
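The update Brown describes is essentially regret matching, the building block of the counterfactual regret minimization (CFR) family of algorithms behind Libratus. Here is a minimal sketch using rock-paper-scissors as a stand-in game (poker’s game tree is astronomically larger, but the loop has the same shape): play, look back at what each untaken action would have earned, accumulate that “regret,” and favor high-regret actions going forward.

```python
import random

# Rock-paper-scissors as a stand-in "game" (a hypothetical example; the
# payoff matrix gives the money won by playing action a against action b).
ACTIONS = (0, 1, 2)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy(regrets):
    """Regret matching: play high-regret actions proportionally more often."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1/3, 1/3, 1/3]

def self_play(iterations=50_000, seed=1):
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]       # one regret table per player
    strategy_sum = [[0.0] * 3, [0.0] * 3]  # the running average converges
    for _ in range(iterations):
        strats = [strategy(regrets[0]), strategy(regrets[1])]
        moves = [rng.choices(ACTIONS, weights=s)[0] for s in strats]
        for p in (0, 1):
            opp = moves[1 - p]
            for a in ACTIONS:
                # "If I had done this other action, would I have won more?"
                regrets[p][a] += PAYOFF[a][opp] - PAYOFF[moves[p]][opp]
                strategy_sum[p][a] += strats[p][a]
    return [s / iterations for s in strategy_sum[0]]
```

Run long enough, the average strategy converges toward the Nash equilibrium – for rock-paper-scissors, roughly one third on each action.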
5. Libratus groups similar situations to arrive at optimized strategies for almost all of them.
Poker is a game of near-infinite possibilities. It’s impossible to have an optimized strategy for every situation, even for a machine. So, instead of coming up with a unique policy for every possible situation, Brown and his team grouped similar situations – for example, bet sizes of $500 and $501 – and developed a single policy to represent both, making it easier and faster for the AI to arrive at a good strategy across the board.
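This grouping technique is known as action abstraction. A toy sketch (the size grid here is invented, and Libratus’s actual mapping of off-tree bets was more sophisticated than nearest-neighbor rounding):

```python
# Hypothetical grid of bet sizes the AI actually computes strategies for.
ABSTRACT_BETS = [100, 250, 500, 1000, 2500, 5000, 10000, 20000]

def abstract_bet(real_bet):
    """Map any real-world bet size to the nearest size in the grid,
    so nearby bets like $500 and $501 share a single policy."""
    return min(ABSTRACT_BETS, key=lambda size: abs(size - real_bet))
```

Here, `abstract_bet(501)` and `abstract_bet(500)` both resolve to the $500 entry, so the AI only ever has to solve one of them.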
6. AI crushes humans at the “turn” and the “river.”
Poker pros have extremely sophisticated turn and river play. These are among the most difficult streets in poker, requiring careful analysis and estimation – which is exactly what makes them so well suited to a machine. “There’s some really advanced things that you can do on the turn and the river that the computer can do way better than humans,” says Brown. Its estimates are much more accurate than a human’s, especially since it has played and learned from trillions of hands of poker.
7. Libratus doesn’t use deep learning.
Libratus uses a form of reinforcement learning to approximate a Nash equilibrium strategy – not deep learning. However, according to Brown, deep learning can be used to improve certain aspects of the AI. In fact, it has already been used in another poker AI, DeepStack, which relies on a deep neural network.
8. The poker AI didn’t learn how to play from human data.
“One of the really cool things about the AI,” says Brown, “is that it hasn’t learned how to play from human data, it learned by playing against itself.” As a result, Libratus can come up with innovative strategies that humans don’t typically employ, which gives the computer an edge against humans, and makes it difficult for the pros to respond.
9. The poker AI improved after playing against humans.
The AI does not adapt to individual human opponents or develop strategies specifically to exploit them. Instead, it evaluates the situations the humans are putting it in and works to improve its own play in those situations, striving for a perfect, unbeatable strategy. “I have to hand it to the humans,” says Brown. “They put up an incredible fight and they really gave it their all. They studied every night trying to figure out as a team how to take down this AI.”
10. But the AI developed most of its strategy with little human input.
Libratus mostly relies on strategic decisions it learned on its own, with relatively little handcrafted code. For example, an algorithm decides which actions are worth improving. In effect, it says to itself, “These bet sizes the humans are using are far away from any size I’m already considering, and they seem to be betting this amount a lot, so I should really focus on this situation in particular to improve.”
Other things, such as the bet sizes Libratus considers, were a handcrafted choice by Brown and his team. However, Brown thinks this is an area for improvement and says, “We do actually have an algorithm that can determine those sizes automatically, and I think in the future it will be really nice to use that instead.”
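That prioritization – sizes far from anything already considered, which opponents use a lot – can be sketched as a simple score (a hypothetical illustration, not the team’s actual criterion):

```python
from collections import Counter

def improvement_priority(observed_bets, considered_sizes):
    """Rank opponent bet sizes: the best candidates for extra overnight
    computation are used often AND sit far from anything already modeled."""
    counts = Counter(observed_bets)
    gap = lambda bet: min(abs(bet - size) for size in considered_sizes)
    return sorted(counts, key=lambda bet: counts[bet] * gap(bet), reverse=True)
```

A frequently used $750 bet far from the modeled $500 and $1,000 sizes would outrank a $510 bet that rounds almost losslessly to $500.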
11. The poker AI goes through batch learning overnight.
According to Brown, “In the first and second round, the AI considers a bunch of different bet sizes both for itself and for the opponent. There are 20,000 different possibilities, because you can bet any dollar amount between $100 and $20,000, but it doesn’t consider all of them. It will round, for example, a bet of $501 to a bet of $500.”
However, this is not a perfect strategy, because a $501 bet and a $500 bet are, in fact, different situations. To address this, the AI learned – literally overnight – how to respond to specific situations where the rounding makes a meaningful difference. The next time it encountered such a situation, it wouldn’t round $501 to $500 but would treat the two bet sizes separately.
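One way to picture that overnight step (a hypothetical sketch, not Libratus’s actual code): round off-tree bets during play, count how often each one shows up, and promote the frequent ones to dedicated strategy slots overnight.

```python
from collections import Counter

class BetAbstraction:
    """Hypothetical sketch: rounds off-tree bets during play, then
    promotes frequently seen sizes to first-class entries overnight."""

    def __init__(self, sizes):
        self.sizes = sorted(sizes)
        self.seen_off_tree = Counter()

    def resolve(self, bet):
        if bet in self.sizes:
            return bet
        self.seen_off_tree[bet] += 1  # remember this size for tonight
        return min(self.sizes, key=lambda size: abs(size - bet))

    def overnight_refine(self, threshold=50):
        # Sizes opponents used often enough get their own strategy slot,
        # so tomorrow a $501 bet is no longer rounded to $500.
        for bet, count in self.seen_off_tree.items():
            if count >= threshold:
                self.sizes.append(bet)
        self.sizes.sort()
        self.seen_off_tree.clear()
```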
12. Libratus’ algorithms were written in C++, using MPI to communicate between nodes.
Libratus’ algorithms are written in C++, and the team used MPI (Message Passing Interface), a standard API for communication between nodes in a distributed system, during the equilibrium-finding process, when the AI was playing poker against itself over trillions of hands.
13. AI is going to (yep, let’s get real) replace a lot of human jobs.
Brown likens the rise of artificial intelligence to the Industrial Revolution. “I think it’s important that we recognize that there are going to be winners and losers,” says Brown, “and there are definitely going to be more winners than there are losers. However, it’s important that the losers are compensated for their loss by the winners in this automation process.”
Looking for a role in AI? Indeed Prime features hundreds of companies looking for engineers with experience in reinforcement learning, neural networks, and deep learning. Sign up to use Prime for your job search at no cost.