A poker bot designed by researchers from Facebook and Carnegie Mellon University has consistently beaten some of the world's top human players in a series of six-player no-limit Texas Hold'em games, according to The Verge.
The AI system, named Pluribus, played more than 10,000 hands over the course of 12 days. In one setup, it played alongside five human players; in another, five copies of the AI played against a single human. The bot won, on average, five dollars per hand, with hourly winnings of about $1,000, which the researchers called a "decisive margin of victory".
Noam Brown, a research scientist at Facebook AI Research, said:
“It’s safe to say we’re at a superhuman level and that’s not going to change.”
Chris Ferguson, a six-time World Series of Poker champion, said: “Pluribus is a very hard opponent to play against. It’s really hard to pin him down on any kind of hand.”
In a recently published paper, the scientists behind the bot said the victories are a significant milestone in AI research. Computers have already mastered games like chess and Go, but six-player Texas Hold'em has long been a harder benchmark to reach.
This is because much of the information needed to win at poker is hidden from the players, who cannot see one another's cards. The game also involves multiple players and complex victory outcomes. A game like Go is easier for AI, despite having more possible board configurations than there are atoms in the observable universe, because all the information is visible to both players, which makes it easier for an AI to train on.
Back in 2015, a machine learning system beat human pros at two-player Hold'em, but raising the number of opponents to five increased the complexity of the game significantly. The researchers used a few key strategies to address this:
First, the researchers taught Pluribus to play poker by having it play against copies of itself, a process known as self-play. This is a common technique for AI training: the system learns the game through trial and error, playing hundreds of thousands of hands against itself. The training process was also remarkably efficient: Pluribus was created in just eight days using a 64-core server equipped with less than 512GB of RAM. Training the program on cloud servers would cost just $150, a bargain compared with the hundred-thousand-dollar price tag of other state-of-the-art systems.
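To give a feel for what self-play looks like in practice, here is a minimal sketch in Python. It is not Pluribus's training code, which the article does not describe in detail. It assumes a far simpler game (rock-paper-scissors) and a standard learning rule called regret matching, and every name in it (PAYOFF, current_strategy, self_play) is hypothetical. Two copies of the same learner play each other repeatedly, and each shifts toward the moves it regrets not having played more often.

```python
# Toy illustration of self-play: two copies of one learner play
# rock-paper-scissors and improve through regret matching.
# This is NOT Pluribus's algorithm; the game, the update rule and all
# names here are assumptions chosen to keep the example small.

ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[a][b] is the payoff to a player choosing action a against action b.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def current_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No action is regretted yet, so mix uniformly.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Train two copies against each other; return their average strategies.

    The default initial regrets bias both copies toward rock, so that the
    learners start from a bad strategy and have something to correct.
    """
    regrets = [list(initial_regrets), list(initial_regrets)]
    strategy_sums = [[0.0] * 3, [0.0] * 3]

    for _ in range(iterations):
        strategies = [current_strategy(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each action against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(3))
                for a in range(3)
            ]
            expected = sum(
                s * v for s, v in zip(strategies[player], action_values)
            )
            for a in range(3):
                # Regret: how much better action a would have done.
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += strategies[player][a]

    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Calling self_play(20000) returns one averaged strategy per copy. Although both copies start out heavily favouring rock, their averaged play drifts toward an even mix of the three moves, which is the unexploitable strategy for this game. Poker is vastly larger and hides information, so this shows only the general idea of learning from games against oneself.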
Then, to deal with the extra complexity of six players, Brown and Tuomas Sandholm, a professor at Carnegie Mellon, came up with an efficient way for the AI to look ahead in the game and decide what move to make, a mechanism known as the search function. Rather than trying to predict how its opponents would play all the way to the end of the game (a calculation that would become incredibly complex in just a few steps), Pluribus was engineered to look only two or three moves ahead. This truncated approach was the “real breakthrough,” says Brown.
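The idea of looking only a few moves ahead can be sketched on a much simpler game. The Python example below is an analogy rather than Pluribus's search function: it assumes a two-player game with no hidden information (players alternately take one to three stones from a pile, and whoever takes the last stone wins), and it scores any position beyond the search horizon with a neutral placeholder estimate. The function names (estimate, negamax, best_move) are hypothetical.

```python
# Depth-limited lookahead on a toy stone-taking game.
# This is a simplified, perfect-information analogy, NOT Pluribus's
# search: poker hides cards, and the article does not say how Pluribus
# scores positions at the point where it stops looking ahead.

MAX_TAKE = 3  # a player may take 1, 2 or 3 stones per turn


def estimate(pile):
    """Placeholder guess for a position we refuse to search further.

    A real system would plug in a learned or precomputed estimate here;
    returning 0.0 simply means "unknown, treat as even".
    """
    return 0.0


def negamax(pile, depth):
    """Value of the position for the player about to move."""
    if pile == 0:
        # The previous player took the last stone, so the mover has lost.
        return -1.0
    if depth == 0:
        # Search horizon reached: stop and fall back on the estimate.
        return estimate(pile)
    return max(
        -negamax(pile - take, depth - 1)
        for take in range(1, min(MAX_TAKE, pile) + 1)
    )


def best_move(pile, depth):
    """Pick the number of stones to take, looking `depth` moves ahead."""
    moves = range(1, min(MAX_TAKE, pile) + 1)
    return max(moves, key=lambda take: -negamax(pile - take, depth - 1))
```

With a pile of five stones and a two-move horizon, best_move(5, 2) takes a single stone: it cannot see a forced win, but it can see that taking two or three would hand the opponent an immediate one. The trade-off is that a short horizon makes each decision cheap while leaning heavily on the quality of the estimate at the cutoff.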
Pluribus was "remarkably good at bluffing its opponents", and those who played against it praised its relentless consistency and the way it could squeeze profits out of thin hands. It was also "predictably unpredictable", and it managed all this just by playing the cards it was dealt, without reading facial expressions or spotting tells.
Brown says that bluffing, often regarded as an art, can be reduced to mathematically optimal strategies: “The AI doesn’t see bluffing as deceptive. It just sees the decision that will make it the most money in that particular situation. What we show is that an AI can bluff, and it can bluff better than any human.”
Now that AI has bested humans at six-player Hold'em, there is much that human players can learn from computers about the game.
Researchers also hope that the techniques used to create the bot can be transferred to other areas, such as cybersecurity, fraud prevention and financial negotiations.