Reinforcement Learning ‘Really Works’ for AI Against Pro Gamers, OpenAI Trailblazer Says

By NVIDIA On Sep 23, 2018

Ilya Sutskever spoke of the recent Dota 2 gaming results at NVIDIA’s annual NTECH engineering confab at the Silicon Valley campus.

Fast, creative, smart: great gamers are all these things. Somebody has to teach machines how to keep up. That somebody is Ilya Sutskever and his team at OpenAI.

Sutskever, co-founder and research director of OpenAI, and his team are developing AI bots smart enough to battle some of the world’s best human gamers.

In August, OpenAI Five, a team of five neural networks, was defeated by some of the world’s top professional players of Dota 2, the wildly popular multiplayer online battle arena game.

It was a leap for OpenAI Five to even be playing a nearly unrestricted version of Dota 2 at a professional level. The matches took place at Valve’s International competition in Vancouver, a world series of esports played for tens of millions of dollars.

That’s because Dota 2 is an extremely complex game. Players can unleash an enormous number of tactics, strategies and interactions in the quest to win. The game layout — only partially observable — requires both short-term tactics and long-term strategy, as each match can last 45 minutes. “Professional players dedicate their lives to this game,” said Sutskever. “It’s not an easy game to play.”

Sutskever spoke Thursday at NTECH, an annual engineering conference at NVIDIA’s Silicon Valley campus. The internal event drew an enthusiastic crowd of several hundred engineers — many also huge gaming fans — and hundreds more online.

Dota 2 Raises AI-Gaming Bar

OpenAI Five’s Dota 2 work marks an entirely new level for human-versus-AI challenges. For comparison, in chess and Go, also popular AI challenges, the average number of actions available per move is about 35 and 250, respectively. In Dota 2, which has far more complex rules, there are about 170,000 possible actions per move and about 20,000 moves per game.
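To get a feel for the gap, the figures quoted above can be plugged into a quick back-of-the-envelope calculation. The sketch below is only illustrative: it treats every move as offering the full quoted number of choices, which overstates what is actually legal at any instant, and the names used are made up for this example.

```python
import math

# Approximate figures quoted in the article.
ACTIONS_PER_MOVE = {"chess": 35, "go": 250, "dota2": 170_000}
DOTA_MOVES_PER_GAME = 20_000


def ratio(game_a: str, game_b: str) -> float:
    """How many times more choices per move game_a offers than game_b."""
    return ACTIONS_PER_MOVE[game_a] / ACTIONS_PER_MOVE[game_b]


def log10_action_sequences(actions_per_move: int, moves: int) -> float:
    """log10 of actions_per_move ** moves, a crude upper bound on play sequences."""
    return moves * math.log10(actions_per_move)


dota_vs_go = ratio("dota2", "go")
dota_vs_chess = ratio("dota2", "chess")
dota_log10 = log10_action_sequences(ACTIONS_PER_MOVE["dota2"], DOTA_MOVES_PER_GAME)

print(f"Dota 2 vs Go, choices per move: {dota_vs_go:.0f}x")
print(f"Dota 2 vs chess, choices per move: {dota_vs_chess:.0f}x")
print(f"Dota 2 action sequences per game: about 10^{dota_log10:,.0f}")
```

Working in logarithms keeps the numbers printable; the raw count of possible action sequences is far too large to write out.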

With all of its complexity, Dota 2 is closer to the real world than any previous game tackled by an AI, he said. “So, how did we do it? We used large scale RL (reinforcement learning),” Sutskever told the audience.

Reinforcement learning matters for humans and machines alike. When we earn a bonus point in a game with one move or get blown to bits with another, each of these moments provides reinforcement learning, burned in memory, for the next go-round.

Reinforcement learning matters to AI because it is a very natural way of training neural networks to act in order to achieve goals, which is essential for building an intelligent system.

OpenAI Five has seen spectacular results because it used a reliable reinforcement learning algorithm (Proximal Policy Optimization) at massive scale, running on more than 1,000 NVIDIA Tesla P100 GPUs in Google Cloud Platform.
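For readers curious what Proximal Policy Optimization actually optimizes, here is a minimal sketch of its clipped surrogate objective in plain Python. It is not OpenAI Five’s code: the function name, the list-based inputs and the clipping value of 0.2 are assumptions for illustration, and a real system would compute this on batched tensors and add value and entropy terms.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of sampled actions.

    new_log_probs: log-probabilities of the actions under the current policy
    old_log_probs: log-probabilities under the policy that collected the data
    advantages:    estimates of how much better each action was than expected
    clip_eps:      hypothetical clipping range (0.2 is a commonly used value)
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio new / old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the two estimates so a single update
        # cannot move the policy too far from the one that gathered the data.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# An action that became twice as likely and had a positive advantage
# only gets credit up to the clipping limit.
example = ppo_clipped_objective([math.log(2.0)], [0.0], [1.0])
print(round(example, 3))
```

Training maximizes this quantity. The clipping is what makes the method stable enough to run at very large scale.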

NVIDIA has been there as an early supporter, with CEO Jensen Huang personally delivering the first DGX-1 AI supercomputer in a box for the folks at OpenAI.

History of GPU Challenges

Sutskever is no stranger to unleashing GPUs on AI’s biggest challenges. He was one of a trio of University of Toronto researchers, alongside Alex Krizhevsky and advisor Geoffrey Hinton, who pioneered a GPU-based convolutional neural network to take the prestigious ImageNet competition by storm.

The results, which cut the error rate nearly in half, have gone down in history as the moment that spawned the modern AI boom.

The resulting model — dubbed AlexNet — is the basis of countless deep learning models. At GTC 2018, Huang spoke of AlexNet’s influence on thousands of AI strains, stating: “Neural networks are growing and evolving at an extraordinary rate.”

Sutskever says leaps in AI track closely to processing gains. “It’s pretty remarkable that the amount of compute from the original AlexNet to AlphaGo Zero is 300,000x. You’re talking about a five-year gap. Those are big increases.”
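The quoted figures imply a remarkably short doubling time for training compute. The calculation below is a rough sketch that assumes smooth exponential growth over exactly five years; the function name is made up for this example.

```python
import math


def doubling_time_months(growth_factor: float, years: float) -> float:
    """Months per doubling, assuming steady exponential growth over the period."""
    doublings = math.log2(growth_factor)  # number of times the quantity doubled
    return years * 12 / doublings


months = doubling_time_months(300_000, 5)
print(f"Compute doubled roughly every {months:.1f} months")
```

Under those assumptions, a 300,000x increase works out to about 18 doublings, or one roughly every 3.3 months.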

OpenAI’s ‘Moonshot’ Ambitions

OpenAI is a nonprofit that was formed in 2015 to develop and release artificial general intelligence aimed at benefiting humanity. Its founding members include Tesla CEO Elon Musk, Y Combinator President Sam Altman and other tech luminaries who have collectively committed $1 billion to its mission.

Researchers at OpenAI are also making strides on a project called Dactyl, which aims to increase the dexterity of a robot hand. The team there has been working on domain randomization — an old concept — with remarkable results. They have been able to train the robot hand to manipulate objects in simulation, and then transfer that knowledge to real-world manipulation. This is important, because simulation is the only way to get enough training experience for these robots. “The idea works really, really well,” Sutskever said.
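The core of domain randomization is simple to sketch: every simulated training episode gets slightly different physics and visuals, so the learned policy cannot rely on any one setting. The parameter names and ranges below are hypothetical, chosen only to illustrate the idea, and do not describe Dactyl’s actual configuration.

```python
import random

# Hypothetical parameter ranges, for illustration only.
PARAM_RANGES = {
    "object_mass_kg": (0.03, 0.30),
    "surface_friction": (0.5, 1.5),
    "motor_gain_scale": (0.75, 1.25),
    "camera_brightness": (0.6, 1.4),
}


def sample_sim_params(rng: random.Random) -> dict:
    """Draw one random simulator configuration from the ranges above."""
    return {name: rng.uniform(low, high) for name, (low, high) in PARAM_RANGES.items()}


def randomized_episodes(num_episodes: int, seed: int = 0):
    """Yield (episode index, simulator parameters) with fresh parameters each time."""
    rng = random.Random(seed)
    for episode in range(num_episodes):
        yield episode, sample_sim_params(rng)


for episode, params in randomized_episodes(3):
    print(episode, {name: round(value, 3) for name, value in params.items()})
```

In a full training loop, each sampled configuration would be handed to the simulator before the episode starts. To a policy trained this way, the real world looks like one more variation among many.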

Sutskever is keen on pushing common AI concepts such as reinforcement learning and domain randomization to new heights. In the wide-ranging discussion at NTECH, he praised the conclusions of Arthur C. Clarke’s book Profiles of the Future, which notes that doubts were historically cast on great inventions such as the airplane and space travel.

Skepticism, he said, initially led the U.S. to pass on building and sending a 200-ton rocket to space, on the grounds that it was too large to be built. “So the Russians went on and built a 200-ton rocket,” he quipped, drawing audience laughter.