Online RL-Based DayDreamer Can Train a Robot Without New Machine Learning Algorithms or Simulation
Robot training is one of the most challenging tasks for AI engineers. Most robot training pipelines require extensive simulation along with new machine learning algorithms tailored to each task. Not only is this time-consuming, it is often ineffective. To address these complexities, AI and robotics researchers at the University of California, Berkeley have published a paper on DayDreamer, an effective online Reinforcement Learning (RL) approach that trains robots directly in the physical world.
What is Robot Training?
Robot training is a branch of engineering that combines mechanical engineering, robotics, sensing, and machine learning software. Traditionally, robotics teams have relied on various learning algorithms to teach their robots about the physical environment. These include adaptive control, cognitive architectures, and neural networks trained with reinforcement learning (RL).
In the current AI and robotics space, different categories of robots are developed, trained, and adapted to meet broader business goals. Robots commonly fall into these categories:
- Developmental Robots or DevRobs
- Cognitive Robots (based on RPA and semi-trained Machine Learning)
- Evolutionary Robots
DayDreamer is an advanced RL-based robot learning algorithm that has shown remarkable learning capability even with limited interaction during training. In many cases, it outperforms pure model-free RL algorithms of the kind originally developed for video games, which typically need far more interaction data.
What is DayDreamer capable of?
Dreamer learns very quickly from the real world, without the simulators or task-specific machine learning algorithms often required for DevRobs and cognitive robots. In effect, the Dreamer algorithm shows that world models can overcome the complex challenges of learning directly on physical hardware.
For example, the researchers at Berkeley trained four robots with Dreamer without adding any new machine learning algorithms. The same Dreamer algorithm was used to train a quadruped robot to roll off its back, stand up, and walk, and that training was completed within about an hour of real-world experience.
Dreamer's fast learning and strong performance can be attributed to its simple online RL pipeline: it learns on real robot hardware without simulators, and it combines a world model trained by supervised prediction with a neural-network policy. Data collection and neural-network training run in parallel, which lets the Dreamer algorithm compute actions with low latency.
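To make this parallel pipeline concrete, here is a minimal Python sketch of the idea (not DayDreamer's actual code): one thread keeps collecting real-world experience while another trains the networks from a shared buffer, so action selection never has to wait on a gradient step. All class and function names below are illustrative placeholders.

```python
# Minimal sketch only: the environment, policy, and world model are stub
# placeholders. The point is the structure -- acting and learning run in
# parallel threads that share a replay buffer.
import queue
import random
import threading
import time

replay_buffer = queue.Queue()   # shared storage for collected transitions
stop = threading.Event()

class StubEnv:
    def reset(self): return 0.0
    def step(self, action): return random.random(), 0.0, random.random() < 0.05

class StubPolicy:
    def __call__(self, obs): return random.random()      # cheap forward pass, no training here
    def update(self, world_model): time.sleep(0.01)       # pretend gradient step

class StubWorldModel:
    def update(self, batch): time.sleep(0.01)              # pretend gradient step

def actor_loop(env, policy):
    """Collect experience on the real robot and push it into the buffer."""
    obs = env.reset()
    while not stop.is_set():
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        replay_buffer.put((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def learner_loop(world_model, policy):
    """Train the world model and policy from replayed data, in parallel."""
    while not stop.is_set():
        batch = replay_buffer.get()    # in practice: sample mini-batches of sequences
        world_model.update(batch)      # learn to predict future outcomes
        policy.update(world_model)     # improve behavior from imagined rollouts

if __name__ == "__main__":
    env, policy, world_model = StubEnv(), StubPolicy(), StubWorldModel()
    threading.Thread(target=actor_loop, args=(env, policy), daemon=True).start()
    threading.Thread(target=learner_loop, args=(world_model, policy), daemon=True).start()
    time.sleep(1.0)
    stop.set()
```

In the real system the learner samples mini-batches of sequences rather than single transitions, but decoupling acting from learning is the key design choice that keeps action computation low-latency.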
At Berkeley, the team leveraged the Dreamer algorithm (Dreamer, Hafner et al., 2020; DreamerV2, Hafner et al., 2021). This allowed the developers to focus on fast robot learning in the real world. The Dreamer algorithm was able to train robots for:
- quadruped walking
- multi-object visual pick and place
- XArm visual pick and place, and
- Sphero navigation
The experiments by the Berkeley researchers show that the Dreamer algorithm can overcome the challenges commonly associated with learning visual policies, such as localizing objects from camera images. With online RL training and sparse rewards, Dreamer learns a successful strategy within a few hours of autonomous operation.
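As a hypothetical illustration of what a sparse reward means for a task such as pick and place (the function name and signature below are made up for this example), the robot receives a reward only when the task is actually solved:

```python
def sparse_pick_and_place_reward(object_in_bin: bool) -> float:
    """Illustrative sparse reward: 1.0 only on success, 0.0 otherwise.

    There is no shaping and no intermediate feedback, which is what makes
    such rewards hard for conventional RL and where a learned world model
    helps by generalizing from the few rewarding episodes it has seen.
    """
    return 1.0 if object_in_bin else 0.0
```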
What Does DayDreamer’s Neural Network Consist of?
Dreamer’s neural-network learning architecture consists of two components; a minimal sketch appears after the list.
- A world model built around a Recurrent State-Space Model (RSSM), with an encoder that fuses rich sensory signals into a compact latent state and delivers real-time model predictions.
- A behavior learning (actor-critic) component that optimizes a policy network and a value network on rollouts imagined by the world model, using the rewards the world model predicts.
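The sketch below illustrates this two-part split in PyTorch-style Python. Layer sizes, module layout, and names are placeholders chosen for readability, not the architecture from the paper: the world model encodes observations and rolls a recurrent latent state forward, while the actor-critic maps that latent state to actions and value estimates.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Encoder plus a recurrent latent dynamics model (stand-in for the RSSM)."""
    def __init__(self, obs_dim=64, action_dim=12, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)                     # fuse sensory inputs
        self.dynamics = nn.GRUCell(latent_dim + action_dim, latent_dim)   # latent forward model
        self.reward_head = nn.Linear(latent_dim, 1)                       # predict reward from latent

    def step(self, latent, action, obs):
        embed = torch.tanh(self.encoder(obs))
        latent = self.dynamics(torch.cat([embed, action], dim=-1), latent)
        return latent, self.reward_head(latent)

class ActorCritic(nn.Module):
    """Policy and value networks trained on rollouts imagined in latent space."""
    def __init__(self, latent_dim=32, action_dim=12):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, action_dim))
        self.value = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, 1))

    def forward(self, latent):
        return torch.tanh(self.policy(latent)), self.value(latent)
```

The design intent this sketch tries to convey is the separation of concerns: the world model is responsible for prediction from sensory data, and the behavior learner only ever sees the compact latent state, which is what makes training on imagined rollouts cheap.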
Research article: Wu, P., Escontrela, A., Hafner, D., Goldberg, K., and Abbeel, P., “DayDreamer: World Models for Physical Robot Learning”, 2022. Link: https://arxiv.org/abs/2206.14176