Online RL-Based DayDreamer Can Train a Robot Without New Machine Learning Algorithms or Simulation
Robot training is one of the most challenging tasks for AI engineers. Most robot training pipelines require extensive simulation along with new machine learning algorithms tailored to each task. Not only is this time-consuming, it is often ineffective. To address these complexities, AI and robotics researchers at the University of California, Berkeley have published a paper on DayDreamer, an effective online Reinforcement Learning (RL) approach that trains robots directly in the physical world.
What is Robot Training?
Robot training is a branch of engineering that combines mechanical engineering, robotics, sensing, and machine learning software. Traditionally, robotics teams have relied on various learning algorithms to teach their robots about the physical environment. These include adaptive control, cognitive architectures, and neural networks trained with reinforcement learning (RL).
In the current AI and robotics space, different categories of robots are developed, trained, and adapted to meet broader business goals. Robots commonly fall into these categories:
- Developmental Robots or DevRobs
- Cognitive Robots (based on RPA and semi-trained Machine Learning)
- Evolutionary Robots
DayDreamer is an advanced RL-based robot learning algorithm that has shown remarkable learning capability even with limited interaction during training. In many cases, it outperforms pure model-free RL algorithms of the kind originally developed for video games, which typically need far more interaction data.
What is DayDreamer capable of?
Dreamer learns very quickly from the real world, without the simulators or task-specific machine learning algorithms often required for DevRobs and cognitive robots. In effect, the Dreamer algorithm shows that world models can overcome the complex challenges of learning directly on physical hardware.
For example, the researchers at Berkeley trained four robots with Dreamer without adding any new machine learning algorithms. The same Dreamer algorithm was used to train a quadruped robot to roll off its back, stand up, and walk, and that training was completed within about an hour of real-world experience.
Dreamer's fast learning and strong performance can be attributed to its simple online RL pipeline: it learns on real robot hardware without simulators, and it combines a world model trained by supervised prediction with a neural-network policy. Data collection and neural-network training run in parallel, which lets the Dreamer algorithm compute actions with low latency.
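To make this parallel pipeline concrete, here is a minimal Python sketch of the idea (not DayDreamer's actual code): one thread keeps collecting real-world experience while another trains the networks from a shared buffer, so action selection never has to wait on a gradient step. All class and function names below are illustrative placeholders.

```python
# Minimal sketch only: the environment, policy, and world model are stub
# placeholders. The point is the structure -- acting and learning run in
# parallel threads that share a replay buffer.
import queue
import random
import threading
import time

replay_buffer = queue.Queue()   # shared storage for collected transitions
stop = threading.Event()

class StubEnv:
    def reset(self): return 0.0
    def step(self, action): return random.random(), 0.0, random.random() < 0.05

class StubPolicy:
    def __call__(self, obs): return random.random()      # cheap forward pass, no training here
    def update(self, world_model): time.sleep(0.01)       # pretend gradient step

class StubWorldModel:
    def update(self, batch): time.sleep(0.01)              # pretend gradient step

def actor_loop(env, policy):
    """Collect experience on the real robot and push it into the buffer."""
    obs = env.reset()
    while not stop.is_set():
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        replay_buffer.put((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def learner_loop(world_model, policy):
    """Train the world model and policy from replayed data, in parallel."""
    while not stop.is_set():
        batch = replay_buffer.get()    # in practice: sample mini-batches of sequences
        world_model.update(batch)      # learn to predict future outcomes
        policy.update(world_model)     # improve behavior from imagined rollouts

if __name__ == "__main__":
    env, policy, world_model = StubEnv(), StubPolicy(), StubWorldModel()
    threading.Thread(target=actor_loop, args=(env, policy), daemon=True).start()
    threading.Thread(target=learner_loop, args=(world_model, policy), daemon=True).start()
    time.sleep(1.0)
    stop.set()
```

In the real system the learner samples mini-batches of sequences rather than single transitions, but decoupling acting from learning is the key design choice that keeps action computation low-latency.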
At Berkeley, the team leveraged the Dreamer algorithm (Dreamer, Hafner et al., 2020; DreamerV2, Hafner et al., 2021). This allowed the developers to focus on fast robot learning in the real world. The Dreamer algorithm was able to train robots for:
- quadruped walking
- multi-object visual pick and place
- XArm visual pick and place, and
- Sphero navigation
The experiments by the Berkeley researchers show that the Dreamer algorithm can overcome the challenges commonly associated with learning visual policies, such as localizing objects from camera images. With online RL training and sparse rewards, Dreamer learns a successful strategy within a few hours of autonomous operation.
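As a hypothetical illustration of what a sparse reward means for a task such as pick and place (the function name and signature below are made up for this example), the robot receives a reward only when the task is actually solved:

```python
def sparse_pick_and_place_reward(object_in_bin: bool) -> float:
    """Illustrative sparse reward: 1.0 only on success, 0.0 otherwise.

    There is no shaping and no intermediate feedback, which is what makes
    such rewards hard for conventional RL and where a learned world model
    helps by generalizing from the few rewarding episodes it has seen.
    """
    return 1.0 if object_in_bin else 0.0
```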
What Does DayDreamer’s Neural Network Consist of?
Dreamer’s neural-network learning architecture consists of two components; a minimal sketch appears after the list.
- A world model built around a Recurrent State-Space Model (RSSM), with an encoder that fuses rich sensory signals into a compact latent state and delivers real-time model predictions.
- A behavior learning (actor-critic) component that optimizes a policy network and a value network on rollouts imagined by the world model, using the rewards the world model predicts.
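The sketch below illustrates this two-part split in PyTorch-style Python. Layer sizes, module layout, and names are placeholders chosen for readability, not the architecture from the paper: the world model encodes observations and rolls a recurrent latent state forward, while the actor-critic maps that latent state to actions and value estimates.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Encoder plus a recurrent latent dynamics model (stand-in for the RSSM)."""
    def __init__(self, obs_dim=64, action_dim=12, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)                     # fuse sensory inputs
        self.dynamics = nn.GRUCell(latent_dim + action_dim, latent_dim)   # latent forward model
        self.reward_head = nn.Linear(latent_dim, 1)                       # predict reward from latent

    def step(self, latent, action, obs):
        embed = torch.tanh(self.encoder(obs))
        latent = self.dynamics(torch.cat([embed, action], dim=-1), latent)
        return latent, self.reward_head(latent)

class ActorCritic(nn.Module):
    """Policy and value networks trained on rollouts imagined in latent space."""
    def __init__(self, latent_dim=32, action_dim=12):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, action_dim))
        self.value = nn.Sequential(nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, 1))

    def forward(self, latent):
        return torch.tanh(self.policy(latent)), self.value(latent)
```

The design intent this sketch tries to convey is the separation of concerns: the world model is responsible for prediction from sensory data, and the behavior learner only ever sees the compact latent state, which is what makes training on imagined rollouts cheap.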
Research article: Wu, P., Escontrela, A., Hafner, D., Goldberg, K., and Abbeel, P., “DayDreamer: World Models for Physical Robot Learning”, 2022. Link: https://arxiv.org/abs/2206.14176