DayDreamer: World Models for Physical Robot Learning

Official implementation of the DayDreamer algorithm in TensorFlow 2.

If you find this code useful, please cite it in your paper:

@article{wu2022daydreamer,
  title={DayDreamer: World Models for Physical Robot Learning},
  author={Wu, Philipp and Escontrela, Alejandro and Hafner, Danijar and Goldberg, Ken and Abbeel, Pieter},
  journal={Conference on Robot Learning},
  year={2022}
}

Method

DayDreamer learns a world model and an actor critic behavior to train robots from small amounts of experience in the real world, without using simulators. At a high level, DayDreamer consists of two processes. The actor process interacts with the environment and stores experiences into the replay buffer. The learner samples data from the replay buffer to train the world model, and then uses imagined predictions of the world model to train the behavior.
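The split into an actor and a learner that communicate through a replay buffer can be sketched as follows. This is a minimal illustration, not the repository's code: `ReplayBuffer`, `actor_loop`, `learner_loop`, and `run` are hypothetical names, threads stand in for the two separate processes, and random numbers stand in for robot data and for the actual training step.

```python
import random
import threading
import time
from collections import deque


class ReplayBuffer:
    """Thread-safe FIFO buffer of transitions (hypothetical helper)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size, rng):
        with self._lock:
            if len(self._data) < batch_size:
                return None  # Not enough experience collected yet.
            return rng.sample(list(self._data), batch_size)

    def __len__(self):
        with self._lock:
            return len(self._data)


def actor_loop(buffer, steps, rng):
    """Placeholder actor: stores random transitions instead of robot data."""
    for step in range(steps):
        buffer.add({"step": step, "obs": rng.random(),
                    "action": rng.choice([0, 1]), "reward": rng.random()})


def learner_loop(buffer, updates, batch_size, rng, log):
    """Placeholder learner: a real one would train the world model on each
    batch and then train the behavior on imagined rollouts."""
    done = 0
    while done < updates:
        batch = buffer.sample(batch_size, rng)
        if batch is None:
            time.sleep(0.001)  # Wait for the actor to collect more data.
            continue
        log.append(sum(t["reward"] for t in batch) / batch_size)
        done += 1


def run(steps=200, updates=10, batch_size=16, seed=0):
    """Run actor and learner concurrently against one shared buffer."""
    buffer, log = ReplayBuffer(capacity=1000), []
    actor = threading.Thread(
        target=actor_loop, args=(buffer, steps, random.Random(seed)))
    learner = threading.Thread(
        target=learner_loop,
        args=(buffer, updates, batch_size, random.Random(seed + 1), log))
    actor.start()
    learner.start()
    actor.join()
    learner.join()
    return buffer, log
```

Because the learner only reads from the buffer and the actor only writes to it, neither has to wait for the other beyond the initial fill.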

To learn from proprioceptive and visual inputs alike, the world model fuses the sensory inputs of the same time step together into a compact discrete representation. A recurrent neural network predicts the sequence of these representations given actions. From the resulting recurrent states and representations, DayDreamer reconstructs its inputs and predicts rewards and episode ends.
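To make the fusion step concrete, here is a toy sketch of how features from two modalities could be merged and mapped to a discrete code. It assumes the representation is a vector of categorical variables, each sampled as a one-hot vector, as described for DreamerV2. The function names, the random linear encoder, and the toy sizes are illustrative assumptions, and the recurrent prediction and reconstruction steps are left out.

```python
import math
import random


def fuse(proprio, image_features):
    """Concatenate the per-modality feature vectors of one time step."""
    return list(proprio) + list(image_features)


def linear(x, weights):
    """Multiply input vector x by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete(logits, groups, classes, rng):
    """Split logits into `groups` categoricals and sample a one-hot
    vector of length `classes` from each of them."""
    assert len(logits) == groups * classes
    code = []
    for g in range(groups):
        probs = softmax(logits[g * classes:(g + 1) * classes])
        index = rng.choices(range(classes), weights=probs)[0]
        code.extend(1 if c == index else 0 for c in range(classes))
    return code


rng = random.Random(0)
groups, classes = 4, 3  # Toy sizes, chosen only for illustration.
proprio = [0.1, -0.4]  # Stand-in for joint readings.
image_features = [0.7, 0.2, -0.3]  # Stand-in for image encoder output.
fused = fuse(proprio, image_features)
# Random weights stand in for a learned encoder.
weights = [[rng.gauss(0, 1) for _ in fused] for _ in range(groups * classes)]
code = sample_discrete(linear(fused, weights), groups, classes, rng)
print(code)
```

The resulting code has exactly one active entry per categorical, which is what makes the representation compact and discrete.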

Given the world model, the actor critic learns farsighted behaviors using on-policy reinforcement learning purely inside the representation space of the world model.
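One common way to turn imagined rewards and critic values into learning targets is the λ-return, which mixes one-step bootstrapped targets with longer multi-step returns. The sketch below assumes this target. The README does not state the exact objective, and `lambda_returns`, `discount`, and `lam` are illustrative names and default values.

```python
def lambda_returns(rewards, next_values, discount=0.99, lam=0.95):
    """Compute lambda-returns along one imagined trajectory.

    rewards[t]     is the predicted reward for step t.
    next_values[t] is the critic estimate for the state reached after step t.
    Both lists must be non-empty and have the same length.
    """
    assert len(rewards) == len(next_values) and rewards
    returns = [0.0] * len(rewards)
    future = next_values[-1]  # Bootstrap from the critic at the horizon.
    for t in reversed(range(len(rewards))):
        # Blend the critic estimate with the return computed so far.
        future = rewards[t] + discount * (
            (1 - lam) * next_values[t] + lam * future)
        returns[t] = future
    return returns
```

With `lam=0` this reduces to one-step targets `rewards[t] + discount * next_values[t]`, and with `lam=1` it becomes the discounted sum of imagined rewards plus a bootstrap value at the horizon.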

For more information:

Setup

pip install tensorflow tensorflow_probability ruamel.yaml cloudpickle

Instructions

To run DayDreamer, open two terminals to execute the commands for the learner and the actor in parallel. To view metrics, point TensorBoard at the log directory. For more information, also see the DreamerV2 repository.
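If two terminals are inconvenient, the learner and actor commands could in principle be started from a single script. The sketch below is an untested assumption, not part of the repository: `build_command` and `launch` are hypothetical helpers, and the only flags used are ones that appear in the commands in this README. The actor commands pass `--env.kbreset True`, so check whether the actor needs terminal input before running it this way.

```python
import os
import subprocess

TRAIN_SCRIPT = "embodied/agents/dreamerv2plus/train.py"


def build_command(config, task, run, logdir, extra=()):
    """Assemble a train.py invocation from flags used in this README."""
    return [
        "python", TRAIN_SCRIPT,
        "--configs", config,
        "--task", task,
        "--run", run,
        "--logdir", os.path.expanduser(logdir),
        *extra,
    ]


def launch(commands_with_gpus):
    """Start each command with its own CUDA_VISIBLE_DEVICES value and
    wait for all of them; returns the list of exit codes."""
    procs = []
    for command, gpu in commands_with_gpus:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(command, env=env))
    return [proc.wait() for proc in procs]


# Example for the XArm setup; both processes share one log directory.
learner = build_command(
    "xarm", "xarm_dummy", "learning", "~/logdir/run1",
    extra=["--tf.platform", "gpu"])
actor = build_command(
    "xarm", "xarm_real", "acting", "~/logdir/run1",
    extra=["--env.kbreset", "True", "--tf.platform", "cpu",
           "--tf.jit", "False"])

# To start both processes, call: launch([(learner, 0), (actor, -1)])
```

Both commands must point at the same `--logdir`, since that directory is what the two processes share.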

A1 Robot:

rm -rf ~/logdir/run1

CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs a1 --task a1_sim --run learning --tf.platform gpu --logdir ~/logdir/run1

CUDA_VISIBLE_DEVICES=1 python embodied/agents/dreamerv2plus/train.py --configs a1 --task a1_real --run acting --tf.platform gpu --env.kbreset True --imag_horizon 1 --replay_chunk 8 --replay_fixed.minlen 32 --logdir ~/logdir/run1

XArm Robot:

rm -rf ~/logdir/run1

CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs xarm --run learning --task xarm_dummy --tf.platform gpu --logdir ~/logdir/run1

CUDA_VISIBLE_DEVICES=-1 python embodied/agents/dreamerv2plus/train.py --configs xarm --run acting --task xarm_real --env.kbreset True --tf.platform cpu --tf.jit False --logdir ~/logdir/run1

UR5 Robot:

rm -rf ~/logdir/run1

CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs ur5 --run learning --task ur5_dummy --tf.platform gpu --logdir ~/logdir/run1

CUDA_VISIBLE_DEVICES=1 python embodied/agents/dreamerv2plus/train.py --configs ur5 --run acting --task ur5_real --env.kbreset True --tf.platform cpu --tf.jit False --logdir ~/logdir/run1

Questions

Please open an issue on GitHub.
