Model Card for SafeDreamer
Official release of SafeDreamer checkpoints for the paper
SafeDreamer: Safe Reinforcement Learning with World Models by
Weidong Huang, Jiaming Ji, Chunhe Xia, Borong Zhang, Yaodong Yang
Quick links: [Website] [Paper]
Model Details
We open-source a total of 80+ SafeDreamer model checkpoints. We are excited to see what the community will do with these models, and we hope this release encourages other research labs to open-source their checkpoints as well. This section provides further details about the released models. Each checkpoint file is named in the format date_algorithm_environment_task_seedId, for example 20240228-145125_bsrp_lag_safetygym_SafetyRacecarButton1-v0_0.ckpt.
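Because the algorithm field can itself contain underscores (bsrp_lag in the example above), a plain split on "_" does not map one-to-one onto the five fields. The following is a minimal sketch of one way to parse a checkpoint name. It assumes that only the algorithm field contains underscores; the CheckpointName class and the parse_checkpoint_name helper are hypothetical names used for illustration and are not part of the official implementation.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CheckpointName:
    date: str         # timestamp, e.g. "20240228-145125"
    algorithm: str    # may contain underscores, e.g. "bsrp_lag"
    environment: str  # e.g. "safetygym"
    task: str         # e.g. "SafetyRacecarButton1-v0"
    seed: int         # e.g. 0


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split a checkpoint file name into its five documented fields."""
    name = Path(filename).name
    if name.endswith(".ckpt"):
        name = name[: -len(".ckpt")]
    parts = name.split("_")
    if len(parts) < 5:
        raise ValueError(f"unexpected checkpoint name: {filename!r}")
    # Assumption: only the algorithm field contains underscores, so the date
    # is the first token and environment, task, and seed are the last three.
    date = parts[0]
    environment, task, seed = parts[-3:]
    algorithm = "_".join(parts[1:-3])
    return CheckpointName(date, algorithm, environment, task, int(seed))


info = parse_checkpoint_name(
    "20240228-145125_bsrp_lag_safetygym_SafetyRacecarButton1-v0_0.ckpt"
)
print(info.algorithm, info.task, info.seed)
```

Anchoring on the first token and the last three tokens keeps the parser independent of how many underscores an algorithm name contains, at the cost of breaking if a task or environment name ever includes one.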
Model Description
- Developed by: Weidong Huang
- Model type: SafeDreamer models trained on tasks from Safety-Gymnasium.
- License: apache-2.0.
Model Sources
- Repository: https://github.com/PKU-Alignment/SafeDreamer
- Paper: https://arxiv.org/abs/2307.07176
Uses
Our SafeDreamer checkpoints are one of the first significant checkpoint releases for safe reinforcement learning. We believe they will help researchers train, fine-tune, evaluate, and study models across the 20 safety control tasks for which we provide models. We also expect the community to find new ways to use them.
Direct Use
You can load the model checkpoints with the official implementation to reproduce our results or to generate trajectories for any task it supports.
Out-of-Scope Use
We expect that our model checkpoints, in their current form, will not generalize well to novel (unseen) tasks. Using these models for a specific target task will most likely require some fine-tuning on data from that task.
How to Get Started with the Models
See the official implementation for installation instructions and usage examples.
Training Procedure
We trained our checkpoints with the official implementation using its standard settings. Most models were trained until their performance stopped improving, but a few were not. For per-task performance, see the task-specific graphs in our paper.
Environmental Impact
Carbon emissions are estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GeForce RTX 3090
- Hours used: Approx. 50,000
- Provider: Private infrastructure
- Carbon Emitted: Approx. 7560 kg CO2eq
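As a rough sanity check, the figures above are consistent with the usual estimate of power draw times hours times grid carbon intensity. The sketch below assumes a 350 W power draw for the RTX 3090 (its rated board power) and a carbon intensity of 0.432 kg CO2eq/kWh; neither value is stated in this card, and both are chosen here only because they reproduce the reported total.

```python
def estimate_emissions_kg(power_watts: float, hours: float,
                          kg_co2eq_per_kwh: float) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh * kg_co2eq_per_kwh


GPU_POWER_WATTS = 350.0   # assumption: RTX 3090 rated board power
GPU_HOURS = 50_000        # from the card: approx. 50,000 hours
CARBON_INTENSITY = 0.432  # assumption: kg CO2eq per kWh

estimate = estimate_emissions_kg(GPU_POWER_WATTS, GPU_HOURS, CARBON_INTENSITY)
print(f"{estimate:.0f} kg CO2eq")  # prints "7560 kg CO2eq"
```

The intermediate energy figure is 17,500 kWh. Actual draw during training is typically below rated power, so this is an upper-bound style estimate under the stated assumptions.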
Citation
If you find our work useful, please consider citing the paper as follows:
BibTeX:
@inproceedings{safedreamer,
  title={SafeDreamer: Safe Reinforcement Learning with World Models},
  author={Weidong Huang and Jiaming Ji and Borong Zhang and Chunhe Xia and Yaodong Yang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=tsE5HLYtYg}
}
Contact
Correspondence to: Weidong Huang