VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

Website | Paper | Video



Zilin Huang1,*, Zihao Sheng1,*, Yansong Qu2,†, Junwei You1, Sikai Chen1,✉

1University of Wisconsin-Madison, 2Purdue University

*Equally Contributing First Authors, ✉Corresponding Author

📢 News

  • 2025.08: 🔥🔥 VLM-RL has been accepted to Transportation Research Part C: Emerging Technologies!
    We will release the model weights on Hugging Face soon. Stay tuned!

💡 Highlights

🔥 To the best of our knowledge, VLM-RL is the first work in the autonomous driving field to unify VLMs with RL for end-to-end driving policy learning in the CARLA simulator.

🏁 VLM-RL outperforms state-of-the-art baselines, achieving a 10.5% reduction in collision rate, a 104.6% increase in route completion rate, and robust generalization to unseen driving scenarios.

Demo videos: Routes 1–10 and an overtaking scenario.

📋 Table of Contents

  1. Highlights
  2. Getting Started
  3. Training
  4. Evaluation
  5. Contributors
  6. Citation
  7. Other Resources

πŸ› οΈ Getting Started

  1. Download and install CARLA 0.9.13 from the official release page.
  2. Create a conda env and install the requirements:
# Clone the repo
git clone https://github.com/zihaosheng/VLM-RL.git
cd VLM-RL

# Create a conda env
conda create -y -n vlm-rl python=3.8
conda activate vlm-rl

# Install PyTorch
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

# Install the requirements
pip install -r requirements.txt
  3. Start a CARLA server with the following command. You can skip this step if start_carla=True (i.e., when launching with the --start_carla flag).
./CARLA_0.9.13/CarlaUE4.sh -quality_level=Low -benchmark -fps=15 -RenderOffScreen -prefernvidia -carla-world-port=2000

If start_carla=True, set CARLA_ROOT in carla_env/envs/carla_route_env.py to the path of your CARLA installation, as in the sketch below.
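
A minimal sketch of what that edit might look like, assuming CARLA_ROOT is a module-level path string in carla_route_env.py (the exact layout in the repo may differ; the path below is only an example):

# carla_env/envs/carla_route_env.py (illustrative excerpt)
# Point CARLA_ROOT at your local CARLA 0.9.13 installation.
CARLA_ROOT = "/home/<user>/CARLA_0.9.13"  # replace with your own path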

(back to top)

🚋 Training

Training VLM-RL

To reproduce the results in the paper, we provide the following training scripts:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

Note: On the first run, the script will automatically download the required OpenCLIP pre-trained model, which may take a few minutes; training starts once the download completes.
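
For reference, OpenCLIP models are usually fetched through open_clip's standard loading API, which caches the checkpoint locally on first use. A minimal sketch; the specific architecture and pre-training tag used by VLM-RL are not stated here, so the names below are examples only:

import open_clip

# Downloads and caches the checkpoint on first call (example architecture/tag only).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")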

To accelerate the training process, you can set up multiple CARLA servers running in parallel.

For example, to train the VLM-RL model with 3 CARLA servers on different GPUs, run the following commands in three separate terminals:

Terminal 1:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

Terminal 2:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2005 --device=cuda:1

Terminal 3:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2010 --device=cuda:2
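
If you prefer to launch all three runs from a single script instead of separate terminals, here is a minimal Python sketch using subprocess with the same flags, ports, and GPU ids as above:

import subprocess

# Start three parallel training runs, each with its own CARLA server port and GPU.
procs = [
    subprocess.Popen([
        "python", "train.py", "--config=vlm_rl", "--start_carla", "--no_render",
        "--total_timesteps=1_000_000", f"--port={port}", f"--device=cuda:{gpu}",
    ])
    for port, gpu in [(2000, 0), (2005, 1), (2010, 2)]
]

# Block until all runs finish.
for p in procs:
    p.wait()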

To train the VLM-RL model with PPO, run:

python train.py --config=vlm_rl_ppo --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

Training Baselines

To train baseline models, simply change the --config argument to the desired model. For example, to train the TIRL-SAC model, run:

python train.py --config=tirl_sac --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

More baseline models can be found in the CONFIGS dictionary of config.py.
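
To see every configuration name that can be passed via --config, you can list the keys of that dictionary. A minimal sketch, assuming CONFIGS is importable as a plain dict keyed by config name:

from config import CONFIGS

# Print all available --config values, e.g. vlm_rl, vlm_rl_ppo, tirl_sac, ...
for name in sorted(CONFIGS):
    print(name)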

(back to top)

📊 Evaluation

To evaluate trained model checkpoints, run:

python run_eval.py

Note: This command will first kill all existing CARLA servers and then start a new one. Avoid running it while training is in progress.
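
Because of that, it can help to check for running CARLA servers before evaluating. A small standalone sketch using pgrep (not part of run_eval.py):

import subprocess

# pgrep returns exit code 0 only if at least one CarlaUE4 process is found.
result = subprocess.run(["pgrep", "-f", "CarlaUE4"], capture_output=True, text=True)
if result.returncode == 0:
    print("CARLA server(s) still running, PIDs:", result.stdout.split())
else:
    print("No CARLA server running; safe to evaluate.")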

(back to top)

👥 Contributors

Special thanks to the following contributors who have helped with this project:

  • Zihao Sheng (zihaosheng)
  • Zilin Huang (zilinhuang)
  • Yansong Qu (yansongqu)
  • Junwei You (junweiyou)

(back to top)

🎯 Citation

If you find VLM-RL useful for your research, please consider giving us a star 🌟 and citing our paper:

@article{huang2024vlmrl,
  title={VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving},
  author={Huang, Zilin and Sheng, Zihao and Qu, Yansong and You, Junwei and Chen, Sikai},
  journal={arXiv preprint arXiv:2412.15544},
  year={2024}
}

(back to top)

📚 Other Resources

Our team is actively working on research projects in the field of AI and autonomous driving. Here are a few of them you might find interesting:

(back to top)
