sb3/ppo-Walker2d-v3 · Trouble reproducing results

May 1, 2023

Hello,

I am attempting to use the ppo-Walker2d-v3.zip model but cannot get near the reported mean return of 3571. Instead, I consistently get a return of ~1 for episodes of 1000 steps. The snippet below shows how I am setting up the environment. Perhaps the issue is there?

from sb3_contrib.common.wrappers import TimeFeatureWrapper
from gym.wrappers import NormalizeObservation
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub
import gym

checkpoint = load_from_hub(
    repo_id="sb3/ppo-Walker2d-v3",
    filename="ppo-Walker2d-v3.zip",
)
# Override schedule objects that may fail to unpickle across Python/library versions
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}
expert_model = PPO.load(checkpoint, custom_objects=custom_objects)

env_name = "Walker2d-v3"
venv = gym.make(env_name)
venv = NormalizeObservation(venv)
venv = TimeFeatureWrapper(venv)

araffin

Stable-Baselines3 org May 2, 2023

Hello,
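Please follow the instructions and use the RL Zoo. You are not loading the normalization statistics saved with the model. gym's NormalizeObservation starts from fresh running statistics, so the policy receives observations scaled differently from the ones it was trained on.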
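
For readers who want to see what loading the normalization statistics might look like outside the RL Zoo, here is a minimal sketch rather than the Zoo's own code path. It assumes the statistics are stored in the same repo under the file name vec_normalize.pkl (an assumption, check the repo's file list), that the gym-based Stable-Baselines3 release from the original post is installed, and that the environment is built with the same wrappers as in training (the training config is not shown in this thread, so no extra wrappers are applied here). The function names load_expert_and_env and evaluate_expert are hypothetical helpers, not part of any library.

```python
def load_expert_and_env(
    repo_id="sb3/ppo-Walker2d-v3",
    model_file="ppo-Walker2d-v3.zip",
    stats_file="vec_normalize.pkl",  # ASSUMPTION: file name of the saved statistics
    env_id="Walker2d-v3",
):
    """Return (model, venv) where venv applies the saved normalization statistics."""
    # Imports are kept inside the function so that defining it
    # does not require the RL packages or trigger any download.
    import gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

    model_path = load_from_hub(repo_id=repo_id, filename=model_file)
    stats_path = load_from_hub(repo_id=repo_id, filename=stats_file)

    # Same overrides as in the original post.
    custom_objects = {
        "learning_rate": 0.0,
        "lr_schedule": lambda _: 0.0,
        "clip_range": lambda _: 0.0,
    }
    model = PPO.load(model_path, custom_objects=custom_objects)

    # Build a vectorized env, then wrap it with the saved statistics
    # instead of a freshly initialized normalizer.
    venv = DummyVecEnv([lambda: gym.make(env_id)])
    venv = VecNormalize.load(stats_path, venv)
    venv.training = False     # do not update the running statistics at test time
    venv.norm_reward = False  # report raw, unnormalized episode returns
    return model, venv


def evaluate_expert(n_episodes=10):
    """Evaluate the loaded policy and return (mean_return, std_return)."""
    from stable_baselines3.common.evaluation import evaluate_policy

    model, venv = load_expert_and_env()
    return evaluate_policy(
        model, venv, n_eval_episodes=n_episodes, deterministic=True
    )
```

Calling evaluate_expert() downloads both files and runs the evaluation. If VecNormalize.load complains about mismatched observation shapes, the environment factory is probably missing a wrapper that was used during training.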

Stephanehk changed discussion status to closed May 2, 2023