Training progress:
Numbers on the X axis are averages over 40 episodes, each lasting about 500 timesteps on average, so in total the agent was trained for roughly 5e6 timesteps.
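Each plotted point would then correspond to the mean return over a consecutive batch of 40 episodes; a minimal sketch of that computation, where <code>episode_returns</code> is a hypothetical stand-in for the actual training log:

<pre><code>
import numpy as np

# Hypothetical per-episode returns standing in for the real training log
episode_returns = np.random.uniform(-200, 300, size=10000)
n_full = len(episode_returns) // 40 * 40  # drop any ragged tail
batch_means = episode_returns[:n_full].reshape(-1, 40).mean(axis=1)
print(batch_means.shape)  # one point per 40-episode batch: (250,)
</code></pre>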
Learning rate decay schedule: <code>torch.optim.lr_scheduler.StepLR(opt, step_size=4000, gamma=0.7)</code>
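A minimal sketch of how this schedule could be wired into the training loop, assuming the scheduler is stepped once per episode (the policy and optimizer below are illustrative placeholders, not the repo's actual definitions):

<pre><code>
import torch

# Placeholder policy/optimizer; the repo's actual definitions may differ
policy = torch.nn.Linear(8, 2)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=4000, gamma=0.7)

for episode in range(10000):
    opt.zero_grad()
    # ... run an episode and compute the REINFORCE loss here ...
    loss = policy(torch.zeros(8)).sum()  # placeholder loss
    loss.backward()
    opt.step()
    scheduler.step()  # learning rate is multiplied by 0.7 every 4000 steps
</code></pre>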
Minimal code to use the agent:

<pre><code>
import gym
import torch

env_name = 'LunarLanderContinuous-v2'
env = gym.make(env_name)

# Load the trained agent saved with torch.save
agent = torch.load('best_models/best_reinforce_lunar_lander_cont_model_269.402.pt')

render = True
observation = env.reset()
while True:
    if render:
        env.render()
    action = agent.act(observation)
    observation, reward, done, info = env.step(action)
    if done:
        break
env.close()
</code></pre>
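Note that <code>torch.load</code> unpickles the entire agent object, so the class defining <code>agent.act</code> has to be importable (e.g. by running the snippet from the repository root) for loading to succeed.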