---
license: mit
---


# Tau LLM Unity ML Agents Project

Welcome to the Tau LLM Unity ML Agents Project repository! This project focuses on training reinforcement learning agents using Unity ML-Agents and the PPO algorithm. Our goal is to optimize the performance of the agents through various configurations and training runs.

## Project Overview

This repository contains the code and configurations for training agents in a Unity environment using the Proximal Policy Optimization (PPO) algorithm. The agents are designed to learn and adapt to their environment, improving their performance over time.

### Key Features

- **Reinforcement Learning**: Utilizes the PPO algorithm for training agents.
- **Unity ML-Agents**: Integrates with Unity ML-Agents for a seamless training experience.
- **Custom Reward Functions**: Implements gradient-based reward functions for nuanced feedback (a minimal sketch follows this list).
- **Memory Networks**: Incorporates memory networks to handle temporal dependencies.
- **TensorBoard Integration**: Monitors training progress and performance using TensorBoard.
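
The gradient-based reward logic itself lives in the Unity (C#) project, so the snippet below is only a conceptual Python sketch of the idea: the agent receives a smooth, distance-shaped signal rather than a sparse success/failure reward. The function name `gradient_reward` and the linear falloff are illustrative assumptions, not code from this repository.

```python
def gradient_reward(distance_to_target: float, max_distance: float) -> float:
    """Toy gradient-based reward: a smooth value that grows as the agent
    approaches the target, instead of a sparse 0/1 signal at the goal.

    Both arguments use the same (arbitrary) distance units; the result is
    clipped to the range [0, 1].
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # Linear falloff from 1.0 (at the target) to 0.0 (at max_distance or beyond).
    return max(0.0, 1.0 - distance_to_target / max_distance)


if __name__ == "__main__":
    for d in (0.0, 2.5, 5.0, 10.0):
        print(f"distance={d:>4}: reward={gradient_reward(d, max_distance=10.0):.2f}")
```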

## Configuration

Below is the configuration used for training the agents:

```yaml
behaviors:
  TauAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 4096
      learning_rate: 0.00003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 10
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 4
      vis_encode_type: simple
      memory:
        memory_size: 256
        sequence_length: 256
        num_layers: 4
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.995
        strength: 0.1
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 4
          learning_rate: 0.00003
    keep_checkpoints: 10
    checkpoint_interval: 100000
    threaded: true
    max_steps: 3000000
    time_horizon: 256
    summary_freq: 10000
```
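
If you edit this file, a quick way to sanity-check it before launching a run is to load it with PyYAML (installed alongside the ML-Agents Python tooling) and print a few hyperparameters. This is a minimal sketch; the path matches the config file used in the training command further below, and the keys mirror the configuration shown above.

```python
import yaml

# Trainer configuration used in this project (adjust the path as needed).
CONFIG_PATH = "config/tau_agent_ppo_c.yaml"

with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

hyper = config["behaviors"]["TauAgent"]["hyperparameters"]
print("batch_size:   ", hyper["batch_size"])
print("buffer_size:  ", hyper["buffer_size"])
print("learning_rate:", hyper["learning_rate"])
```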

## Model Naming Convention

The models in this repository follow the naming convention `Tau_<series>_<max_steps>`. This helps in easily identifying the series and the number of training steps for each model.
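
For instance, an illustrative name such as `Tau_A_3000000` would denote series `A` trained for 3,000,000 steps. The small helper below is a hypothetical sketch (not part of the repository) for splitting such names:

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    series: str
    max_steps: int


def parse_model_name(name: str) -> ModelName:
    """Split a model name following the Tau_<series>_<max_steps> convention."""
    prefix, series, steps = name.split("_", 2)
    if prefix != "Tau":
        raise ValueError(f"unexpected prefix in model name: {name!r}")
    return ModelName(series=series, max_steps=int(steps))


print(parse_model_name("Tau_A_3000000"))  # ModelName(series='A', max_steps=3000000)
```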

## Getting Started

### Prerequisites

- Unity 6
- Unity ML-Agents Toolkit
- Python 3.10.11
- PyTorch
- Transformers

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/p3nGu1nZz/Tau.git
   cd Tau/MLAgentsProject
   ```

2. Install the required Python packages:
   ```bash
   pip install -r requirements.txt
   ```

3. Open the Unity project:
   - Launch Unity Hub and open the project folder.

### Training the Agent

To start training the agent, run the following command:
```bash
mlagents-learn .\config\tau_agent_ppo_c.yaml --run-id=tau_agent_ppo_A0 --env .\Build --torch-device cuda --timeout-wait 300 --force
```
Note: the `--env .\Build` argument expects a standalone build of the Unity environment in the `Build` directory, so create a new build there before starting training.

### Monitoring Training

You can monitor the training progress using TensorBoard:
```bash
tensorboard --logdir results
```
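
If you prefer to read the same scalars programmatically (for example, to export them to CSV), TensorBoard's event files can be loaded with the `EventAccumulator` class. The run directory and tag name below are assumptions based on the run ID above and the usual ML-Agents tag names; adjust them to match your `results` folder.

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Assumed layout: results/<run-id>/<behavior-name>; adjust to your actual run.
RUN_DIR = "results/tau_agent_ppo_A0/TauAgent"

acc = EventAccumulator(RUN_DIR)
acc.Reload()

# List every scalar tag that was logged for this run.
print(acc.Tags()["scalars"])

# "Environment/Cumulative Reward" is the usual ML-Agents tag; change it if yours differs.
for event in acc.Scalars("Environment/Cumulative Reward"):
    print(event.step, event.value)
```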

## Results

The training results, including the average reward and cumulative reward, can be visualized using TensorBoard. The graphs below show the performance of the agent over time:

![Average Reward](path/to/average_reward.png)
![Cumulative Reward](path/to/cumulative_reward.png)

## Citation

If you use this project in your research, please cite it as follows:

```bibtex
@misc{Tau,
  author = {K. Rawson},
  title = {Tau LLM Unity ML Agents Project},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/p3nGu1nZz/Tau}},
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Unity ML-Agents Toolkit
- TensorFlow and PyTorch communities
- Hugging Face for hosting the model repository