---
license: mit
---
# Tau LLM Unity ML Agents Project
Welcome to the Tau LLM Unity ML Agents Project repository! This project focuses on training reinforcement learning agents using Unity ML-Agents and the PPO algorithm. Our goal is to optimize the performance of the agents through various configurations and training runs.
## Project Overview
This repository contains the code and configurations for training agents in a Unity environment using the Proximal Policy Optimization (PPO) algorithm. The agents are designed to learn and adapt to their environment, improving their performance over time.
### Key Features
- **Reinforcement Learning**: Utilizes the PPO algorithm for training agents.
- **Unity ML-Agents**: Integrates with Unity ML-Agents for a seamless training experience.
- **Custom Reward Functions**: Implements gradient-based reward functions for nuanced feedback.
- **Memory Networks**: Incorporates memory networks to handle temporal dependencies.
- **TensorBoard Integration**: Monitors training progress and performance using TensorBoard.
## Configuration
Below is the configuration used for training the agents:
```yaml
behaviors:
  TauAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 4096
      learning_rate: 0.00003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 10
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 4
      vis_encode_type: simple
      memory:
        memory_size: 256
        sequence_length: 256
        num_layers: 4
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.995
        strength: 0.1
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 4
        learning_rate: 0.00003
    keep_checkpoints: 10
    checkpoint_interval: 100000
    threaded: true
    max_steps: 3000000
    time_horizon: 256
    summary_freq: 10000
```
## Model Naming Convention
The models in this repository follow the naming convention `Tau_<series>_<max_steps>`, which makes it easy to identify the series and the number of training steps of each model. For example, a series `A0` model trained for 3,000,000 steps would be named `Tau_A0_3000000` (the series label here is illustrative).
## Getting Started
### Prerequisites
- Unity 6
- Unity ML-Agents Toolkit
- Python 3.10.11
- PyTorch
- Transformers
### Installation
1. Clone the repository:
```bash
git clone https://github.com/p3nGu1nZz/Tau.git
cd Tau\MLAgentsProject
```
2. Install the required Python packages:
```bash
pip install -r requirements.txt
```
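
As a quick sanity check, you can confirm that the ML-Agents trainer CLI is available before moving on:

```bash
# Print the trainer's usage information to verify the install
mlagents-learn --help
```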
3. Open the Unity project:
- Launch Unity Hub and open the project folder.
### Training the Agent
To start training the agent, run the following command:
```bash
mlagents-learn .\config\tau_agent_ppo_c.yaml --run-id=tau_agent_ppo_A0 --env .\Build --torch-device cuda --timeout-wait 300 --force
```
Note: The preferred workflow is to create a fresh build of the Unity environment in the `Build` directory; the `--env .\Build` argument above points the trainer at that build.
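
If a run with this `--run-id` gets interrupted, it can usually be picked up where it left off by swapping `--force` (which overwrites previous results) for `--resume`; the other arguments stay the same:

```bash
# Resume the existing tau_agent_ppo_A0 run instead of overwriting it
mlagents-learn .\config\tau_agent_ppo_c.yaml --run-id=tau_agent_ppo_A0 --env .\Build --torch-device cuda --timeout-wait 300 --resume
```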
### Monitoring Training
You can monitor the training progress using TensorBoard:
```bash
tensorboard --logdir results
```
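
TensorBoard serves its dashboard at `http://localhost:6006` by default; if that port is already in use, you can pick another one explicitly (the port below is just an example):

```bash
# Serve the dashboard on an alternate port, then browse to http://localhost:6007
tensorboard --logdir results --port 6007
```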
## Results
The training results, including the average reward and cumulative reward, can be visualized using TensorBoard. The graphs below show the performance of the agent over time:
![Average Reward](path/to/average_reward.png)
![Cumulative Reward](path/to/cumulative_reward.png)
## Citation
If you use this project in your research, please cite it as follows:
```bibtex
@misc{Tau,
  author = {K. Rawson},
  title = {Tau LLM Unity ML Agents Project},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/p3nGu1nZz/Tau}},
}
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Unity ML-Agents Toolkit
- TensorFlow and PyTorch communities
- Hugging Face for hosting the model repository