|
# Profiling in Python |
|
|
|
As part of the ML-Agents Toolkit, we provide a lightweight profiling system to
help identify hotspots in the training process and spot regressions from
changes.
|
|
|
Timers are hierarchical, meaning that the time tracked in a block of code can be
further split into other blocks if desired. This also means that a function that
is called from multiple places in the code will appear in multiple places in the
timing output.
|
|
|
All timers operate using a "global" instance by default, but this can be
overridden if necessary (mainly for testing).
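
For example, a test can pass its own timer stack so that its measurements stay
out of the global instance. This is a minimal sketch, assuming the timers module
(`mlagents_envs.timers` in recent releases) exposes a `TimerStack` class and
that `hierarchical_timer` accepts a `timer_stack` argument; `do_work` is a
hypothetical placeholder.

```python
from mlagents_envs.timers import TimerStack, hierarchical_timer


def do_work():
    pass  # hypothetical stand-in for the code being profiled


# A local TimerStack keeps these measurements separate from the global instance.
local_timers = TimerStack()
with hierarchical_timer("my_block", timer_stack=local_timers):
    do_work()
```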
|
|
|
## Adding Profiling |
|
|
|
There are two ways to indicate code should be included in profiling. The
simplest way is to add the `@timed` decorator to a function or method of
interest.
|
|
|
```python
from mlagents_envs.timers import timed  # module path may differ in older releases


class TrainerController:
    # ....
    @timed
    def advance(self, env: EnvManager) -> int:
        # do stuff
        ...
```
|
|
|
You can also use the `hierarchical_timer` context manager.
|
|
|
```python
with hierarchical_timer("communicator.exchange"):
    outputs = self.communicator.exchange(step_input)
```
|
|
|
The context manager may be easier than the `@timed` decorator for profiling
different parts of a large function, or for profiling calls to abstract methods
that might not use the decorator.
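
Because timers are hierarchical, nesting context managers (or calling a `@timed`
function from inside another timed block) produces nested nodes in the output.
A small illustrative sketch (the block names here are made up):

```python
from mlagents_envs.timers import hierarchical_timer

with hierarchical_timer("step"):
    with hierarchical_timer("env_update"):
        ...  # time here is reported under step -> env_update
    with hierarchical_timer("policy_update"):
        ...  # time here is reported under step -> policy_update
```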
|
|
|
## Output |
|
|
|
By default, at the end of training, timers are collected and written in JSON
format to `{summaries_dir}/{run_id}_timers.json`. The output consists of node
objects with the following keys (an illustrative excerpt follows the list):
|
|
|
- `total` (float): The total time in seconds spent in the block, including child
  calls.
- `count` (int): The number of times the block was called.
- `self` (float): The total time in seconds spent in the block, excluding child
  calls.
- `children` (dictionary): A dictionary of child nodes, keyed by the node name.
- `is_parallel` (bool): Indicates that the block of code was executed in
  multiple threads or processes (see below). This is optional and defaults to
  false.
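
For illustration, a hypothetical (trimmed) excerpt of the output, using made-up
block names and timings:

```json
{
  "total": 120.5,
  "count": 1,
  "self": 2.1,
  "children": {
    "TrainerController.advance": {
      "total": 118.4,
      "count": 5000,
      "self": 10.3,
      "children": {
        "communicator.exchange": {
          "total": 108.1,
          "count": 5000,
          "self": 108.1
        }
      }
    }
  }
}
```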
|
|
|
### Parallel execution |
|
|
|
#### Subprocesses |
|
|
|
For code that executes in multiple processes (for example,
SubprocessEnvManager), we periodically send the timer information back to the
"main" process, aggregate the timers there, and flush them in the subprocess.
Note that (depending on the number of processes) this can result in timers where
the total time may exceed the parent's total time. This is analogous to the
difference between the "real" and "user" values reported by the Unix `time`
command. In the timer output, blocks that were run in parallel are indicated by
the `is_parallel` flag.
|
|
|
#### Threads |
|
|
|
Timers currently use `time.perf_counter()` to track time spent, which may not
give accurate results for multiple threads. If this is problematic, set
`threaded: false` in your trainer configuration.
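
The exact configuration layout depends on your ML-Agents version; in the YAML
trainer config format that groups settings under `behaviors`, it might look like
this sketch (`MyBehavior` is a hypothetical behavior name):

```yaml
behaviors:
  MyBehavior:
    trainer_type: ppo
    threaded: false
```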
|
|