To submit a new benchmark to the library:

  1. Implement the new benchmark in a standard format (such as the METR Task Standard). This includes specifying the exact instructions for each task, as well as the task environment provided inside the container in which the agent runs.

  2. We encourage developers to support running their tasks on separate VMs and to specify the exact hardware requirements for each task in the task environment.
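
The two steps above can be sketched as a small task definition. This is a minimal illustration only, loosely modeled on the `TaskFamily` shape used by the METR Task Standard; the `HardwareSpec` and `Task` classes, the `reverse_string` task, the version string, and the resource values are all hypothetical and are not part of this library or of the standard. Consult the standard itself for the exact interface it requires.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HardwareSpec:
    """Hypothetical per-task hardware requirements (illustrative only)."""
    cpus: int
    memory_gb: int
    gpu: Optional[str] = None  # e.g. a GPU model name, or None for CPU-only


@dataclass(frozen=True)
class Task:
    """Hypothetical container for one task's instructions and environment."""
    instructions: str       # exact text shown to the agent
    expected: str           # reference answer used for scoring
    hardware: HardwareSpec  # resources the task's VM/container should provide


class TaskFamily:
    # Assumed version string; use whichever version of the standard you target.
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> Dict[str, Task]:
        # One entry per task, keyed by task name.
        return {
            "reverse_string": Task(
                instructions="Submit the reverse of the string 'benchmark'.",
                expected="benchmark"[::-1],
                hardware=HardwareSpec(cpus=2, memory_gb=4),
            ),
        }

    @staticmethod
    def get_instructions(t: Task) -> str:
        # Return the exact instructions for the given task.
        return t.instructions

    @staticmethod
    def score(t: Task, submission: str) -> float:
        # Exact-match scoring: 1.0 for a correct answer, 0.0 otherwise.
        return 1.0 if submission.strip() == t.expected else 0.0
```

Keeping the instructions and hardware requirements next to each task makes it possible to provision a suitably sized VM per task before the agent starts.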