Spaces:
Runtime error
Runtime error
pminervini
commited on
Commit
•
1a9b14a
1
Parent(s):
51b654e
update
Browse files- src/display/about.py +4 -2
src/display/about.py
CHANGED
@@ -57,8 +57,10 @@ LLM_BENCHMARKS_DETAILS = f"""
|
|
57 |
To reproduce our results, here is the commands you can run, using [this script](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/blob/main/backend-cli.py): python backend-cli.py.
|
58 |
|
59 |
Alternatively, if you're interested in evaluating a specific task with a particular model, you can use the [EleutherAI LLM Evaluation Harness library](https://github.com/EleutherAI/lm-evaluation-harness/) as follows:
|
60 |
-
`python main.py --model=hf-
|
61 |
-
` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=1 --output_path=<output_path>`
|
|
|
|
|
62 |
|
63 |
The total batch size we get for models which fit on one A100 node is 8 (8 GPUs * 1). If you don't use parallelism, adapt your batch size to fit. You can expect results to vary slightly for different batch sizes because of padding.
|
64 |
|
|
|
57 |
To reproduce our results, here is the commands you can run, using [this script](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/blob/main/backend-cli.py): python backend-cli.py.
|
58 |
|
59 |
Alternatively, if you're interested in evaluating a specific task with a particular model, you can use the [EleutherAI LLM Evaluation Harness library](https://github.com/EleutherAI/lm-evaluation-harness/) as follows:
|
60 |
+
`python main.py --model=hf-auto --model_args="pretrained=<your_model>,revision=<your_model_revision>,parallelize=True"`
|
61 |
+
` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=1 --output_path=<output_path>`
|
62 |
+
|
63 |
+
Note that the Hallucinations Library includes several tasks definitions that are not included in the Harness library -- you can find them at [this link](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/tree/main/src/backend/tasks)).
|
64 |
|
65 |
The total batch size we get for models which fit on one A100 node is 8 (8 GPUs * 1). If you don't use parallelism, adapt your batch size to fit. You can expect results to vary slightly for different batch sizes because of padding.
|
66 |
|