eduagarcia committed
Commit: 43c2b1a
Parent(s): aa7060a

add dynamic documentation for RAW_RESULTS_REPO

Files changed: src/display/about.py (+4 -4)
src/display/about.py CHANGED

@@ -1,6 +1,6 @@
 from src.display.utils import ModelType
 from src.display.utils import Tasks
-from src.envs import REPO_ID, QUEUE_REPO, RESULTS_REPO, PATH_TO_COLLECTION, LEADERBOARD_NAME, TRUST_REMOTE_CODE, TASK_CONFIG
+from src.envs import REPO_ID, QUEUE_REPO, RESULTS_REPO, PATH_TO_COLLECTION, LEADERBOARD_NAME, TRUST_REMOTE_CODE, TASK_CONFIG, RAW_RESULTS_REPO
 
 LM_EVAL_URL = "https://github.com/eduagarcia/lm-evaluation-harness-pt"
 
@@ -72,7 +72,7 @@ We chose these benchmarks as they test a variety of reasoning and general knowle
 ## Details and logs
 You can find:
 - detailed numerical results in the `results` Hugging Face dataset: https://huggingface.co/datasets/{RESULTS_REPO}
-- details on the input/outputs for the models in the `details` of each model, that you can access by clicking the 📄 emoji after the model name
+{"- details on the input/outputs for the models in the `details` of each model, that you can access by clicking the 📄 emoji after the model name" if RAW_RESULTS_REPO is not None else ""}
 - community queries and running status in the `requests` Hugging Face dataset: https://huggingface.co/datasets/{QUEUE_REPO}
 
 ## Reproducibility
@@ -140,10 +140,10 @@ How can I report an evaluation failure?
 
 ## 2) Model results
 What kind of information can I find?
-- *Let's imagine you are interested in the Yi-34B results. You have access to 3 different information categories:*
+- *Let's imagine you are interested in the Yi-34B results. You have access to {"3" if RAW_RESULTS_REPO is not None else "2"} different information categories:*
 - *The [request file](https://huggingface.co/datasets/{QUEUE_REPO}/blob/main/01-ai/Yi-34B_eval_request_False_bfloat16_Original.json): it gives you information about the status of the evaluation*
 - *The [aggregated results folder](https://huggingface.co/datasets/{RESULTS_REPO}/tree/main/01-ai/Yi-34B): it gives you aggregated scores, per experimental run*
-- *The [details dataset](https://huggingface.co/datasets/{
+{"- *The [details dataset](https://huggingface.co/datasets/{RAW_RESULTS_REPO}/tree/main/01-ai/Yi-34B): it gives you the full details (scores and examples for each task and a given model)*" if RAW_RESULTS_REPO is not None else ""}
 
 
 Why do models appear several times in the leaderboard?
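
For context, a minimal sketch of how the conditional fragments added in this commit behave. This is not code from the repository; the stub value of RAW_RESULTS_REPO and the local variable names are illustrative (in about.py the setting is imported from src.envs and the fragments live inside larger documentation strings).

# Illustrative sketch only -- not code from this commit. It mimics how the
# conditional documentation fragments above evaluate. RAW_RESULTS_REPO is
# stubbed here; in about.py it is imported from src.envs.
RAW_RESULTS_REPO = None  # set to a dataset id string to enable the extra docs

details_bullet = (
    "- details on the input/outputs for the models in the `details` of each model, "
    "that you can access by clicking the 📄 emoji after the model name"
    if RAW_RESULTS_REPO is not None
    else ""
)
category_count = "3" if RAW_RESULTS_REPO is not None else "2"

print(details_bullet)  # prints an empty line when RAW_RESULTS_REPO is unset
print(f"You have access to {category_count} different information categories.")

With RAW_RESULTS_REPO left unset, the generated documentation drops the details bullet and the Yi-34B FAQ example counts only two information categories (the request file and the aggregated results folder).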