Adding Evaluation Results

#1
Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -290,3 +290,17 @@ def tokenize_single_input(tokenizer, prompt):
  To explore conditional language models, you can also set `prefix = "Assistant GPT3:"` to mimic ChatGPT behavior (this may cause performance degradation).
 
  *Hint: In BPE, `tokenize(A) + tokenize(B)` does not always equal `tokenize(A + B)`.*
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TheBloke__openchat_v2_openorca_preview-GPTQ).
+
+ | Metric              | Value |
+ |---------------------|-------|
+ | Avg.                | 31.46 |
+ | ARC (25-shot)       | 27.99 |
+ | HellaSwag (10-shot) | 26.06 |
+ | MMLU (5-shot)       | 24.24 |
+ | TruthfulQA (0-shot) | 50.08 |
+ | Winogrande (5-shot) | 70.64 |
+ | GSM8K (5-shot)      | 13.27 |
+ | DROP (3-shot)       |  7.96 |