{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Ac6wadk3rmkK"
},
"source": [
"# LM Evaluation Harness (by [EleutherAI](https://www.eleuther.ai/) & [Laiviet](https://github.com/laiviet/lm-evaluation-harness))\n",
"\n",
"This [`LM-Evaluation-Harness`](https://github.com/EleutherAI/lm-evaluation-harness) provides a unified framework for evaluating generative language models on a wide range of tasks. For the complete list of tasks in the upstream harness, see its [task table](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md); the Portuguese tasks used in this notebook are listed in the table at the end of this notebook.\n",
"\n",
"1. Clone the [laiviet fork of lm-evaluation-harness](https://github.com/laiviet/lm-evaluation-harness), which provides the Portuguese (`_pt`) tasks, and install the required libraries (`sentencepiece` is needed for the Llama tokenizer)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UA5I86u91e0A"
},
"outputs": [],
"source": [
"!git clone --branch main https://github.com/laiviet/lm-evaluation-harness.git\n",
"!cd lm-evaluation-harness && pip install -e . -q\n",
"!pip install cohere tiktoken sentencepiece -q"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pnHoAVK25QZn"
},
"outputs": [],
"source": [
"# Read the Hugging Face token from the HF_TOKEN environment variable instead of hard-coding it\n",
"!huggingface-cli login --token $HF_TOKEN\n",
"!cd lm-evaluation-harness && python main.py \\\n",
" --model hf-auto \\\n",
" --model_args pretrained=nicholasKluge/Aira-2-portuguese-124M \\\n",
" --tasks arc_pt,truthfulqa_pt \\\n",
" --device cuda:0 \\\n",
" --model_alias Aira-2-portuguese-124M \\\n",
" --task_alias open_llm"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4Bm78wiZ4Own"
},
"source": [
"## Task Table 📚\n",
"\n",
"| Task Name     | Train | Val | Test | Val/Test Docs | Metrics       |\n",
"|---------------|-------|-----|------|--------------:|---------------|\n",
"| arc_pt        | ✓     | ✓   | ✓    | 1,172         | acc, acc_norm |\n",
"| hellaswag_pt  | ✓     | ✓   |      | 10,042        | acc, acc_norm |\n",
"| mmlu_pt       |       | ✓   | ✓    | 1,662         | acc, acc_norm |\n",
"| truthfulqa_pt |       | ✓   |      | 817           | mc1, mc2      |"
]
}
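,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metrics in the table above can be illustrated with a short sketch. It assumes you already have the log-likelihood a model assigns to each answer choice; the function names below (`acc_and_acc_norm`, `truthfulqa_mc1`, `truthfulqa_mc2`) are hypothetical helpers, not part of the harness API, and the log-likelihoods are made up. The scoring mirrors the upstream EleutherAI harness: `acc` picks the choice with the highest log-likelihood, `acc_norm` first divides each log-likelihood by the choice's length in characters, `mc1` checks whether TruthfulQA's single correct answer (listed first) scores highest, and `mc2` is the normalized probability mass on the true answers. The laiviet fork may differ in details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical helpers; inputs are per-choice log-likelihoods from a model.\n",
"def acc_and_acc_norm(loglikelihoods, choices, gold):\n",
"    lls = np.asarray(loglikelihoods, dtype=float)\n",
"    # acc: the choice with the highest total log-likelihood must be the gold one\n",
"    acc = float(np.argmax(lls) == gold)\n",
"    # acc_norm: divide by each choice's length in characters before taking argmax\n",
"    lengths = np.array([float(len(c)) for c in choices])\n",
"    acc_norm = float(np.argmax(lls / lengths) == gold)\n",
"    return acc, acc_norm\n",
"\n",
"def truthfulqa_mc1(mc1_lls):\n",
"    # The single correct mc1 answer is listed first (index 0)\n",
"    return float(np.argmax(mc1_lls) == 0)\n",
"\n",
"def truthfulqa_mc2(mc2_lls, labels):\n",
"    # labels holds 1 for true answers and 0 for false ones; true answers come first\n",
"    split = list(labels).index(0)\n",
"    probs = np.exp(np.asarray(mc2_lls, dtype=float))\n",
"    # Share of the total probability mass that falls on the true answers\n",
"    return float(probs[:split].sum() / probs.sum())\n",
"\n",
"# Toy example with made-up log-likelihoods (not real model outputs)\n",
"print(acc_and_acc_norm([-4.0, -9.0], ['Rio', 'Rio de Janeiro'], gold=1))  # (0.0, 1.0)\n",
"print(truthfulqa_mc1([-1.5, -2.0, -3.0]))  # 1.0\n",
"print(round(truthfulqa_mc2([-1.0, -2.0, -3.0], [1, 1, 0]), 3))"
]
}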
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": [],
"machine_shape": "hm",
"gpuType": "T4"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}