---
language:
- en
license: apache-2.0
tags:
- pytorch
- causal-lm
- pythia
datasets:
- hellaswag
metrics:
- accuracy
---
# Model Card for EleutherAI/pythia-160m HellaSwag Evaluation
This model card reports zero-shot evaluation results for the `step100000` checkpoint of EleutherAI/pythia-160m on the HellaSwag task.
## Model Details
### Model Description
- **Developed by:** EleutherAI
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Evaluated model:** EleutherAI/pythia-160m (revision `step100000`)
### Model Sources
- **Repository:** [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m)
- **Paper:** [More Information Needed]
## Uses
### Direct Use
This evaluation demonstrates the model's performance on the HellaSwag task, which tests for commonsense reasoning in AI systems.
### Out-of-Scope Use
This evaluation is specific to the HellaSwag task and may not be indicative of the model's performance on other tasks or in real-world applications.
## Bias, Risks, and Limitations
The evaluation results should be interpreted within the context of the HellaSwag task. The model may exhibit biases present in its training data or the evaluation dataset.
### Recommendations
Users should be aware of the model's limitations and consider additional evaluation on task-specific datasets before deployment in real-world applications.
## How to Get Started with the Model
To load the checkpoint that was evaluated for this card:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin the same revision that was passed to lm_eval, so results are comparable.
revision = "step100000"
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision=revision)
model.eval()  # inference only; use the model for the HellaSwag task from here
```
## Training Details
### Training Data
No training was performed for this card; the model was only evaluated on the HellaSwag dataset. For more information, visit [the HellaSwag dataset page](https://huggingface.co/datasets/hellaswag).
### Training Procedure
#### Training Hyperparameters
- **Training regime:** float32 (the precision used for evaluation, set with `dtype="float"`)
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on the HellaSwag validation split, which consists of 10,042 samples. Each sample has four candidate endings, giving 40,168 log-likelihood requests in total.
#### Metrics
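- **Accuracy (acc):** The proportion of samples for which the candidate ending with the highest log-likelihood is the correct one.
- **Normalized Accuracy (acc_norm):** Accuracy computed after dividing each candidate ending's log-likelihood by the ending's length, so that longer endings are not penalized simply for being longer.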
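The difference between the two metrics can be illustrated with a minimal sketch. It assumes that `acc_norm` divides each candidate's log-likelihood by the byte length of the ending, which is the convention described for the lm-evaluation-harness. The function names and the log-likelihood values are made up for illustration and are not real model output.

```python
def pick_ending(loglikelihoods, endings, normalize=False):
    """Return the index of the best-scoring ending.

    With normalize=True, each log-likelihood is divided by the ending's
    UTF-8 byte length (assumed acc_norm convention).
    """
    scores = []
    for ll, ending in zip(loglikelihoods, endings):
        if normalize:
            ll = ll / len(ending.encode("utf-8"))
        scores.append(ll)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions, labels):
    """Fraction of predictions that equal the gold labels."""
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)


# Toy example: the longer ending has a lower total log-likelihood
# but a higher per-byte log-likelihood.
endings = ["sits.", "walks slowly across the frozen lake."]
loglikelihoods = [-6.0, -18.0]  # hypothetical values

raw_choice = pick_ending(loglikelihoods, endings)
norm_choice = pick_ending(loglikelihoods, endings, normalize=True)
print(raw_choice, norm_choice)
```

In this toy case the raw score favors the short ending while the normalized score favors the long one, which is why the two accuracies in a results table can differ.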
### Results
| Metric | Value | Standard Error |
|--------|-------|----------------|
| Accuracy | 0.28719 | 0.00452 |
| Normalized Accuracy | 0.30821 | 0.00461 |
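As a rough sanity check, the reported standard errors can be compared against the standard error of a binomial proportion over the 10,042 samples. The sketch below assumes that simple formula and a normal approximation for the 95% interval; the harness may use a slightly different estimator, so small rounding differences are expected. The helper names are illustrative.

```python
import math


def binomial_stderr(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)


def confidence_interval(p, se, z=1.96):
    """Normal-approximation confidence interval (z=1.96 for about 95%)."""
    return p - z * se, p + z * se


N = 10_042      # number of evaluated samples
CHANCE = 0.25   # random guessing among four candidate endings

for name, p in [("acc", 0.28719), ("acc_norm", 0.30821)]:
    se = binomial_stderr(p, N)
    low, high = confidence_interval(p, se)
    print(f"{name}: stderr={se:.5f}, 95% CI=({low:.4f}, {high:.4f}), above chance: {low > CHANCE}")
```

Under these assumptions, both intervals lie above the 0.25 chance level, though not by a wide margin.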
## Environmental Impact
- **Hardware Type:** Tesla T4 GPU
- **Hours used:** Approximately 0.095 hours (341.39 seconds)
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications
### Model Architecture and Objective
EleutherAI/pythia-160m is a causal language model with approximately 162 million parameters.
### Compute Infrastructure
- **Hardware:** Tesla T4 GPU
- **Software:** PyTorch 2.4.1+cu121, Transformers 4.44.2
- **Date of Evaluation:** October 18, 2024
### Command
```
lm_eval --model hf \
--model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
--tasks hellaswag \
--device cuda \
--batch_size auto:4 \
--output_path hellaswag_test \
--log_samples
```
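The `--model_args` value is a comma-separated list of `key=value` pairs. The sketch below shows how such a string maps onto the argument dictionary that appears in the log output. `parse_model_args` is a hypothetical helper written for illustration, not the harness's own parser. Note that the shell removes the quotes around `"float"` before the program sees the string.

```python
def parse_model_args(arg_string):
    """Split a comma-separated key=value string into a dict of strings."""
    args = {}
    for pair in arg_string.split(","):
        if not pair:
            continue  # tolerate a trailing comma
        key, value = pair.split("=", 1)
        args[key.strip()] = value.strip()
    return args


# The string as the program receives it after shell quote removal.
parsed = parse_model_args(
    "pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float"
)
print(parsed)
```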
#### Command output
```
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
hf (pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto:4 (64,64,64,64,64)
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.2872|±  |0.0045|
|         |       |none  |     0|acc_norm|↑  |0.3082|±  |0.0046|
2024-10-18 12:25:25.770584: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-18 12:25:25.847675: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-18 12:25:25.887843: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-18 12:25:25.961158: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-18 12:25:27.647707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-10-18:12:25:29,450 INFO [__main__.py:279] Verbosity set to INFO
2024-10-18:12:25:42,060 INFO [__main__.py:376] Selected Tasks: ['hellaswag']
2024-10-18:12:25:42,062 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-10-18:12:25:42,062 INFO [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'EleutherAI/pythia-160m', 'revision': 'step100000', 'dtype': 'float'}
2024-10-18:12:25:42,128 INFO [huggingface.py:129] Using device 'cuda'
2024-10-18:12:25:42,395 INFO [huggingface.py:481] Using model type 'default'
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
2024-10-18:12:25:42,769 INFO [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2024-10-18:12:25:56,709 WARNING [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior.
2024-10-18:12:25:56,710 INFO [task.py:415] Building contexts for hellaswag on rank 0...
100%|██████████| 10042/10042 [00:05<00:00, 1695.72it/s]
2024-10-18:12:26:04,007 INFO [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████| 40168/40168 [03:53<00:00, 171.85it/s]
fatal: not a git repository (or any of the parent directories): .git
2024-10-18:12:30:36,510 INFO [evaluation_tracker.py:206] Saving results aggregated
2024-10-18:12:30:36,524 INFO [evaluation_tracker.py:287] Saving per-sample results for: hellaswag
```
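The log reports 10,042 contexts but 40,168 log-likelihood requests, because each sample is scored once per candidate ending. A minimal sketch of that expansion follows, assuming the `ctx`, `endings`, and `label` field names of the HellaSwag dataset. The documents are made up, and `build_requests` is an illustrative helper rather than the harness's implementation.

```python
def build_requests(docs):
    """Expand each document into one (context, ending) request per candidate ending."""
    return [(doc["ctx"], ending) for doc in docs for ending in doc["endings"]]


# Two made-up documents in the HellaSwag layout (four endings each).
toy_docs = [
    {
        "ctx": "A man picks up a guitar.",
        "endings": ["He plays.", "He swims.", "He flies.", "He bakes."],
        "label": "0",
    },
    {
        "ctx": "A chef cracks an egg.",
        "endings": ["She sings.", "She whisks it.", "She drives.", "She reads."],
        "label": "1",
    },
]

requests = build_requests(toy_docs)
print(len(requests))  # two documents times four endings

# Same arithmetic for the full run reported in the log.
NUM_DOCS = 10_042
ENDINGS_PER_DOC = 4
total_requests = NUM_DOCS * ENDINGS_PER_DOC
print(total_requests)
```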