---
language:
- en
license: apache-2.0
tags:
- pytorch
- causal-lm
- pythia
datasets:
- hellaswag
metrics:
- accuracy
---

# Model Card for EleutherAI/pythia-160m HellaSwag Evaluation

This model card presents zero-shot evaluation results for the EleutherAI/pythia-160m model (checkpoint `step100000`) on the HellaSwag task.

## Model Details

### Model Description

- **Developed by:** EleutherAI
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Base model:** EleutherAI/pythia-160m, revision `step100000` (evaluated as-is; no fine-tuning was performed)

### Model Sources

- **Repository:** [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m)
- **Paper:** [More Information Needed]

## Uses

### Direct Use

HellaSwag tests commonsense reasoning: given a short context, the model must choose the most plausible of four candidate endings. This evaluation reports the model's zero-shot performance on that task.

### Out-of-Scope Use

This evaluation is specific to the HellaSwag task and may not be indicative of the model's performance on other tasks or in real-world applications.

## Bias, Risks, and Limitations

The evaluation results should be interpreted within the context of the HellaSwag task. The model may exhibit biases present in its training data or the evaluation dataset.

### Recommendations

Users should be aware of the model's limitations and consider additional evaluation on task-specific datasets before deployment in real-world applications.

## How to Get Started with the Model

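To load the checkpoint evaluated in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the intermediate checkpoint evaluated here (revision "step100000") in float32.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-160m", revision="step100000", torch_dtype=torch.float32
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision="step100000")
model.eval()  # inference mode for scoring

# To reproduce the HellaSwag scores, run the lm_eval command under "Command" below.
```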

## Training Details

### Training Data

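No training was performed for this evaluation. The evaluated checkpoint, EleutherAI/pythia-160m at revision `step100000`, is an intermediate checkpoint from EleutherAI's pretraining of Pythia on the Pile; see the [model repository](https://huggingface.co/EleutherAI/pythia-160m) for pretraining details. The evaluation data is [the HellaSwag dataset](https://huggingface.co/datasets/hellaswag).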

### Training Procedure

#### Training Hyperparameters

- **Training regime:** Not applicable (no training was performed). The evaluation ran in float32 (`dtype="float"`).

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated zero-shot on the HellaSwag validation split (10,042 examples), which is the split used by the lm-evaluation-harness `hellaswag` task.

#### Metrics

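- **Accuracy (acc):** The fraction of examples for which the candidate ending with the highest total log-likelihood under the model is the correct one.
- **Normalized Accuracy (acc_norm):** The same comparison, but each ending's log-likelihood is first divided by the ending's length (in characters, in lm-evaluation-harness), which reduces the bias of summed log-likelihoods toward shorter endings.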
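The difference between the two metrics is easiest to see on a single item. The sketch below is a minimal illustration, assuming (as we understand recent lm-evaluation-harness versions to do) that `acc_norm` divides each ending's summed log-likelihood by the ending's character length. The function name `pick_endings`, the endings, and the log-likelihood values are hypothetical and are not taken from this run.

```python
from typing import List, Sequence, Tuple


def pick_endings(loglikelihoods: Sequence[float], endings: Sequence[str]) -> Tuple[int, int]:
    """Return (acc prediction, acc_norm prediction) for one multiple-choice item.

    acc picks the ending with the highest summed log-likelihood; acc_norm first
    divides each log-likelihood by the ending's length in characters (assumption).
    Ties resolve to the lowest index, like numpy.argmax.
    """
    indices: List[int] = list(range(len(endings)))
    pred = max(indices, key=lambda i: loglikelihoods[i])
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(endings[i]))
    return pred, pred_norm


# Hypothetical item: four endings with made-up log-likelihoods (not from this run).
endings = [
    " runs.",
    " picks up the ball and throws it to the dog.",
    " sleeps.",
    " eats the ball whole.",
]
lls = [-6.0, -14.0, -7.5, -12.0]
gold = 1

pred, pred_norm = pick_endings(lls, endings)
print("acc correct:", pred == gold, "| acc_norm correct:", pred_norm == gold)
```

Here the short ending " runs." wins on raw log-likelihood, but after length normalization the longer, correct ending scores highest, so this item would count toward `acc_norm` but not `acc`.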

### Results

| Metric | Value | Standard Error |
|--------|-------|----------------|
| Accuracy | 0.28719 | 0.00452 |
| Normalized Accuracy | 0.30821 | 0.00461 |
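The standard errors in the table can be reproduced from the accuracies alone. This sketch assumes the harness reports the standard error of the mean over per-example 0/1 scores using the sample (n − 1) variance, which for an observed proportion p reduces to sqrt(p(1 − p)/(n − 1)). The helper name `accuracy_stderr` is hypothetical, and the 95% interval (normal approximation) is an illustration, not part of the harness output.

```python
import math


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n Bernoulli (0/1) scores with observed mean p,
    using the sample variance n * p * (1 - p) / (n - 1)."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


N = 10042  # HellaSwag validation examples scored in this run
for name, acc in [("acc", 0.28719), ("acc_norm", 0.30821)]:
    se = accuracy_stderr(acc, N)
    lo, hi = acc - 1.96 * se, acc + 1.96 * se  # approximate 95% normal interval
    print(f"{name}: stderr={se:.5f}, 95% CI=[{lo:.4f}, {hi:.4f}]")
```

With n = 10,042 this gives standard errors of about 0.00452 and 0.00461, matching the table.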

## Environmental Impact

- **Hardware Type:** Tesla T4 GPU
- **Hours used:** Approximately 0.095 hours (341.39 seconds)
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

EleutherAI/pythia-160m is a decoder-only transformer causal language model (GPT-NeoX architecture) with approximately 162 million parameters, trained with a next-token prediction objective.
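As a sanity check on the roughly 162 million figure, the parameter count can be reproduced from the architecture hyperparameters. This sketch assumes the GPT-NeoX layout used by Pythia (biases on every linear layer, two LayerNorms per block plus a final one, untied input embedding and output head, rotary position embeddings with no learned weights) and the config values shown, which are taken from the model's published configuration rather than from this evaluation. The helpers `linear` and `layer_norm` are hypothetical names.

```python
# Config values assumed from the published EleutherAI/pythia-160m configuration.
vocab_size = 50304
hidden = 768
layers = 12
intermediate = 3072  # 4 * hidden


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a linear layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Parameters of a LayerNorm: scale and shift vectors."""
    return 2 * dim


per_layer = (
    linear(hidden, 3 * hidden)      # fused query/key/value projection
    + linear(hidden, hidden)        # attention output projection
    + linear(hidden, intermediate)  # MLP up-projection
    + linear(intermediate, hidden)  # MLP down-projection
    + 2 * layer_norm(hidden)        # attention and MLP input LayerNorms
)
non_embedding = layers * per_layer + layer_norm(hidden)  # plus final LayerNorm
embedding = 2 * vocab_size * hidden  # untied input embedding and output head (no bias)
total = non_embedding + embedding
print(f"non-embedding: {non_embedding:,}  total: {total:,}")
```

Under these assumptions the total comes to 162,322,944 parameters, of which 85,056,000 are outside the embedding and output matrices.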

### Compute Infrastructure

- **Hardware:** Tesla T4 GPU
- **Software:** PyTorch 2.4.1+cu121, Transformers 4.44.2
- **Date of Evaluation:** October 18, 2024

### Command

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks hellaswag \
    --device cuda \
    --batch_size auto:4 \
    --output_path hellaswag_test \
    --log_samples
```

#### Command output
```
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
hf (pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto:4 (64,64,64,64,64)
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.2872|±  |0.0045|
|         |       |none  |     0|acc_norm|↑  |0.3082|±  |0.0046|

2024-10-18 12:25:25.770584: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-18 12:25:25.847675: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-18 12:25:25.887843: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-18 12:25:25.961158: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-18 12:25:27.647707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-10-18:12:25:29,450 INFO     [__main__.py:279] Verbosity set to INFO
2024-10-18:12:25:42,060 INFO     [__main__.py:376] Selected Tasks: ['hellaswag']
2024-10-18:12:25:42,062 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-10-18:12:25:42,062 INFO     [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'EleutherAI/pythia-160m', 'revision': 'step100000', 'dtype': 'float'}
2024-10-18:12:25:42,128 INFO     [huggingface.py:129] Using device 'cuda'
2024-10-18:12:25:42,395 INFO     [huggingface.py:481] Using model type 'default'
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
2024-10-18:12:25:42,769 INFO     [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2024-10-18:12:25:56,709 WARNING  [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior.
2024-10-18:12:25:56,710 INFO     [task.py:415] Building contexts for hellaswag on rank 0...
100%|██████████| 10042/10042 [00:05<00:00, 1695.72it/s]
2024-10-18:12:26:04,007 INFO     [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████| 40168/40168 [03:53<00:00, 171.85it/s]
fatal: not a git repository (or any of the parent directories): .git
2024-10-18:12:30:36,510 INFO     [evaluation_tracker.py:206] Saving results aggregated
2024-10-18:12:30:36,524 INFO     [evaluation_tracker.py:287] Saving per-sample results for: hellaswag
```
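The request count in the log follows from HellaSwag's format: each of the 10,042 contexts has four candidate endings, and the harness scores one (context, ending) log-likelihood per pair. The short sketch below just checks that arithmetic and the runtime conversion quoted under Environmental Impact; the variable names are illustrative.

```python
num_examples = 10042      # contexts built for hellaswag (see "Building contexts" log line)
endings_per_example = 4   # HellaSwag offers four candidate endings per context
requests = num_examples * endings_per_example  # one log-likelihood request per (context, ending)
print(requests)

runtime_s = 341.39        # total runtime reported under Environmental Impact
runtime_h = runtime_s / 3600
print(f"{runtime_h:.3f} h")
```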