---
language:
- en
license: apache-2.0
tags:
- pytorch
- causal-lm
- pythia
datasets:
- hellaswag
metrics:
- accuracy
---

# Model Card for EleutherAI/pythia-160m HellaSwag Evaluation

This model card presents the evaluation results of the EleutherAI/pythia-160m model on the HellaSwag task.

## Model Details

### Model Description

- **Developed by:** EleutherAI
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** Not applicable; the base EleutherAI/pythia-160m checkpoint was evaluated without additional fine-tuning.

### Model Sources

- **Repository:** [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m)
- **Paper:** [Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling](https://arxiv.org/abs/2304.01373)

## Uses

### Direct Use

This evaluation measures the model's performance on HellaSwag, a sentence-completion benchmark for commonsense reasoning.

### Out-of-Scope Use

This evaluation is specific to the HellaSwag task and may not be indicative of the model's performance on other tasks or in real-world applications.

## Bias, Risks, and Limitations

The evaluation results should be interpreted within the context of the HellaSwag task. The model may exhibit biases present in its training data or in the evaluation dataset.

### Recommendations

Users should be aware of the model's limitations and consider additional evaluation on task-specific datasets before deploying it in real-world applications.

## How to Get Started with the Model

To load the checkpoint that was evaluated (the scoring procedure is sketched under Metrics below, and the exact harness command is given under Technical Specifications):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint that was evaluated (training step 100,000)
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision="step100000")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision="step100000")

# Quick sanity check with an example prompt: greedily generate a short continuation
inputs = tokenizer("The man picked up the knife and", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

This card documents an evaluation run only; no training or fine-tuning was performed. The model was evaluated on the HellaSwag dataset. For more information, visit [the HellaSwag dataset page](https://huggingface.co/datasets/hellaswag).

### Training Procedure

#### Training Hyperparameters

- **Training regime:** Not applicable; the evaluation was run in float32 precision (`dtype="float"`).

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on the HellaSwag validation split, which consists of 10,042 samples.

#### Metrics

- **Accuracy (acc):** The proportion of examples for which the candidate ending assigned the highest log-likelihood is the correct one.
- **Normalized Accuracy (acc_norm):** The same selection rule, except each ending's log-likelihood is first normalized by the ending's length, reducing the advantage of shorter completions (see the sketch below).
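Both metrics come from the same comparison: the model scores each of the four candidate endings by the log-likelihood it assigns to the ending given the context, `acc` picks the ending with the highest raw log-likelihood, and `acc_norm` first divides each score by the ending's length. The sketch below illustrates this on a made-up HellaSwag-style item; the context, endings, and character-based length normalization are simplifying assumptions rather than the harness's exact implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision="step100000")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision="step100000")
model.eval()

# Hypothetical HellaSwag-style item: one context, four candidate endings.
context = "A man is standing in a kitchen. He picks up a knife and"
endings = [
    " begins to chop an onion on the cutting board.",
    " throws it at the ceiling and walks away.",
    " starts singing loudly to the refrigerator.",
    " dives into the sink head first.",
]

def ending_loglikelihood(context: str, ending: str) -> float:
    """Sum of log-probabilities the model assigns to the ending tokens, given the context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probabilities of each next token, aligned with the tokens they predict.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that belong to the ending.
    n_ending = full_ids.shape[1] - ctx_ids.shape[1]
    return token_log_probs[0, -n_ending:].sum().item()

scores = [ending_loglikelihood(context, e) for e in endings]
# acc: highest raw log-likelihood; acc_norm: highest length-normalized log-likelihood.
pred_acc = max(range(len(endings)), key=lambda i: scores[i])
pred_acc_norm = max(range(len(endings)), key=lambda i: scores[i] / len(endings[i]))
print("acc choice:", pred_acc, "| acc_norm choice:", pred_acc_norm)
```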
### Results

| Metric | Value | Standard Error |
|--------|-------|----------------|
| Accuracy | 0.28719 | 0.00452 |
| Normalized Accuracy | 0.30821 | 0.00461 |

## Environmental Impact

- **Hardware Type:** Tesla T4 GPU
- **Hours used:** Approximately 0.095 hours (341.39 seconds)
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

EleutherAI/pythia-160m is a causal language model with approximately 162 million parameters.

### Compute Infrastructure

- **Hardware:** Tesla T4 GPU
- **Software:** PyTorch 2.4.1+cu121, Transformers 4.44.2
- **Date of Evaluation:** October 18, 2024

### Command

```
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks hellaswag \
    --device cuda \
    --batch_size auto:4 \
    --output_path hellaswag_test \
    --log_samples
```

#### Command output

```
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
hf (pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto:4 (64,64,64,64,64)
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.2872|±  |0.0045|
|         |       |none  |     0|acc_norm|↑  |0.3082|±  |0.0046|

2024-10-18 12:25:25.770584: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-18 12:25:25.847675: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-18 12:25:25.887843: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-18 12:25:25.961158: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-18 12:25:27.647707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-10-18:12:25:29,450 INFO [__main__.py:279] Verbosity set to INFO
2024-10-18:12:25:42,060 INFO [__main__.py:376] Selected Tasks: ['hellaswag']
2024-10-18:12:25:42,062 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-10-18:12:25:42,062 INFO [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'EleutherAI/pythia-160m', 'revision': 'step100000', 'dtype': 'float'}
2024-10-18:12:25:42,128 INFO [huggingface.py:129] Using device 'cuda'
2024-10-18:12:25:42,395 INFO [huggingface.py:481] Using model type 'default'
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
2024-10-18:12:25:42,769 INFO [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2024-10-18:12:25:56,709 WARNING [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior.
2024-10-18:12:25:56,710 INFO [task.py:415] Building contexts for hellaswag on rank 0...
100%|██████████| 10042/10042 [00:05<00:00, 1695.72it/s]
2024-10-18:12:26:04,007 INFO [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████| 40168/40168 [03:53<00:00, 171.85it/s]
fatal: not a git repository (or any of the parent directories): .git
2024-10-18:12:30:36,510 INFO [evaluation_tracker.py:206] Saving results aggregated
2024-10-18:12:30:36,524 INFO [evaluation_tracker.py:287] Saving per-sample results for: hellaswag
```
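The CLI run above can also be launched from Python. The snippet below is a minimal sketch assuming lm-evaluation-harness v0.4.x, where `lm_eval.simple_evaluate` is the top-level entry point; argument names and the result layout may differ in other versions, and `batch_size=64` simply mirrors the batch size that the `auto:4` detection settled on in the logged run.

```python
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float",
    tasks=["hellaswag"],
    batch_size=64,
    device="cuda",
)
print(results["results"]["hellaswag"])  # acc, acc_norm and their standard errors
```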