Update README.md
---
library_name: transformers
language:
- en
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.8
    repetition_penalty: 1.15
    no_repeat_ngram_size: 4
    eta_cutoff: 0.0006
    renormalize_logits: true
widget:
- text: My name is El Microondas the Wise, and
  example_title: El Microondas
- text: Kennesaw State University is a public
  example_title: Kennesaw State University
- text: >-
    Bungie Studios is an American video game developer. They are most famous for
    developing the award winning Halo series of video games. They also made
    Destiny. The studio was founded
  example_title: Bungie
- text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- text: >-
    The Harry Potter series, written by J.K. Rowling, begins with the book
    titled
  example_title: Harry Potter Series
- text: >-
    Question: I have cities, but no houses. I have mountains, but no trees. I
    have water, but no fish. What am I?

    Answer:
  example_title: Riddle
- text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- text: >-
    Jane went to the store to buy some groceries. She picked up apples, oranges,
    and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
    they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
- text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
pipeline_tag: text-generation
datasets:
- JeanKaddour/minipile
- pszemraj/simple_wikipedia_LM
- mattymchen/refinedweb-3m
- Locutusque/TM-DATA
- Skylion007/openwebtext
---

# Model Card for nano-phi-115M-control-v0.1

Inspired by [Phi2](https://huggingface.co/microsoft/phi-2) and by open-source small-language-model efforts such as [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
Pre-trained from scratch on 7B training tokens, drawn from a dataset of 0.6B tokens.
This model serves as a control for [kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1), which applies a quality filter to the dataset, resulting in a much smaller training set.
Training took just 2d 4h on a single A100 40GB in Colab (~USD 100).
Given its training-token count and dataset size, it achieves quite competitive evaluation results.
No alignment has been done yet.
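As a minimal usage sketch, the sampling settings declared in the front matter map one-to-one onto `transformers` `generate()` keyword arguments. The repo id in the commented-out usage below is an assumption inferred from the card title, not confirmed by the card:

```python
# The widget's sampling settings from this card's front matter, expressed as
# keyword arguments for transformers' GenerationConfig / model.generate().
generation_kwargs = dict(
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.15,
    no_repeat_ngram_size=4,
    eta_cutoff=0.0006,
    renormalize_logits=True,
)

# Hypothetical usage (needs `transformers` installed and network access;
# the repo id is assumed, not confirmed by this card):
# from transformers import pipeline
# pipe = pipeline("text-generation", model="kenhktsui/nano-phi-115M-control-v0.1")
# print(pipe("The Mona Lisa is a world-renowned painting created by",
#            **generation_kwargs)[0]["generated_text"])
```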
## Some metrics

- model
  - hidden_size: 768
  - num_key_value_heads: 8 (grouped-query attention)
  - num_attention_heads: 24
  - num_hidden_layers: 6
  - context length: 1024
  - total params: 115M
- training:
  - global steps: 14,000

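As a back-of-the-envelope check, not from the card, the architecture numbers above do land near the stated 115M, and the 7B-token budget over 14,000 steps implies the effective batch size. The sketch assumes a GPT-2-sized vocabulary of 50,257, untied input/output embeddings, and a standard 4× MLP, and ignores biases and norm parameters:

```python
# Rough parameter count from the architecture listed above.
# Assumptions (NOT stated in the card): vocab_size=50257 (GPT-2-sized),
# untied embedding / LM head, standard 4x MLP; biases and norms ignored.
hidden, n_layers, n_heads, n_kv_heads, vocab = 768, 6, 24, 8, 50257
head_dim = hidden // n_heads                      # 32
kv_dim = n_kv_heads * head_dim                    # 256 (grouped-query attention)

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus smaller K and V
mlp = 2 * hidden * (4 * hidden)                   # up- and down-projection
params = 2 * vocab * hidden + n_layers * (attn + mlp)
print(f"~{params / 1e6:.1f}M parameters")         # close to the stated 115M

# Training-throughput arithmetic from the card: 7B tokens in 14,000 steps.
tokens_per_step = 7e9 / 14_000                    # 500,000 tokens per step
seqs_per_step = tokens_per_step / 1024            # ~488 sequences at 1024 context
print(f"{tokens_per_step:,.0f} tokens/step, ~{seqs_per_step:.0f} seqs/step")
```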
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 28.75 |
| ARC (25-shot)       | 21.67 |
| HellaSwag (10-shot) | 26.89 |
| MMLU (5-shot)       | 24.76 |
| TruthfulQA (0-shot) | 47.69 |
| Winogrande (5-shot) | 51.46 |
| GSM8K (5-shot)      | 0.0   |

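As a quick consistency check, the reported average is the plain mean of the six benchmark scores in the table above:

```python
# Open LLM Leaderboard summary scores from the table above.
scores = {
    "ARC (25-shot)": 21.67,
    "HellaSwag (10-shot)": 26.89,
    "MMLU (5-shot)": 24.76,
    "TruthfulQA (0-shot)": 47.69,
    "Winogrande (5-shot)": 51.46,
    "GSM8K (5-shot)": 0.0,
}
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # within rounding of the reported 28.75
```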
Details:

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

| Task   |Version| Metric |Value | |Stderr|
|--------|------:|--------|-----:|---|-----:|
|arc_easy|      0|acc     |0.3973|± |0.0100|
|        |       |acc_norm|0.3531|± |0.0098|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 16

| Task        |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.1843|± |0.0113|
|             |       |acc_norm|0.2167|± |0.0120|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 16

| Task    |Version| Metric |Value | |Stderr|
|---------|------:|--------|-----:|---|-----:|
|hellaswag|      0|acc     |0.2682|± |0.0044|
|         |       |acc_norm|0.2689|± |0.0044|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

| Task        |Version|Metric|Value | |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc|      1|mc1   |0.2619|± |0.0154|
|             |       |mc2   |0.4769|± |0.0156|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16

(acc_norm equals acc on every subtask, so the two metrics are shown as one column)

| Task                                             |Version|acc = acc_norm|Stderr|
|--------------------------------------------------|------:|-------------:|-----:|
|hendrycksTest-abstract_algebra                    |      1|        0.2200|0.0416|
|hendrycksTest-anatomy                             |      1|        0.3333|0.0407|
|hendrycksTest-astronomy                           |      1|        0.2895|0.0369|
|hendrycksTest-business_ethics                     |      1|        0.2000|0.0402|
|hendrycksTest-clinical_knowledge                  |      1|        0.2189|0.0254|
|hendrycksTest-college_biology                     |      1|        0.2222|0.0348|
|hendrycksTest-college_chemistry                   |      1|        0.1700|0.0378|
|hendrycksTest-college_computer_science            |      1|        0.3000|0.0461|
|hendrycksTest-college_mathematics                 |      1|        0.2500|0.0435|
|hendrycksTest-college_medicine                    |      1|        0.1965|0.0303|
|hendrycksTest-college_physics                     |      1|        0.2353|0.0422|
|hendrycksTest-computer_security                   |      1|        0.2000|0.0402|
|hendrycksTest-conceptual_physics                  |      1|        0.2043|0.0264|
|hendrycksTest-econometrics                        |      1|        0.2456|0.0405|
|hendrycksTest-electrical_engineering              |      1|        0.2621|0.0366|
|hendrycksTest-elementary_mathematics              |      1|        0.2566|0.0225|
|hendrycksTest-formal_logic                        |      1|        0.1587|0.0327|
|hendrycksTest-global_facts                        |      1|        0.1600|0.0368|
|hendrycksTest-high_school_biology                 |      1|        0.3226|0.0266|
|hendrycksTest-high_school_chemistry               |      1|        0.2956|0.0321|
|hendrycksTest-high_school_computer_science        |      1|        0.2800|0.0451|
|hendrycksTest-high_school_european_history        |      1|        0.2606|0.0343|
|hendrycksTest-high_school_geography               |      1|        0.2626|0.0314|
|hendrycksTest-high_school_government_and_politics |      1|        0.2176|0.0298|
|hendrycksTest-high_school_macroeconomics          |      1|        0.2128|0.0208|
|hendrycksTest-high_school_mathematics             |      1|        0.2630|0.0268|
|hendrycksTest-high_school_microeconomics          |      1|        0.2227|0.0270|
|hendrycksTest-high_school_physics                 |      1|        0.3046|0.0376|
|hendrycksTest-high_school_psychology              |      1|        0.2055|0.0173|
|hendrycksTest-high_school_statistics              |      1|        0.4815|0.0341|
|hendrycksTest-high_school_us_history              |      1|        0.2059|0.0284|
|hendrycksTest-high_school_world_history           |      1|        0.2574|0.0285|
|hendrycksTest-human_aging                         |      1|        0.2063|0.0272|
|hendrycksTest-human_sexuality                     |      1|        0.2443|0.0377|
|hendrycksTest-international_law                   |      1|        0.2727|0.0407|
|hendrycksTest-jurisprudence                       |      1|        0.2130|0.0396|
|hendrycksTest-logical_fallacies                   |      1|        0.2515|0.0341|
|hendrycksTest-machine_learning                    |      1|        0.2321|0.0401|
|hendrycksTest-management                          |      1|        0.2039|0.0399|
|hendrycksTest-marketing                           |      1|        0.1966|0.0260|
|hendrycksTest-medical_genetics                    |      1|        0.3000|0.0461|
|hendrycksTest-miscellaneous                       |      1|        0.2631|0.0157|
|hendrycksTest-moral_disputes                      |      1|        0.2457|0.0232|
|hendrycksTest-moral_scenarios                     |      1|        0.2682|0.0148|
|hendrycksTest-nutrition                           |      1|        0.2451|0.0246|
|hendrycksTest-philosophy                          |      1|        0.2605|0.0249|
|hendrycksTest-prehistory                          |      1|        0.2932|0.0253|
|hendrycksTest-professional_accounting             |      1|        0.2340|0.0253|
|hendrycksTest-professional_law                    |      1|        0.2432|0.0110|
|hendrycksTest-professional_medicine               |      1|        0.4301|0.0301|
|hendrycksTest-professional_psychology             |      1|        0.2369|0.0172|
|hendrycksTest-public_relations                    |      1|        0.2091|0.0390|
|hendrycksTest-security_studies                    |      1|        0.2408|0.0274|
|hendrycksTest-sociology                           |      1|        0.2388|0.0301|
|hendrycksTest-us_foreign_policy                   |      1|        0.2600|0.0441|
|hendrycksTest-virology                            |      1|        0.2048|0.0314|
|hendrycksTest-world_religions                     |      1|        0.2047|0.0309|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16

| Task     |Version|Metric|Value | |Stderr|
|----------|------:|------|-----:|---|-----:|
|winogrande|      0|acc   |0.5146|± | 0.014|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16

|Task |Version|Metric|Value| |Stderr|
|-----|------:|------|----:|---|-----:|
|gsm8k|      0|acc   |    0|± |     0|

## Model Details

…

## Model Card Contact

[More Information Needed]