# BEE-spoke-data/verysmol_llama-v8-minipile_x2
This is still a work-in-progress and should be treated as such.
## Model description
This is an autoregressive smol language model. It generates text; a minimal usage sketch follows the eval numbers below.
It achieves the following results on the evaluation set:
- Loss: 2.7521
- Accuracy: 0.4686
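For the curious, a minimal way to try it with 🤗 `transformers`. This is a sketch using the standard `AutoModelForCausalLM` loading path; the prompt and sampling settings are illustrative, not tuned for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BEE-spoke-data/verysmol_llama-v8-minipile_x2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True mirrors the eval harness args later in this card
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Prompt and sampling settings are illustrative, not tuned.
inputs = tokenizer("The smol llama said", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```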
## Intended uses & limitations
Doing things (i.e., generating text for research and experimentation). Limitations are that it is smol.
Additionally, <insert generic, emotionless, and corporate statement about bias in language models here>.
## Data
Most recent training run was on JeanKaddour/minipile for 2 epochs (a loading sketch follows the quote). Otherwise, please refer to the quote below:

> UnFoRtUnAtElY We'rE UnAbLe tO ShArE DeTaIlS AbOuT ThE TrAiNiNg aNd tHe dAtAsEtS (eXtRaCtEd fRoM ThE OpEn wEb) DuE To tHe hIgHlY CoMpEtItIvE NaTuRe oF ThE FiElD.
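For reference, a minimal sketch of pulling the dataset with 🤗 `datasets` (the `"train"` split name is an assumption based on the dataset's Hub page):

```python
from datasets import load_dataset

# JeanKaddour/minipile; "train" split assumed from the Hub listing
minipile = load_dataset("JeanKaddour/minipile", split="train")
print(minipile)
```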
## evals
| eval metrics            |            |
|:------------------------|-----------:|
| epoch                   | 2.0        |
| eval_accuracy           | 0.4685     |
| eval_loss               | 2.7521     |
| eval_runtime            | 0:00:03.89 |
| eval_samples            | 300        |
| eval_samples_per_second | 77.049     |
| eval_steps_per_second   | 9.759      |
| perplexity              | 15.675     |
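The perplexity row is not an independent measurement: it is just the exponential of the mean cross-entropy `eval_loss`, which is easy to sanity-check:

```python
import math

# perplexity = exp(mean cross-entropy loss)
print(math.exp(2.7521))  # ≈ 15.675, matching the table above
```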
### harness
Some improvements and some degradations over previous versions. This may indicate that the last dataset in the curriculum matters and needs to be chosen deliberately.
```
hf-causal-experimental (pretrained=BEE-spoke-data/verysmol_llama-v8-minipile_x2,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
```
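That configuration corresponds roughly to the following call against the harness's Python API. This is a sketch assuming the v0.3-era EleutherAI `lm-evaluation-harness` (where `hf-causal-experimental` was a registered model type); argument names differ in newer harness releases:

```python
from lm_eval import evaluator  # EleutherAI lm-evaluation-harness, v0.3-era API

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args=(
        "pretrained=BEE-spoke-data/verysmol_llama-v8-minipile_x2,"
        "revision=main,trust_remote_code=True,dtype='float'"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=16,
)
print(results["results"])
```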
| Task           | Version | Metric   |    Value |   | Stderr |
|:---------------|--------:|:---------|---------:|:-:|-------:|
| arc_easy       |       0 | acc      |   0.3662 | ± | 0.0099 |
|                |         | acc_norm |   0.3460 | ± | 0.0098 |
| boolq          |       1 | acc      |   0.6052 | ± | 0.0085 |
| lambada_openai |       0 | ppl      | 156.8153 | ± | 6.5985 |
|                |         | acc      |   0.2010 | ± | 0.0056 |
| openbookqa     |       0 | acc      |   0.1280 | ± | 0.0150 |
|                |         | acc_norm |   0.2660 | ± | 0.0198 |
| piqa           |       0 | acc      |   0.5865 | ± | 0.0115 |
|                |         | acc_norm |   0.5805 | ± | 0.0115 |
| winogrande     |       0 | acc      |   0.5217 | ± | 0.0140 |
| Task          | Version | Metric   |  Value |   | Stderr |
|:--------------|--------:|:---------|-------:|:-:|-------:|
| arc_challenge |       0 | acc      | 0.1877 | ± | 0.0114 |
|               |         | acc_norm | 0.2235 | ± | 0.0122 |
| Task      | Version | Metric   |  Value |   | Stderr |
|:----------|--------:|:---------|-------:|:-:|-------:|
| hellaswag |       0 | acc      | 0.2622 | ± | 0.0088 |
|           |         | acc_norm | 0.2777 | ± | 0.0089 |
| Task          | Version | Metric |  Value |   | Stderr |
|:--------------|--------:|:-------|-------:|:-:|-------:|
| truthfulqa_mc |       1 | mc1    | 0.2705 | ± | 0.0156 |
|               |         | mc2    | 0.4729 | ± | 0.0155 |
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 0.00015
- train_batch_size: 8
- eval_batch_size: 8
- seed: 5404
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2.0
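If one wanted to mirror these settings with 🤗 `transformers`, here is a hedged `TrainingArguments` sketch. It assumes a single device, so 8 × 16 accumulation steps gives the 128 effective batch; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="verysmol_llama-v8-minipile_x2",  # placeholder path
    learning_rate=0.00015,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=5404,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch on one device
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-07,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=2.0,
)
```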
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|--------------:|------:|-----:|----------------:|---------:|
| 2.7625        |  0.02 |  200 |          2.8982 |   0.4457 |
| 2.7377        |  0.03 |  400 |          2.8812 |   0.4477 |
| 2.6883        |  0.05 |  600 |          2.8774 |   0.4489 |
| 2.7654        |  0.06 |  800 |          2.8811 |   0.4479 |
| 2.744         |  0.08 | 1000 |          2.8838 |   0.4464 |
| 2.6922        |  0.09 | 1200 |          2.8921 |   0.4461 |
| 2.7416        |  0.11 | 1400 |          2.8930 |   0.4464 |
| 2.7337        |  0.12 | 1600 |          2.8972 |   0.4465 |
| 2.7046        |  0.14 | 1800 |          2.8933 |   0.4472 |
| 2.673         |  0.15 | 2000 |          2.8926 |   0.4483 |
...
| Training Loss | Epoch |  Step | Validation Loss | Accuracy |
|--------------:|------:|------:|----------------:|---------:|
| 2.5155        |  1.88 | 24800 |          2.7524 |   0.4685 |
| 2.5092        |  1.89 | 25000 |          2.7522 |   0.4686 |
| 2.5093        |  1.91 | 25200 |          2.7523 |   0.4685 |
| 2.4574        |  1.92 | 25400 |          2.7521 |   0.4686 |
| 2.5137        |  1.94 | 25600 |          2.7522 |   0.4686 |
| 2.4598        |  1.95 | 25800 |          2.7521 |   0.4686 |
| 2.515         |  1.97 | 26000 |          2.7521 |   0.4685 |
| 2.5429        |  1.98 | 26200 |          2.7521 |   0.4686 |
| 2.4789        |   2.0 | 26400 |          2.7521 |   0.4686 |