BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI

Note: training is still a work in progress.

This model is a fine-tuned version of BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile on the KI dataset. It achieves the following results on the evaluation set:

Loss: 2.5937
Accuracy: 0.4948
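
For intuition, the evaluation loss can be converted to a perplexity. The snippet below is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card).

```python
import math

eval_loss = 2.5937  # evaluation loss reported above

# Assumption: the loss is the mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~= {perplexity:.2f}")
```

This token-level perplexity on the evaluation split is not directly comparable to the word-level lambada_openai perplexity reported further down, since the two are computed over different data and units.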

Training and evaluation data

KI dataset

Zero-shot results from lm-evaluation-harness, run with the following configuration:

hf-causal-experimental (pretrained=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

Task	Version	Metric	Value		Stderr
arc_easy	0	acc	0.4322	±	0.0102
		acc_norm	0.3960	±	0.0100
boolq	1	acc	0.6196	±	0.0085
lambada_openai	0	ppl	61.6595	±	2.4362
		acc	0.2779	±	0.0062
openbookqa	0	acc	0.1540	±	0.0162
		acc_norm	0.2840	±	0.0202
piqa	0	acc	0.6028	±	0.0114
		acc_norm	0.6028	±	0.0114
winogrande	0	acc	0.5193	±	0.0140
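
To read the Stderr column, it can help to turn each value into an approximate confidence interval. The sketch below uses a normal approximation (value ± 1.96 × stderr), which is an assumption made here for illustration rather than something the evaluation harness reports. The helper name `confidence_interval` is hypothetical.

```python
# (task, metric) -> (value, stderr), copied from the table above
results = {
    ("arc_easy", "acc"): (0.4322, 0.0102),
    ("boolq", "acc"): (0.6196, 0.0085),
    ("piqa", "acc"): (0.6028, 0.0114),
    ("winogrande", "acc"): (0.5193, 0.0140),
}

Z_95 = 1.96  # normal-approximation multiplier for a ~95% interval (assumption)


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) bounds of value +/- z * stderr."""
    return value - z * stderr, value + z * stderr


for (task, metric), (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task} {metric}: {value:.4f} [{low:.4f}, {high:.4f}]")
```

Under this approximation the winogrande interval includes 0.5, the chance level for a two-choice task, so that score is not clearly distinguishable from random guessing.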

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.00025
train_batch_size: 8
eval_batch_size: 4
seed: 2280
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
lr_scheduler_type: inverse_sqrt
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1.0
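
The hyperparameters above relate to each other in two ways worth spelling out: the total batch size is the product of the per-device batch size and the accumulation steps, and the learning rate follows a warmup followed by an inverse square root decay. The snippet below is a rough sketch under stated assumptions: a single device, a hypothetical total of about 2580 optimizer steps (extrapolated from step 2400 falling at epoch 0.93 in the training results table), and a decay timescale equal to the warmup length. The function `lr_at` is illustrative and is not the training code used for this model.

```python
import math

learning_rate = 0.00025
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: one device, since 8 * 16 * 1 matches the reported total of 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Hypothetical step count, extrapolated from step 2400 at epoch 0.93.
total_steps = 2580
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step):
    """Linear warmup to the peak rate, then decay proportional to 1/sqrt(step)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate * math.sqrt(warmup_steps / step)


print("total_train_batch_size:", total_train_batch_size)
for step in (0, warmup_steps, 1000, 2400):
    print(f"step {step:>4}: lr = {lr_at(step):.6f}")
```

With this shape the rate peaks at the end of warmup and halves each time the step count quadruples, so it never reaches zero within the single epoch.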

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
2.5744	0.08	200	2.7138	0.4776
2.5387	0.16	400	2.6713	0.4836
2.4718	0.23	600	2.6462	0.4873
2.4681	0.31	800	2.6328	0.4892
2.5351	0.39	1000	2.6227	0.4908
2.5316	0.47	1200	2.6159	0.4914
2.5270	0.54	1400	2.6103	0.4921
2.4838	0.62	1600	2.6058	0.4930
2.4483	0.70	1800	2.6024	0.4934
2.4260	0.78	2000	2.5998	0.4937
2.4685	0.86	2200	2.5961	0.4944
2.4473	0.93	2400	2.5937	0.4948

Framework versions

Transformers 4.36.0.dev0
PyTorch 2.1.0
Datasets 2.15.0
Tokenizers 0.15.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	29.23
AI2 Reasoning Challenge (25-Shot)	23.81
HellaSwag (10-Shot)	29.39
MMLU (5-Shot)	25.37
TruthfulQA (0-shot)	44.77
Winogrande (5-shot)	51.14
GSM8k (5-shot)	0.91
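
The Avg. row is consistent with an unweighted mean of the six benchmark scores. The short sketch below checks this using only the values from the table above. Treating it as a plain arithmetic mean is an assumption that happens to match the reported figure.

```python
# Benchmark scores copied from the table above
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 23.81,
    "HellaSwag (10-Shot)": 29.39,
    "MMLU (5-Shot)": 25.37,
    "TruthfulQA (0-shot)": 44.77,
    "Winogrande (5-shot)": 51.14,
    "GSM8k (5-shot)": 0.91,
}

# Assumption: Avg. is the unweighted arithmetic mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")
```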
