qing-yao/babylm-default_seed-42_1e-3

Tags: Text Generation, Generated from Trainer, text-generation-inference, Inference Endpoints

babylm-default_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.0125
Accuracy: 0.4208
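
The reported loss can be read as a perplexity, assuming it is the mean per-token cross-entropy in nats. That is the usual convention for language-model training with the Transformers Trainer, but this card does not state it, so treat the conversion below as a sketch under that assumption.

```python
import math

# Final evaluation loss as reported in this card.
eval_loss = 3.0125

# Assumption: the loss is mean per-token cross-entropy in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")  # 20.34
```

Under that assumption, the final checkpoint has an evaluation perplexity of roughly 20.3.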

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
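
Two of these values interact in ways that are easy to miss. The sketch below shows how total_train_batch_size follows from the per-device batch size and gradient accumulation, and what a linear schedule with 32000 warmup steps looks like over the run. It makes two assumptions that the card does not confirm: training used a single device, and the run ended at step 35760, the last step in the Training results table. The function `linear_lr` is a hypothetical helper that mirrors the usual linear warmup and linear decay rule; it is not taken from the training script.

```python
# Values from the hyperparameter list above.
learning_rate = 1e-3
train_batch_size = 32
gradient_accumulation_steps = 8
warmup_steps = 32_000

# Assumption: a single training device, so the effective batch size is
# per-device batch size times the number of accumulation steps.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: the run ends at the last step in the results table.
total_steps = 35_760


def linear_lr(step: int) -> float:
    """Learning rate with linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(total_train_batch_size)  # 256
for step in (0, 16_000, 32_000, 33_880, 35_760):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is still warming up for about 89% of training (32000 of 35760 steps) and only decays over the last 3760 steps. That decay phase coincides with the final two evaluations, where validation loss falls from 3.0865 to 3.0125.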

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.173	0.9997	1788	4.2526	0.3061
4.0459	1.9999	3577	3.7287	0.3474
3.6173	2.9995	5365	3.4746	0.3704
3.395	3.9997	7154	3.3376	0.3836
3.3065	4.9999	8943	3.2582	0.3911
3.202	5.9996	10731	3.2096	0.3957
3.1358	6.9998	12520	3.1768	0.3991
3.0931	8.0	14309	3.1536	0.4014
3.0605	8.9997	16097	3.1361	0.4035
3.0176	9.9999	17886	3.1262	0.4047
2.9953	10.9995	19674	3.1186	0.4056
2.987	11.9997	21463	3.1099	0.4066
2.9794	12.9999	23252	3.1034	0.4076
2.9745	13.9996	25040	3.0990	0.4079
2.9327	14.9998	26829	3.0990	0.4078
2.9374	16.0	28618	3.0970	0.4082
2.9411	16.9997	30406	3.0878	0.4091
2.9448	17.9999	32195	3.0865	0.4099
2.89	18.9995	33983	3.0356	0.4161
2.7351	19.9930	35760	3.0125	0.4208
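
Because the dataset is not documented, the table is the only hint about its size. The sketch below estimates optimizer steps per epoch from the last row and multiplies by the total train batch size of 256 to approximate the number of training sequences per epoch. It assumes the Epoch column is the step count divided by steps per epoch and that every step consumed a full batch, so the result is a rough estimate rather than a reported figure.

```python
# Last row of the results table: epoch 19.9930 reached at step 35760.
final_epoch = 19.9930
final_step = 35_760

# Effective batch size from the hyperparameter list.
total_train_batch_size = 256

# Assumption: Epoch = Step / steps_per_epoch, so the ratio recovers
# the (possibly fractional) number of optimizer steps per epoch.
steps_per_epoch = final_step / final_epoch

# Assumption: every optimizer step used a full batch of 256 sequences.
sequences_per_epoch = steps_per_epoch * total_train_batch_size

print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"approx. sequences per epoch: {sequences_per_epoch:,.0f}")
```

This suggests a training set on the order of 458 thousand sequences, though the sequence length and therefore the token count cannot be recovered from the card.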

Framework versions

Transformers 4.45.1
Pytorch 2.5.1+cu124
Datasets 2.19.1
Tokenizers 0.20.0

Safetensors

Model size: 110M params
Tensor type: F32
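
As a rough guide to what it takes to load the weights, the parameter count and tensor type give an approximate checkpoint size. This is a back-of-the-envelope sketch: 110M is the rounded figure shown here, and the estimate ignores file metadata, activations, and any optimizer state.

```python
# Rounded parameter count from this card.
num_params = 110_000_000

# F32 stores each parameter in 4 bytes.
bytes_per_param = 4

size_bytes = num_params * bytes_per_param

print(f"{size_bytes / 1e6:.0f} MB")      # 440 MB
print(f"{size_bytes / 2**30:.2f} GiB")   # 0.41 GiB
```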
