---
library_name: transformers
base_model: bowphs/pythia-70m-multi
tags:
- generated_from_trainer
datasets:
- allenai/c4
metrics:
- accuracy
model-index:
- name: c4-model
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: allenai/c4 en
      type: allenai/c4
      args: en
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.3716248289345064
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# c4-model

This model is a fine-tuned version of [bowphs/pythia-70m-multi](https://huggingface.co/bowphs/pythia-70m-multi) on the English (`en`) configuration of the allenai/c4 dataset.
At the end of training (step 30,000), it achieves the following results on the evaluation set:
- Loss: 3.5532
- Accuracy: 0.3716
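As a rough guide to reading these two numbers, the sketch below assumes (neither point is stated in this card) that the reported loss is the mean per-token cross-entropy in nats, the usual quantity minimized for causal language modeling with `transformers`, and that accuracy is next-token accuracy: each position's argmax prediction is compared with the token that follows it, as in the `run_clm.py` example script. Under the first assumption, perplexity is `exp(loss)`. `perplexity_from_loss` and `next_token_accuracy` are hypothetical helpers written for illustration only.

```python
import math
from typing import Sequence


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_cross_entropy)


def next_token_accuracy(pred_ids: Sequence[Sequence[int]],
                        input_ids: Sequence[Sequence[int]]) -> float:
    """Fraction of positions where the prediction at t matches the token at t + 1.

    pred_ids holds the argmax token id at every position. The last prediction
    in each sequence has no target and the first input token has no prediction,
    so both are dropped before comparing (a shift by one).
    """
    correct = total = 0
    for preds, tokens in zip(pred_ids, input_ids):
        for pred, target in zip(preds[:-1], tokens[1:]):
            correct += int(pred == target)
            total += 1
    return correct / total if total else 0.0


# Final evaluation loss reported above (step 30,000).
print(f"perplexity ~ {perplexity_from_loss(3.5532):.2f}")

# Toy example with made-up token ids: 2 of the 3 shifted positions match.
print(next_token_accuracy([[5, 7, 9, 0]], [[1, 5, 7, 4]]))
```

Under the nats assumption, the final loss of 3.5532 corresponds to a perplexity of roughly 34.9, and an accuracy of 0.3716 means the top-1 prediction matches the next token about 37% of the time.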

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 30000
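With `lr_scheduler_type: linear`, the learning rate decays linearly from its peak to zero over the 30,000 training steps. No warmup value is listed above, so the sketch below assumes zero warmup steps (the `TrainingArguments` default). `linear_lr` is a hypothetical helper that mirrors the formula behind `transformers.get_linear_schedule_with_warmup`; it does not call the library.

```python
def linear_lr(step: int,
              peak_lr: float = 5e-05,
              total_steps: int = 30_000,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup value.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * (step / max(1, warmup_steps))
    # Linear decay from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / max(1, total_steps - warmup_steps))


for step in (0, 10_000, 20_000, 30_000):
    print(step, f"{linear_lr(step):.3e}")
```

Under that assumption, the learning rate is 2.5e-05 at the midpoint (step 15,000) and reaches zero at step 30,000.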

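### Training results

"No log" in the Training Loss column means that no training loss had been logged yet at that evaluation step; otherwise the column shows the most recently logged training loss, which is why some consecutive rows (e.g., steps 2000 and 2048) share a value.
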
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:-----:|:---------------:|:--------:|
| No log | 0.0000 | 1 | 10.7029 | 0.0164 |
| No log | 0.0001 | 2 | 10.5331 | 0.0496 |
| No log | 0.0001 | 4 | 10.3022 | 0.0533 |
| No log | 0.0003 | 8 | 10.0235 | 0.0536 |
| No log | 0.0005 | 16 | 9.6536 | 0.0635 |
| No log | 0.0011 | 32 | 9.0284 | 0.0759 |
| No log | 0.0021 | 64 | 8.0249 | 0.0832 |
| No log | 0.0043 | 128 | 6.9172 | 0.1129 |
| No log | 0.0085 | 256 | 6.1629 | 0.1558 |
| No log | 0.0171 | 512 | 5.5805 | 0.1817 |
| No log | 0.0341 | 1024 | 5.1235 | 0.2028 |
| 5.4529 | 0.0667 | 2000 | 4.7613 | 0.2264 |
| 5.4529 | 0.0683 | 2048 | 4.7481 | 0.2281 |
| 4.5765 | 0.1333 | 4000 | 4.4123 | 0.2610 |
| 4.5765 | 0.1365 | 4096 | 4.4043 | 0.2625 |
| 4.3252 | 0.2 | 6000 | 4.2221 | 0.2827 |
| 4.146 | 0.2667 | 8000 | 4.0350 | 0.3098 |
| 4.146 | 0.2731 | 8192 | 4.0134 | 0.3129 |
| 3.9652 | 0.3333 | 10000 | 3.8860 | 0.3304 |
| 3.8441 | 0.4 | 12000 | 3.8005 | 0.3418 |
| 3.7739 | 0.4667 | 14000 | 3.7315 | 0.3503 |
| 3.72 | 0.5333 | 16000 | 3.6880 | 0.3553 |
| 3.72 | 0.5461 | 16384 | 3.6777 | 0.3564 |
| 3.6718 | 0.6 | 18000 | 3.6533 | 0.3593 |
| 3.6527 | 0.6667 | 20000 | 3.6212 | 0.3633 |
| 3.6201 | 0.7333 | 22000 | 3.5985 | 0.3660 |
| 3.593 | 0.8 | 24000 | 3.5819 | 0.3679 |
| 3.5857 | 0.8667 | 26000 | 3.5683 | 0.3697 |
| 3.5801 | 0.9333 | 28000 | 3.5582 | 0.3711 |
| 3.5649 | 1.0 | 30000 | 3.5532 | 0.3716 |
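The Step column appears to interleave two evaluation schedules: every power of two up to 16,384 and every 2,000 steps up to 30,000. The Epoch column equals `step / 30000` rounded to four decimal places, so one epoch corresponds to the full 30,000-step run. The sketch below rebuilds both columns from that reading of the table. The schedule is inferred from the numbers, not taken from a documented configuration, and `evaluation_steps` is a hypothetical helper.

```python
def evaluation_steps(total_steps: int = 30_000, interval: int = 2_000) -> list[int]:
    """Union of power-of-two steps and fixed-interval steps, in ascending order."""
    powers_of_two = set()
    step = 1
    while step <= total_steps:
        powers_of_two.add(step)
        step *= 2
    fixed_interval = set(range(interval, total_steps + 1, interval))
    return sorted(powers_of_two | fixed_interval)


TOTAL_STEPS = 30_000
steps = evaluation_steps(TOTAL_STEPS)
print(len(steps))  # 30 rows, matching the table above
for step in steps:
    # Epoch as a fraction of the 30,000-step run, rounded to 4 decimals.
    print(step, round(step / TOTAL_STEPS, 4))
```

Because 2,000 is not a power of two (2,000 = 2^4 x 125), the two schedules never coincide, which yields 15 + 15 = 30 evaluation rows.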


### Framework versions

- Transformers 4.48.0.dev0
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0