BEE-spoke-data/smol_llama-101M-midjourney-messages


	---
	license: apache-2.0
	base_model: BEE-spoke-data/smol_llama-101M-GQA
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: smol_llama-101M-GQA-midjourney-messages-cleaned-1024-vN
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# smol_llama-101M-GQA-midjourney-messages-cleaned-1024-vN

	This model is a fine-tuned version of [BEE-spoke-data/smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA) on an unspecified dataset (the Trainer recorded the dataset name as `None`).
	It achieves the following results on the evaluation set:
	- Loss: 2.8431
	- Accuracy: 0.4682
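
The loss above can also be read as a perplexity. The following is a minimal sketch, assuming the reported value is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Trainer, but not stated in this card). The function name `perplexity_from_loss` is illustrative only.

```python
import math

# Final evaluation loss reported in this model card.
EVAL_LOSS = 2.8431


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


# Assumption: the loss is measured in nats, so perplexity = exp(loss).
# For the value above this comes out at roughly 17.2.
print(f"perplexity ~ {perplexity_from_loss(EVAL_LOSS):.2f}")
```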

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
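
Until this section is filled in, here is a hedged sketch of how the checkpoint might be loaded for text generation. It assumes the repository ID shown at the top of this page and the standard `transformers` `pipeline` API. The helper name `generate_prompts`, the sampling settings, and the defaults are hypothetical choices for illustration, not recommendations from the model author.

```python
# Repository ID as shown in the page header of this model card.
MODEL_ID = "BEE-spoke-data/smol_llama-101M-midjourney-messages"


def generate_prompts(seed_text: str, num_prompts: int = 3, max_new_tokens: int = 64) -> list[str]:
    """Sample a few continuations of `seed_text` from the fine-tuned model.

    Hypothetical helper: the sampling parameters below are illustrative
    and were not taken from the model card.
    """
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(
        seed_text,
        max_new_tokens=max_new_tokens,
        do_sample=True,          # sample instead of greedy decoding
        top_p=0.95,              # nucleus sampling (illustrative value)
        num_return_sequences=num_prompts,
    )
    # Each output is a dict whose "generated_text" includes the seed text.
    return [out["generated_text"] for out in outputs]
```

Calling `generate_prompts("a cozy cabin in the woods")` downloads the weights on first use and returns a list of strings.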

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.00025
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 17056
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
	- lr_scheduler_type: inverse_sqrt
	- lr_scheduler_warmup_ratio: 0.05
	- num_epochs: 1.0
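
Two of the values above follow from the others, and the sketch below works through them. The total batch size of 64 is consistent with 4 samples per device times 16 accumulation steps on a single device (the device count is inferred, not stated). The learning-rate function is a plain-Python approximation of an inverse square root schedule with linear warmup, assuming the decay timescale equals the number of warmup steps. The total step count is a rough estimate read off the results table (step 7200 at epoch 0.99), not a logged value.

```python
import math

# Values taken from the hyperparameter list in this card.
PEAK_LR = 0.00025
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: rough estimate of total optimizer steps (7200 / 0.99), not a logged value.
ESTIMATED_TOTAL_STEPS = 7273


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def inverse_sqrt_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).

    Sketch only: assumes the decay timescale equals warmup_steps.
    """
    warmup_steps = max(1, warmup_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * math.sqrt(warmup_steps / step)


warmup_steps = math.ceil(WARMUP_RATIO * ESTIMATED_TOTAL_STEPS)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, warmup_steps // 2, warmup_steps, 4 * warmup_steps, ESTIMATED_TOTAL_STEPS):
    print(f"step {step:>5}: lr = {inverse_sqrt_lr(step, PEAK_LR, warmup_steps):.6f}")
```

Under these assumptions the learning rate peaks at the end of warmup and halves each time the step count quadruples.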

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 3.3031 | 0.03 | 200 | 3.2643 | 0.4169 |
	| 3.1762 | 0.06 | 400 | 3.1674 | 0.4247 |
	| 3.0914 | 0.08 | 600 | 3.0850 | 0.4359 |
	| 3.0384 | 0.11 | 800 | 3.0371 | 0.4419 |
	| 3.0235 | 0.14 | 1000 | 3.0057 | 0.4467 |
	| 2.9874 | 0.17 | 1200 | 2.9816 | 0.4496 |
	| 2.9708 | 0.19 | 1400 | 2.9650 | 0.4518 |
	| 2.9796 | 0.22 | 1600 | 2.9487 | 0.4541 |
	| 2.9371 | 0.25 | 1800 | 2.9364 | 0.4560 |
	| 2.932 | 0.28 | 2000 | 2.9265 | 0.4571 |
	| 2.9272 | 0.3 | 2200 | 2.9175 | 0.4580 |
	| 2.935 | 0.33 | 2400 | 2.9115 | 0.4591 |
	| 2.9074 | 0.36 | 2600 | 2.9038 | 0.4600 |
	| 2.9404 | 0.39 | 2800 | 2.8986 | 0.4611 |
	| 2.8896 | 0.41 | 3000 | 2.8938 | 0.4617 |
	| 2.8946 | 0.44 | 3200 | 2.8893 | 0.4624 |
	| 2.9183 | 0.47 | 3400 | 2.8855 | 0.4623 |
	| 2.887 | 0.5 | 3600 | 2.8813 | 0.4638 |
	| 2.8823 | 0.52 | 3800 | 2.8780 | 0.4638 |
	| 2.9171 | 0.55 | 4000 | 2.8744 | 0.4642 |
	| 2.8884 | 0.58 | 4200 | 2.8718 | 0.4646 |
	| 2.8875 | 0.61 | 4400 | 2.8700 | 0.4651 |
	| 2.9121 | 0.63 | 4600 | 2.8668 | 0.4653 |
	| 2.8653 | 0.66 | 4800 | 2.8639 | 0.4658 |
	| 2.8603 | 0.69 | 5000 | 2.8625 | 0.4659 |
	| 2.8489 | 0.72 | 5200 | 2.8598 | 0.4661 |
	| 2.8674 | 0.74 | 5400 | 2.8577 | 0.4666 |
	| 2.884 | 0.77 | 5600 | 2.8554 | 0.4669 |
	| 2.857 | 0.8 | 5800 | 2.8535 | 0.4672 |
	| 2.8747 | 0.83 | 6000 | 2.8516 | 0.4673 |
	| 2.8809 | 0.86 | 6200 | 2.8501 | 0.4672 |
	| 2.8832 | 0.88 | 6400 | 2.8482 | 0.4679 |
	| 2.8817 | 0.91 | 6600 | 2.8472 | 0.4681 |
	| 2.8813 | 0.94 | 6800 | 2.8457 | 0.4684 |
	| 2.8493 | 0.97 | 7000 | 2.8444 | 0.4677 |
	| 2.8455 | 0.99 | 7200 | 2.8431 | 0.4682 |


	### Framework versions

	- Transformers 4.36.0.dev0
	- PyTorch 2.1.0
	- Datasets 2.15.0
	- Tokenizers 0.15.0
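
To compare a local environment against the versions listed above, something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`. The PyPI distribution names in `CARD_VERSIONS` (for example `torch` for PyTorch) are assumptions about how the packages were installed, and the helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.36.0.dev0",
    "torch": "2.1.0",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def installed_versions(packages=CARD_VERSIONS):
    """Return {package: installed version, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, found in installed_versions().items():
    print(f"{name}: card={CARD_VERSIONS[name]} installed={found}")
```

A mismatch is not necessarily a problem; the listing records the training environment rather than a hard requirement.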