Create README.md

854f7bc verified about 1 month ago

3.99 kB

	---
	library_name: transformers
	license: other
	license_name: qwen
	license_link: https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE
	base_model: Qwen/Qwen2.5-14B
	tags:
	- generated_from_trainer
	model-index:
	- name: 14B-Qwen2.5-Freya-x1
	results: []
	---

	![Kunou](https://huggingface.co/Sao10K/72B-Qwen2.5-Kunou-v1/resolve/main/knn.png)

	Sister Versions for Lightweight and Heavyweight Use!

	# 14B-Qwen2.5-Freya-v1

	I decided to mess around with training methods, considering the re-emegence of no longer used methods like multi-step training. Some people began doing it again, and so, why not? Inspired by LimaRP's methology but done it my way.


	Freya-S1
	- LoRA Trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
	- Cleaned text and literature as best as I could, still, may have had issues here and there.

	Freya-S2
	- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that.
	- Reduced LoRA rank because it's mainly instruct and other details I won't get into.

	Recommended Model Settings \| Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.
	```
	Prompt Format: ChatML
	Temperature: 1.1
	min_p: 0.1
	```

	Training time in total was ~10 Hours on a 8xH100 Node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.

	https://sao10k.carrd.co/ for contact.

	---

	[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.6.0`
	```yaml
	base_model:
	- s1: Qwen/Qwen2.5-14B
	- s2: Qwen/Qwen2.5-14B-Instruct
	model_type: AutoModelForCausalLM
	tokenizer_type: AutoTokenizer

	load_in_8bit: false
	load_in_4bit: false
	strict: false
	sequence_len: 16384
	bf16: auto
	fp16:
	tf32: false
	flash_attention: true
	special_tokens:

	adapter: lora
	lora_r:
	- s1: 64
	- s2: 32
	lora_alpha: 64
	lora_dropout: 0.2
	lora_fan_in_fan_out:
	peft_use_rslora: true
	lora_target_linear: true

	# Data
	dataset_prepared_path: dataset_rUn_freya
	datasets:
	# S1 - Writing / Completion
	- path: datasets/eBooks-cleaned-75K
	type: completion
	- path: datasets/novels-clean-dedupe-10K
	type: completion
	# S2 - Instruct
	- path: datasets/10k-amoral-full-fixed-sys.json
	type: chat_template
	chat_template: chatml
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: datasets/44k-hespera-smartshuffle.json
	type: chat_template
	chat_template: chatml
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	- path: datasets/5k_rpg_adventure_instruct-sys.json
	type: chat_template
	chat_template: chatml
	roles_to_train: ["gpt"]
	field_messages: conversations
	message_field_role: from
	message_field_content: value
	train_on_eos: turn
	shuffle_merged_datasets: true
	warmup_ratio: 0.1

	plugins:
	- axolotl.integrations.liger.LigerPlugin
	liger_rope: true
	liger_rms_norm: true
	liger_layer_norm: true
	liger_glu_activation: true
	liger_fused_linear_cross_entropy: true

	# Iterations
	num_epochs:
	- s1: 2
	- s2: 2

	# Sampling
	sample_packing: true
	pad_to_sequence_len: true
	train_on_inputs: false
	group_by_length: false

	# Batching
	gradient_accumulation_steps: 4
	micro_batch_size: 2
	gradient_checkpointing: unsloth

	# Evaluation
	val_set_size: 0.025
	evals_per_epoch: 5
	eval_table_size:
	eval_max_new_tokens: 256
	eval_sample_packing: false
	eval_batch_size: 1

	# Optimizer
	optimizer: paged_ademamix_8bit
	lr_scheduler: cosine
	learning_rate:
	- s1: 0.000002
	- s2: 0.000004
	weight_decay: 0.2
	max_grad_norm: 10.0

	# Garbage Collection
	gc_steps: 10

	# Misc
	deepspeed: ./deepspeed_configs/zero2.json

	```

	</details><br>