|
---
library_name: transformers
base_model:
- meta-llama/Llama-3.3-70B-Instruct
tags:
- generated_from_trainer
model-index:
- name: 70B-L3.3-mhnnn-x1
  results: []
license: llama3.3
---
|
|
|
![yeah](https://huggingface.co/Sao10K/70B-L3.3-mhnnn-x1/resolve/main/Huh.jpg) |
|
*my mental when things do not go well* |
|
|
|
# 70B-L3.3-mhnnn-x1 |
|
|
|
I quite liked it after messing around. Same data composition as Freya, just applied differently.
|
|
|
Has occasional brainfarts, which are fixed with a regen; that's the price for more creative outputs.
|
|
|
Recommended Model Settings | *Look, I just use these; they work well enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
|
```
Prompt Format: Llama-3-Instruct
Temperature: 1.1
min_p: 0.05
```
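
A minimal sketch of applying these settings with `transformers`. This assumes a recent `transformers` release (for `min_p` support) and enough VRAM for a 70B model, or a quantized load; the prompt content is illustrative, not from the card:

```python
# Sketch: load the model and generate with the recommended sampler settings.
# The Llama-3-Instruct prompt format comes via the tokenizer's chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/70B-L3.3-mhnnn-x1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "system", "content": "You are a creative, neutral narrator."},
    {"role": "user", "content": "Write a short scene set on a rainy pier."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=1.1,  # recommended above
    min_p=0.05,       # recommended above
    max_new_tokens=512,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```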
|
|
|
Types of data included within the sets
|
```
Completion - Novels / eBooks
Text Adventure - Include details like 'Text Adventure Narrator' in the system prompt, give it a one-shot example (see the sketch below), and it'll fly.
Amoral Assistant - Include the terms 'Amoral' and 'Neutral' along with the regular assistant prompt for better results.
Instruct / Assistant - The usual assistant tasks.
Roleplay - As per usual, regular sets.
```
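
To make the Text Adventure pointer concrete, here is a minimal sketch of a system prompt plus a one-shot example turn. The wording is my own illustration, not the author's training prompt; the load-bearing parts are 'Text Adventure Narrator' in the system prompt and one example exchange before the real first turn:

```python
# Hypothetical one-shot setup for Text Adventure mode (illustrative wording).
messages = [
    {
        "role": "system",
        "content": (
            "You are the Text Adventure Narrator. Describe scenes in second "
            "person and end every reply by asking what the player does next."
        ),
    },
    # One-shot example turn
    {"role": "user", "content": "> look around"},
    {
        "role": "assistant",
        "content": (
            "You stand at the mouth of a flooded cave. Brackish water laps at "
            "your boots, and somewhere deeper in, metal scrapes on stone. "
            "What do you do?"
        ),
    },
    # Real first turn
    {"role": "user", "content": "> wade toward the sound"},
]
```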
|
|
|
Total training time was ~14 hours on an 8xH100 node. Shout-out to SCDF for not sponsoring this run; my funds are dry from doing random things.
|
|
|
https://sao10k.carrd.co/ for contact. |
|
|
|
--- |
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.6.0` |
|
```yaml
adapter: lora # 16-bit
lora_r: 64
lora_alpha: 64
lora_dropout: 0.2
peft_use_rslora: true
lora_target_linear: true

# Data
dataset_prepared_path: dataset_run_freya
datasets:
  # S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true
warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs: 1

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00000242
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero3_bf16.json
```
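
For reference, a config like this is typically launched through axolotl's trainer entry point, e.g. `accelerate launch -m axolotl.cli.train config.yaml`, assuming the dataset paths and the deepspeed config exist locally.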
|
|
|
</details><br> |