Owen Arliawan

Update README.md

680d906 verified 7 months ago

4.46 kB

	---
	license: apache-2.0
	---
	Based on Meta-Llama-3-8b-Instruct, and is governed by Meta Llama 3 License agreement:
	https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b/blob/main/LICENSE


	We don't know how good this model is exactly in benchmarks since we have not benched this yet, but we think real prompts and usage is more telling anyways.


	From our testing this model is:

	- Less Refusals
	- More Uncensored
	- Follows requests better
	- Can reply in requested formats better without adding unnecesary information

	We are happy for anyone to try it out and give some feedback.
	You can also try this model on our API at https://www.awanllm.com/


	Trained on 2048 sequence length, while the base model is 8192 sequence length. From testing it still performs the same 8192 context just fine.

	Trained using Cognitive Computations Eric Hartford's https://huggingface.co/datasets/cognitivecomputations/dolphin dataset as we've found great results from their dolphin models in previous Llama models.

	Trained for 2 days on 2x RTX3090 on our own machine, using 4-bit loading and Qlora 64-rank 128-alpha resulting in ~2% trainable weights.


	The goal for this model is to have the model less-censored and great at general tasks like the previous dolphin models by Eric Hartford.
	We started training this BEFORE they launched their own full weight trained Llama-3-8B-Dolphin-2.9 with their own curated datasets and the newer "Dolphin 2.9" dataset.
	https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b


	The difference is that we train this using Meta's new Llama 3 instruct format and not the regular ChatML format that Dolphin models are usually trained on. This is because we think that it might perform better using the format it was originally trained on.
	Instruct format:
	```
	<\|begin_of_text\|><\|start_header_id\|>system<\|end_header_id\|>

	{{ system_prompt }}<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>

	{{ user_message_1 }}<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>

	{{ model_answer_1 }}<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>

	{{ user_message_2 }}<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>
	```


	Quants:

	GGUF: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Dolphin-Lite-v0.1-GGUF

	FP16: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Instruct-Dolphin-Lite

	Exllamav2:

	4bpw: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Dolphin-Lite-v0.1-exl2-h8-4bpw-exl2

	8bpw: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Dolphin-Lite-v0.1-exl2-h8-8bpw-exl2


	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

	Axolotl Config:
	```
	base_model: Meta-Llama-3-8B-Instruct
	model_type: LlamaForCausalLM
	tokenizer_type: AutoTokenizer

	train_on_inputs: false
	group_by_length: false
	load_in_8bit: false
	load_in_4bit: true
	strict: false
	sequence_len: 2048
	bf16: true
	fp16: false
	tf32: false
	flash_attention: true

	# Data
	datasets:
	- path: flan1m-universal-uncensored-system-2048.jsonl
	type:
	system_prompt: ""
	system_format: "<\|begin_of_text\|><\|start_header_id\|>system<\|end_header_id\|>\n\n{system}<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>\n\n"
	field_system: system
	field_instruction: input
	field_output: output
	format: "{instruction}<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n"
	no_input_format: "{instruction}<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n"

	warmup_steps: 10
	dataset_prepared_path: ./last_run_prepared

	# Iterations
	num_epochs: 1
	saves_per_epoch: 4

	# Evaluation
	val_set_size: 0.01
	eval_table_size:
	eval_table_max_new_tokens:
	eval_sample_packing: false
	evals_per_epoch: 4

	# LoRA
	output_dir: ./qlora-out
	adapter: qlora
	lora_model_dir:
	lora_r: 64
	lora_alpha: 128
	lora_dropout: 0.05
	lora_target_linear: true
	lora_fan_in_fan_out:
	lora_target_modules:
	save_safetensors: true

	# Sampling
	sample_packing: true
	pad_to_sequence_len: true

	# Batching
	gradient_accumulation_steps: 32
	micro_batch_size: 4
	gradient_checkpointing: true
	gradient_checkpointing_kwargs:
	use_reentrant: true

	# Optimizer
	optimizer: paged_adamw_8bit
	lr_scheduler: cosine
	learning_rate: 0.0002

	# Misc
	early_stopping_patience:
	resume_from_checkpoint:
	logging_steps: 1
	debug:
	deepspeed: zero3_bf16.json
	weight_decay: 0.1
	special_tokens:
	pad_token: <\|end_of_text\|>
	```