---
library_name: transformers
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE
base_model: Qwen/Qwen2.5-14B
tags:
- generated_from_trainer
model-index:
- name: 14B-Qwen2.5-Freya-x1
  results: []
---

![Freya](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1/resolve/main/sad.png)

*Me during failed runs*

# 14B-Qwen2.5-Freya-x1

I decided to mess around with training methods again, considering the re-emergence of methods like multi-step training. Some people began doing it again, and so, why not? Inspired by AshhLimaRP's methodology, but done my way.

Freya-S1
- LoRA trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
- Cleaned the text and literature as best as I could; still, it may have had issues here and there.

Freya-S2
- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that.
- Reduced the LoRA rank because it's mainly instruct data, and other details I won't get into.

Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
```
Prompt Format: ChatML
Temperature: 1+ # I don't know, man.
min_p: 0.05
```

(A minimal inference sketch wiring these settings up is included further down.)

Training time in total was ~10 hours on an 8xH100 node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.

https://sao10k.carrd.co/ for contact.

---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
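If you want the recommended settings from above wired up in code, here is a minimal sketch using `transformers` (a recent version is needed for `min_p` support in `generate`). The system prompt and user message are placeholders, not anything shipped with the model:

```python
# Hedged sketch: generation with the card's recommended samplers
# (ChatML prompt format, temperature 1+, min_p 0.05).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/14B-Qwen2.5-Freya-x1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Qwen 2.5's chat template is ChatML, so apply_chat_template yields the
# recommended <|im_start|>...<|im_end|> prompt format.
messages = [
    {"role": "system", "content": "You are Freya."},          # placeholder system prompt
    {"role": "user", "content": "Continue the tavern scene."}, # placeholder user turn
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    do_sample=True,
    temperature=1.0,  # "1+", per the card
    min_p=0.05,
    max_new_tokens=256,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```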
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
base_model:
 - s1: Qwen/Qwen2.5-14B
 - s2: Qwen/Qwen2.5-14B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false
sequence_len: 16384
bf16: auto
fp16:
tf32: false
flash_attention: true
special_tokens:

adapter: lora # 16-bit
lora_r:
 - s1: 64
 - s2: 32
lora_alpha: 64
lora_dropout: 0.2
lora_fan_in_fan_out:
peft_use_rslora: true
lora_target_linear: true

# Data
dataset_prepared_path: dataset_run_freya
datasets:
  # S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true

warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs:
 - s1: 1
 - s2: 2

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate:
 - s1: 0.000002
 - s2: 0.000004
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero2.json
```

</details>
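The `s1:`/`s2:` entries above annotate the values used in each of the two runs rather than being literal axolotl syntax: S1 is a rank-64 LoRA over the base model, and S2 merges that adapter into the Instruct model before training a fresh rank-32 LoRA on top. Here is a minimal sketch of that merge step using `peft`; the adapter path is a hypothetical placeholder (the S1 adapter isn't published), so treat this as illustrative of the technique, not a reproduction script:

```python
# Hedged sketch of the two-stage LoRA setup described in this card.
# "freya-s1-lora" is a hypothetical local path to the stage-1 adapter.
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM

# Stage 2 starts from the Instruct model...
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto"
)

# ...with the stage-1 LoRA (r=64, trained over the base model) applied
# and baked into the weights.
model = PeftModel.from_pretrained(model, "freya-s1-lora")  # hypothetical path
model = model.merge_and_unload()

# A fresh, lower-rank adapter is then trained on the instruct data.
# Values mirror the config: r=32, alpha=64, dropout=0.2, rsLoRA,
# all linear layers targeted (lora_target_linear: true).
s2_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.2,
    use_rslora=True,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, s2_config)
model.print_trainable_parameters()
```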