Nekochu committed
Commit 4c60476
1 Parent(s): 8d5b2c5
Files changed (1):
  1. README.md +8 -2
README.md CHANGED
@@ -20,11 +20,17 @@ Fine-tuning of ‘Llama-3.1-8B’ with a focus on RP and uncensored.
  <details>
  <summary>This training can be replicated using LLaMA-Factory.</summary>

- Stage A: Continued **S**upervised **F**ine-**T**uning, QA
+ Stage A: SFT
  ```
- set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage sft --do_train True --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 1 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset faproulette_co-OCR-fix-gpt4o_qa,ascii_art,Uncensored_DAN,Lumimaid-v2,Degrees_of_Lewdity --cutoff_len 8192 --learning_rate 5e-05 --num_train_epochs 1.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 10 --save_steps 1000 --warmup_steps 1000 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --neat_packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --create_new_adapter True --lora_target all --use_adam_mini True
+ set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage sft --do_train True --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 16 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset faproulette_co-OCR-fixer,ascii_art,Uncensored_DAN,Lumimaid-v2,Degrees_of_Lewdity,qa-unc-sft,psy_mental_health --cutoff_len 8192 --learning_rate 5e-05 --num_train_epochs 1.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 10 --save_steps 1000 --warmup_steps 1000 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --neat_packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --create_new_adapter True --lora_target all --use_adam_mini True
  ```

+ Stage B: merge the Stage A adapter, then proceed to DPO; the `simpo` loss experiment gave hallucination/unstable output...
+ ```
+ set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage dpo --do_train True --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 16 --finetuning_type lora --template alpaca --flash_attn fa2 --dataset_dir data --dataset qa-unc-dpo --cutoff_len 8192 --learning_rate 3e-05 --num_train_epochs 6.5 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 1000 --warmup_steps 500 --optim adamw_8bit --packing False --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP-DPO --bf16 True --plot_loss True --ddp_timeout 180000000 --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --lora_target all --pref_beta 0.1 --pref_ftx 0 --pref_loss simpo --create_new_adapter True --use_adam_mini True
+ ```
+
+
  <details>
  <summary>dataset_info.json</summary>

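For reference, the merge mentioned in Stage B is not shown in this commit. Below is a minimal sketch of merging the Stage A LoRA adapter into the base model with LLaMA-Factory's `export` command; the output directory, the `--export_size` value, and the use of flag-style arguments for `export` are assumptions, not taken from the commit.

```
:: Hypothetical merge of the Stage A adapter before Stage B DPO (paths and export settings are illustrative)
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli export --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --adapter_name_or_path saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP --template alpaca --finetuning_type lora --export_dir models\Luminia-8B-RP-merged --export_size 2 --export_legacy_format False
```

Note that the DPO command in the diff still points `--model_name_or_path` at the base Instruct model and sets `--create_new_adapter True`, so the exact merge step used between the two stages is not documented here.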
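A quick way to sanity-check either trained adapter is LLaMA-Factory's interactive chat entry point. This is a sketch only; the adapter path is copied from the DPO `--output_dir` above and flag-style arguments are assumed.

```
:: Hypothetical interactive smoke test of the Stage B adapter
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli chat --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --adapter_name_or_path saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP-DPO --template alpaca --finetuning_type lora
```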