v0.2
README.md
CHANGED
@@ -20,11 +20,17 @@ Fine-tuning of ‘Llama-3.1-8B’ with a focus on RP and uncensored.
 <details>
 <summary>This training can be replicated using LLaMA-Factory. </summary>

-Stage A
+Stage A: SFT
 ```
-set CUDA_VISIBLE_DEVICES=0 &&
+set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage sft --do_train True --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 16 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset faproulette_co-OCR-fixer,ascii_art,Uncensored_DAN,Lumimaid-v2,Degrees_of_Lewdity,qa-unc-sft,psy_mental_health --cutoff_len 8192 --learning_rate 5e-05 --num_train_epochs 1.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 10 --save_steps 1000 --warmup_steps 1000 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --neat_packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --create_new_adapter True --lora_target all --use_adam_mini True
 ```

+Stage B: the Stage A adapter is merged, then training proceeds to DPO; the `simpo` loss is experimental (hallucination/unstable output)...
+```
+set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage dpo --do_train True --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 16 --finetuning_type lora --template alpaca --flash_attn fa2 --dataset_dir data --dataset qa-unc-dpo --cutoff_len 8192 --learning_rate 3e-05 --num_train_epochs 6.5 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 1000 --warmup_steps 500 --optim adamw_8bit --packing False --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP-DPO --bf16 True --plot_loss True --ddp_timeout 180000000 --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --lora_target all --pref_beta 0.1 --pref_ftx 0 --pref_loss simpo --create_new_adapter True --use_adam_mini True
+```
+
+
 <details>
 <summary>dataset_info.json</summary>

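The Stage B description above says the Stage A adapter is merged before DPO, but the merge command itself is not part of this change. Below is a minimal sketch of that step, assuming LLaMA-Factory's `export` command is used to fold the Stage A LoRA adapter into the base model; the adapter path is taken from Stage A's `--output_dir`, while the `--export_dir` path is illustrative rather than the one actually used.

```
REM Sketch only: merge the Stage A LoRA adapter into the base model with LLaMA-Factory's export command.
REM The adapter path matches Stage A's --output_dir; the --export_dir destination is an illustrative placeholder.
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli export --model_name_or_path NousResearch/Meta-Llama-3.1-8B-Instruct --adapter_name_or_path saves\LLaMA3.1-8B-Chat\lora\Luminia-8B-RP --template alpaca --finetuning_type lora --export_dir models\Luminia-8B-RP-merged
```

If the DPO stage is meant to start from the merged weights, its `--model_name_or_path` would point at that export directory instead of the original base checkpoint.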