Kendamarron committed
Commit 915b62d
1 Parent(s): 2ba2c2e

Update README.md

Files changed (1)
README.md: +58 -5
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  library_name: transformers
- license: other
+ license: apache-2.0
  base_model: llm-jp/llm-jp-3-3.7b-instruct
  tags:
  - llama-factory
@@ -9,16 +9,22 @@ tags:
  model-index:
  - name: sft
    results: []
+ language:
+ - ja
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # sft
+ # Kendamarron/LongWriter-llm-jp-3-3.7b-instruct

- This model is a fine-tuned version of [llm-jp/llm-jp-3-3.7b-instruct](https://huggingface.co/llm-jp/llm-jp-3-3.7b-instruct) on the longwriter dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.7541
+ This model is [llm-jp/llm-jp-3-3.7b-instruct](https://huggingface.co/llm-jp/llm-jp-3-3.7b-instruct) fine-tuned with SFT so that it can generate long-form output.
+
+ ## Dataset
+ - [Kendamarron/Japanese-LongWriter-3k](https://huggingface.co/datasets/Kendamarron/Japanese-LongWriter-3k)
+
+ ## Detail
+ https://zenn.dev/kendama/articles/32aa9ec4bed409

  ## Model description

@@ -63,3 +69,50 @@ The following hyperparameters were used during training:
  - Pytorch 2.5.1+cu124
  - Datasets 3.1.0
  - Tokenizers 0.20.3
+
+ ### LLaMA-Factory yaml
+ ```
+ ### model
+ model_name_or_path: llm-jp/llm-jp-3-3.7b-instruct
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ deepspeed: examples/deepspeed/ds_z3_config.json
+ enable_liger_kernel: true
+
+ ### dataset
+ dataset: longwriter
+ template: alpaca_ja
+ cutoff_len: 32768
+ overwrite_cache: true
+ preprocessing_num_workers: 16
+
+ ### output
+ output_dir: saves/llm_jp/full/sft
+ logging_steps: 1
+ save_steps: 500
+ plot_loss: true
+ overwrite_output_dir: true
+
+ ### train
+ per_device_train_batch_size: 2
+ gradient_accumulation_steps: 1
+ learning_rate: 1.0e-5
+ optim: adamw_bnb_8bit
+ num_train_epochs: 2.0
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.1
+ bf16: true
+ ddp_timeout: 180000000
+
+ ### eval
+ val_size: 0.01
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 500
+
+ ### logging
+ report_to: wandb
+ ```
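
The yaml added above is a LLaMA-Factory SFT config. As a minimal sketch of how such a config is typically launched, assuming LLaMA-Factory is installed with `llamafactory-cli` on PATH, the yaml is saved locally as `llm_jp_longwriter_sft.yaml` (a hypothetical filename), and a `longwriter` entry pointing at the dataset is registered in LLaMA-Factory's `data/dataset_info.json`:

```python
# Illustrative launcher (not part of the commit): shell out to LLaMA-Factory's CLI
# with the SFT config shown in the diff above.
# Assumptions: `llamafactory-cli` is installed and on PATH, and the config has been
# saved as llm_jp_longwriter_sft.yaml with the `longwriter` dataset already registered.
import subprocess

subprocess.run(
    ["llamafactory-cli", "train", "llm_jp_longwriter_sft.yaml"],
    check=True,  # raise CalledProcessError if the training process exits non-zero
)
```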
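The new Dataset section points at Kendamarron/Japanese-LongWriter-3k. A minimal sketch for inspecting it with the `datasets` library; the `train` split name and the column layout are assumptions, since neither is stated in the commit:

```python
# Illustrative inspection of the SFT dataset referenced in the new README.
# The "train" split name is an assumption; check the dataset repo if it differs.
from datasets import load_dataset

ds = load_dataset("Kendamarron/Japanese-LongWriter-3k", split="train")
print(ds)     # schema and row count
print(ds[0])  # a single example record
```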
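The new model card describes an SFT of llm-jp-3-3.7b-instruct aimed at long-form output. A minimal generation sketch with `transformers`, assuming the repo id Kendamarron/LongWriter-llm-jp-3-3.7b-instruct from the card title, a GPU with bfloat16 support, and that the tokenizer ships a chat template (training used the `alpaca_ja` template, so the prompt format may need adjusting):

```python
# Illustrative long-form generation with the fine-tuned model (not part of the commit).
# Assumptions: the repo id matches the card title, a CUDA device is available, and
# the tokenizer provides a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kendamarron/LongWriter-llm-jp-3-3.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a detailed, multi-section report on the history of Japanese railways."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=8192,  # allow a long completion; adjust to available memory
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```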