Kendamarron committed 915b62d (Parent: 2ba2c2e)

Update README.md

README.md CHANGED

---
library_name: transformers
license: apache-2.0
base_model: llm-jp/llm-jp-3-3.7b-instruct
tags:
- llama-factory
model-index:
- name: sft
  results: []
language:
- ja
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Kendamarron/LongWriter-llm-jp-3-3.7b-instruct

A model obtained by supervised fine-tuning (SFT) of [llm-jp/llm-jp-3-3.7b-instruct](https://huggingface.co/llm-jp/llm-jp-3-3.7b-instruct) so that it can produce long-form output.
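
For reference, a minimal usage sketch with `transformers` is shown below. It assumes the tokenizer keeps the llm-jp-3 chat template and that a GPU with bfloat16 support is available; the prompt and generation settings are illustrative only, not values recommended by the card.

```python
# Minimal usage sketch (not from the original card).
# Assumes the tokenizer ships a chat template and bf16 inference is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kendamarron/LongWriter-llm-jp-3-3.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "日本茶の歴史について5000字程度のエッセイを書いてください。"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long-form output is the point of this model, so allow a generous token budget.
output_ids = model.generate(input_ids, max_new_tokens=8192, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```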

## Dataset
- [Kendamarron/Japanese-LongWriter-3k](https://huggingface.co/datasets/Kendamarron/Japanese-LongWriter-3k)
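
The training data can be inspected with the `datasets` library; a minimal sketch, assuming the default `train` split and making no assumptions about column names:

```python
# Sketch: peek at the SFT data (assumes a "train" split exists).
from datasets import load_dataset

ds = load_dataset("Kendamarron/Japanese-LongWriter-3k", split="train")
print(ds)     # row count and column names
print(ds[0])  # first record
```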

## Detail
https://zenn.dev/kendama/articles/32aa9ec4bed409

## Model description

- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3

### LLaMA-Factory yaml
```yaml
### model
model_name_or_path: llm-jp/llm-jp-3-3.7b-instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json
enable_liger_kernel: true

### dataset
dataset: longwriter
template: alpaca_ja
cutoff_len: 32768
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llm_jp/full/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
optim: adamw_bnb_8bit
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

### logging
report_to: wandb
```
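
The card does not say how training was launched. With a standard LLaMA-Factory installation, a YAML like the one above is normally passed to `llamafactory-cli train`; a minimal sketch, with a hypothetical file name for the saved config:

```python
# Sketch: launching LLaMA-Factory training with the config above.
# Assumes LLaMA-Factory is installed; the YAML file name is hypothetical.
import subprocess

subprocess.run(["llamafactory-cli", "train", "llm_jp_longwriter_sft.yaml"], check=True)
```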