andreaskoepf committed
Commit cecdd87
1 Parent(s): 94537a6

Update README.md

README.md CHANGED

@@ -4,8 +4,6 @@ license: other
 
 # OpenAssistant LLaMa 30B SFT 6
 
-**Paper:** https://arxiv.org/abs/2304.07327
-
 Due to the license attached to LLaMa models by Meta AI it is not possible to directly distribute LLaMa-based models. Instead we provide XOR weights for the OA models.
 
 Thanks to Mick for writing the `xor_codec.py` script which enables this process
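The XOR distribution mentioned in this hunk works by publishing the OA weights XOR-ed against the original LLaMa weights, so anyone holding a legitimate LLaMa copy can reconstruct the model locally. Below is a minimal sketch of that idea in Python, assuming per-shard PyTorch state dicts; the actual `xor_codec.py` from the Open-Assistant repository operates on the serialized files and handles details omitted here, so the paths and tensor handling are illustrative assumptions only.

```python
import numpy as np
import torch


def xor_decode(xor_path: str, llama_path: str, out_path: str) -> None:
    """Recover OA weights by XOR-ing a published XOR shard with local LLaMa weights.

    Illustrative sketch only: the official xor_codec.py works on the serialized
    files themselves; file layout and tensor handling here are assumptions.
    """
    xor_state = torch.load(xor_path, map_location="cpu")
    llama_state = torch.load(llama_path, map_location="cpu")

    decoded = {}
    for name, xor_tensor in xor_state.items():
        xor_np = xor_tensor.numpy()
        llama_np = llama_state[name].numpy()
        # XOR operates on raw bytes, so reinterpret both tensors as uint8 ...
        mixed = np.bitwise_xor(xor_np.view(np.uint8), llama_np.view(np.uint8))
        # ... then view the result back as the original dtype and shape.
        decoded[name] = torch.from_numpy(mixed.view(xor_np.dtype).reshape(xor_np.shape))

    torch.save(decoded, out_path)
```

Because a byte-wise XOR applied twice is the identity, the published XOR files are useless on their own but decode cleanly against a correct LLaMa checkpoint.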
@@ -140,3 +138,50 @@ ae48c4c68e4e171d502dd0896aa19a84 ./pytorch_model-00002-of-00007.bin
 ```
 
 If so you have successfully decoded the weights and should be able to use the model with HuggingFace Transformers. **If your checksums do not match those above, there is a problem.**
+
+### Configuration
+
+```
+llama-30b-sft-6:
+  dtype: fp16
+  log_dir: "llama_log_30b"
+  learning_rate: 1e-5
+  model_name: /home/ubuntu/Open-Assistant/model/model_training/.saved/llama-30b-super-pretrain/checkpoint-3500
+  output_dir: llama_model_30b
+  deepspeed_config: configs/zero3_config_sft.json
+  weight_decay: 0.0
+  residual_dropout: 0.0
+  max_length: 2048
+  use_flash_attention: true
+  warmup_steps: 20
+  gradient_checkpointing: true
+  gradient_accumulation_steps: 16
+  per_device_train_batch_size: 2
+  per_device_eval_batch_size: 3
+  eval_steps: 101
+  save_steps: 292
+  num_train_epochs: 8
+  save_total_limit: 3
+  use_custom_sampler: true
+  sort_by_length: false
+  save_strategy: steps
+  datasets:
+    - oasst_export:
+        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
+        input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
+        val_split: 0.05
+    - vicuna:
+        val_split: 0.05
+        max_val_set: 800
+        fraction: 0.8
+    - dolly15k:
+        val_split: 0.05
+        max_val_set: 300
+    - grade_school_math_instructions:
+        val_split: 0.05
+    - code_alpaca:
+        val_split: 0.05
+        max_val_set: 250
+```
+
+- **OASST dataset paper:** https://arxiv.org/abs/2304.07327
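The second hunk extends the checksum section quoted in its context lines ("If your checksums do not match those above, there is a problem"). A small helper along these lines can compare decoded shards against the published MD5 sums; only the single hash visible in the hunk header is filled in below as a placeholder, while the remaining entries come from the checksum table in the unchanged part of the README.

```python
import hashlib
from pathlib import Path

# Expected MD5 sums from the README's checksum table; only the hash visible in
# the hunk header above is reproduced here, the rest must be copied from the README.
EXPECTED_MD5 = {
    "pytorch_model-00002-of-00007.bin": "ae48c4c68e4e171d502dd0896aa19a84",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB shards do not need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(model_dir: str = ".") -> bool:
    """Print OK/MISMATCH per shard and return True only if every hash matches."""
    all_ok = True
    for name, expected in EXPECTED_MD5.items():
        actual = md5sum(Path(model_dir) / name)
        match = actual == expected
        all_ok = all_ok and match
        print(f"{'OK      ' if match else 'MISMATCH'} {name} {actual}")
    return all_ok


if __name__ == "__main__":
    verify()
```

Running `md5sum ./pytorch_model-*.bin` on Linux produces the same digests for manual comparison against the table.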