evilfreelancer
/

ruGPT3.5-13B-lora-chain-of-thought

Text Generation

chain of thought

Model card Files Files and versions Community

evilfreelancer commited on Oct 9

Commit

cca32e9

•

1 Parent(s): cac6871

Update README.md

Files changed (1) hide show

README.md +81 -3

README.md CHANGED Viewed

@@ -1,3 +1,81 @@
----
-license: mit
----

+---
+base_model: ai-forever/ruGPT-3.5-13B
+library_name: peft
+license: mit
+datasets:
+- evilfreelancer/ru-chain-of-thought-sharegpt
+language:
+- ru
+tags:
+- impruver
+- russian
+- cot
+- chain of thought
+- lora
+pipeline_tag: text-generation
+---
+# ruGPT-3.5-13B / chain of thought
+LoRA адаптер для ruGPT3.5-13B обученный на датасете [evilfreelancer/ru-chain-of-thought-sharegpt](https://huggingface.co/datasets/evilfreelancer/ru-chain-of-thought-sharegpt)
+данный датасет представляет из себя перевод на русский
+датасета [isaiahbjork/chain-of-thought-sharegpt](https://huggingface.co/datasets/isaiahbjork/chain-of-thought-sharegpt) при
+помощи модели [utrobinmv/t5_translate_en_ru_zh_small_1024](https://huggingface.co/utrobinmv/t5_translate_en_ru_zh_small_1024)
+прикладываю скрипт [перевода](https://gist.github.com/EvilFreelancer/230fb48329889506cf88c03b8893e4b9) на Gist.
+Конфигурация: https://github.com/EvilFreelancer/impruver/blob/main/configs/ruGPT35_13B_cot_lora.yml
+Адаптер обучался на 1x RTX 4090, для этого потребовалось примерно 20Gb VRAM и заняло 19m.
+```yaml
+output_dir: ./models/ruGPT35_13B_lora_cot
+train_path: ./train.ruGPT35_13B_cot.jsonl
+val_path: ./val.ruGPT35_13B_cot.jsonl
+datasets:
+  - name: evilfreelancer/ru-chain-of-thought-sharegpt
+    converter: impruver.conversations_to_messages
+model:
+  class: transformers.AutoModelForCausalLM
+  name: ai-forever/ruGPT-3.5-13B
+  load_in_4bit: true
+  load_in_8bit: false
+  dtype: bf16
+lora:
+  r: 16
+  lora_alpha: 16
+  lora_dropout: 0.05
+  bias: none
+  target_modules: [ c_attn ]
+  task_type: CAUSAL_LM
+tokenizer:
+  class: transformers.AutoTokenizer
+  name: ai-forever/ruGPT-3.5-13B
+  max_tokens_count: 1200
+trainer:
+  eval_strategy: steps
+  save_strategy: steps
+  eval_steps: 100
+  save_steps: 100
+  per_device_train_batch_size: 1
+  per_device_eval_batch_size: 1
+  gradient_accumulation_steps: 5
+  logging_steps: 1
+  learning_rate: 0.0002
+  num_train_epochs: 2
+  lr_scheduler_type: cosine
+  warmup_steps: 16
+  optim: adamw_8bit
+  metric_for_best_model: eval_loss
+  load_best_model_at_end: true
+  save_total_limit: 2
+  seed: 42
+  remove_unused_columns: false
+  max_grad_norm: 1.0
+  weight_decay: 0.08
+  torch_compile: false
+```