ruGPT-3.5-13B / function call
LoRA адаптер для ruGPT3.5-13B обученный на датасете function call.
Конфигурация: https://github.com/EvilFreelancer/impruver/blob/main/configs/ruGPT35_13B_fc_lora.yml
Адаптер обучался на 1x RTX 4090, для этого потребовалось примерно 20Gb VRAM и заняло 11h 8m.
output_dir: ./models/ruGPT35_13B_lora_fc
train_path: ./train.ruGPT35_13B_fc.jsonl
val_path: ./val.ruGPT35_13B_fc.jsonl
datasets:
- name: korotkov/glaive-function-calling-v2-ru-parsed
split: train
model:
class: transformers.AutoModelForCausalLM
name: ai-forever/ruGPT-3.5-13B
load_in_4bit: true
load_in_8bit: false
dtype: bf16
lora:
r: 16
lora_alpha: 16
lora_dropout: 0.05
bias: none
target_modules: [ c_attn ]
task_type: CAUSAL_LM
tokenizer:
class: transformers.AutoTokenizer
name: ai-forever/ruGPT-3.5-13B
max_tokens_count: 1200
trainer:
eval_strategy: steps
save_strategy: steps
eval_steps: 100
save_steps: 100
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
gradient_accumulation_steps: 128
logging_steps: 1
learning_rate: 0.0002
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_steps: 16
optim: adamw_8bit
metric_for_best_model: eval_loss
load_best_model_at_end: true
save_total_limit: 2
seed: 42
remove_unused_columns: false
max_grad_norm: 1.0
weight_decay: 0.08
torch_compile: false
- Downloads last month
- 42
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for evilfreelancer/ruGPT3.5-13B-lora-function-call
Base model
ai-forever/ruGPT-3.5-13B