See axolotl config

axolotl version: 0.5.0

base_model: NousResearch/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

#wget -O dataset.jsonl http://94.130.230.31/dataset.jsonl
chat_template: chatml
datasets:
  - path: ./dataset_2000.jsonl
    type: chat_template
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 12
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
   pad_token: <|end_of_text|>
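
A few quantities follow directly from the config above. The sketch below derives them in plain Python. It assumes a single device (the config does not state the GPU count) and the standard LoRA scaling factor lora_alpha / lora_r, i.e. no rank-stabilized scaling. The variable names mirror the config keys; num_devices, lora_scale, effective_batch_size, and tokens_per_step are illustrative names, not axolotl options.

```python
# Values copied from the axolotl config above.
lora_r = 32
lora_alpha = 16
micro_batch_size = 1
gradient_accumulation_steps = 8
sequence_len = 4096

# Assumption: one device; the config does not state the GPU count.
num_devices = 1

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_r

# Sequences consumed per optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# With sample_packing and pad_to_sequence_len enabled, every training sequence
# is sequence_len tokens long, so this is the token budget per optimizer step
# (padding included).
tokens_per_step = effective_batch_size * sequence_len

print(f"LoRA scale: {lora_scale}")
print(f"Effective batch size: {effective_batch_size}")
print(f"Tokens per optimizer step: {tokens_per_step}")
```

Under the single-device assumption, the effective batch size agrees with the total_train_batch_size of 8 reported under Training hyperparameters.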

outputs/lora-out

This model is a LoRA fine-tune of NousResearch/Meta-Llama-3-8B on a local dataset (./dataset_2000.jsonl in the config above). It achieves the following results on the evaluation set:

Loss: 2.5669

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

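According to the config above, training used a local file, ./dataset_2000.jsonl, formatted with the chatml chat template, with 5% of it held out for evaluation (val_set_size: 0.05). More information needed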

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 12
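
To make the schedule concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the learning rate and warmup steps listed above. The total step count is not reported in this card: 966 is an estimate, taken from the results table, where step 80 falls at epoch 0.9938 (about 80.5 steps per epoch, times 12 epochs). The function name lr_at is hypothetical, and the formula is the usual half-cosine decay rather than code taken from the training run.

```python
import math

BASE_LR = 0.0002    # learning_rate from the card
WARMUP_STEPS = 10   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 966   # assumption: estimated from the results table, not reported


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule completed, clamped to 1.0.
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, 488, 966):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 0.0002 after the 10 warmup steps, is halved around the midpoint of training, and approaches zero at the final step.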

Training results

Training Loss	Epoch	Step	Validation Loss
1.4452	0.0124	1	1.4698
1.4609	0.2484	20	1.5851
1.5405	0.4969	40	1.5290
1.482	0.7453	60	1.5437
1.5389	0.9938	80	1.4984
0.5438	1.2422	100	1.6491
0.7409	1.4907	120	1.7644
0.6342	1.7391	140	1.7252
0.6785	1.9876	160	1.7029
0.2407	2.2360	180	1.8367
0.2412	2.4845	200	1.8888
0.2817	2.7329	220	1.8632
0.2976	2.9814	240	1.8760
0.0969	3.2298	260	2.0450
0.1387	3.4783	280	1.9947
0.1758	3.7267	300	1.9421
0.1182	3.9752	320	1.9968
0.0986	4.2236	340	2.0739
0.0639	4.4720	360	2.0798
0.0656	4.7205	380	2.1390
0.0582	4.9689	400	2.1313
0.016	5.2174	420	2.2601
0.0376	5.4658	440	2.2150
0.0387	5.7143	460	2.2287
0.0388	5.9627	480	2.2120
0.012	6.2112	500	2.3267
0.0104	6.4596	520	2.2502
0.0246	6.7081	540	2.3221
0.0134	6.9565	560	2.2929
0.0025	7.2050	580	2.3895
0.0092	7.4534	600	2.4587
0.0025	7.7019	620	2.4200
0.0029	7.9503	640	2.4380
0.0021	8.1988	660	2.4520
0.0018	8.4472	680	2.4975
0.0015	8.6957	700	2.5138
0.0013	8.9441	720	2.5276
0.0012	9.1925	740	2.5366
0.0012	9.4410	760	2.5477
0.0011	9.6894	780	2.5455
0.0012	9.9379	800	2.5499
0.0012	10.1863	820	2.5565
0.0014	10.4348	840	2.5604
0.0014	10.6832	860	2.5621
0.0012	10.9317	880	2.5663
0.001	11.1801	900	2.5673
0.0008	11.4286	920	2.5688
0.0008	11.6770	940	2.5668
0.0011	11.9255	960	2.5669
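
The table shows training loss falling toward zero while validation loss rises from about 1.47 to about 2.57, a pattern that usually indicates overfitting. The sketch below makes that explicit by locating the evaluation with the lowest validation loss and computing the final gap between the two losses. To stay short it includes only the first logged row and the last logged row of each epoch, copied from the table; the variable names are illustrative.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above.
# Subset: the first logged row plus the last logged row of each epoch.
rows = [
    (0.0124, 1, 1.4452, 1.4698),
    (0.9938, 80, 1.5389, 1.4984),
    (1.9876, 160, 0.6785, 1.7029),
    (2.9814, 240, 0.2976, 1.8760),
    (3.9752, 320, 0.1182, 1.9968),
    (4.9689, 400, 0.0582, 2.1313),
    (5.9627, 480, 0.0388, 2.2120),
    (6.9565, 560, 0.0134, 2.2929),
    (7.9503, 640, 0.0029, 2.4380),
    (8.9441, 720, 0.0013, 2.5276),
    (9.9379, 800, 0.0012, 2.5499),
    (10.9317, 880, 0.0012, 2.5663),
    (11.9255, 960, 0.0011, 2.5669),
]

# Evaluation point with the lowest validation loss.
best = min(rows, key=lambda r: r[3])

# Gap between validation and training loss at the last logged step.
final = rows[-1]
final_gap = final[3] - final[2]

print(f"Lowest validation loss: {best[3]} at step {best[1]} (epoch {best[0]})")
print(f"Final validation/training gap: {final_gap:.4f}")
```

Across the full table, as in this subset, the lowest validation loss is the one logged at step 1, so a checkpoint from within the first epoch may generalize better than the final one. Setting early_stopping_patience in the config, which is empty here, is one way to stop a run like this sooner.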

Framework versions

PEFT 0.13.2
Transformers 4.46.3
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3
