pamparamm committed
Commit 7ef8dca
0 Parent(s)
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,137 @@
+ ---
+ license: apache-2.0
+ language:
+ - ru
+ tags:
+ - generated_from_trainer
+ base_model: WlappaAI/Mistral-7B-wikipedia_ru_pruned-0.1_merged
+ model-index:
+ - name: dracor-ru-small-lora_merged
+   results: []
+ ---
+
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: WlappaAI/Mistral-7B-wikipedia_ru_pruned-0.1_merged
+ model_type: MistralForCausalLM
+ tokenizer_type: LlamaTokenizer
+ is_mistral_derived_model: true
+
+ load_in_8bit: true
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+ - path: ./datasets/ru-dracor
+   type: completion
+   field: text
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.05
+ output_dir: ./models/output/dracor_ru_lora
+
+ adapter: lora
+ lora_model_dir:
+
+ sequence_len: 1024
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ lora_target_modules:
+ - gate_proj
+ - down_proj
+ - up_proj
+ - q_proj
+ - v_proj
+ - k_proj
+ - o_proj
+
+ wandb_project:
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 1
+ micro_batch_size: 6
+ num_epochs: 1
+ optimizer: adamw_torch
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps:
+ xformers_attention:
+ flash_attention: true
+
+ loss_watchdog_threshold: 5.0
+ loss_watchdog_patience: 3
+
+ warmup_steps: 10
+ evals_per_epoch: 1
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+
+ ```
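+
+ As a rough illustration only (this code is not from the authors' training run), the `lora_*` fields above map approximately onto a PEFT `LoraConfig`; PEFT 0.10.0 is listed in the framework versions below, but the exact mapping here is a hedged sketch rather than axolotl's internals:
+
+ ```python
+ from peft import LoraConfig
+
+ # Approximate PEFT equivalent of the lora_* settings in the axolotl config above.
+ lora_config = LoraConfig(
+     r=32,                 # lora_r
+     lora_alpha=16,        # lora_alpha
+     lora_dropout=0.05,    # lora_dropout
+     target_modules=[
+         "gate_proj", "down_proj", "up_proj",
+         "q_proj", "v_proj", "k_proj", "o_proj",
+     ],
+     task_type="CAUSAL_LM",
+ )
+ ```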
+
+ </details><br>
+
+ # dracor-ru-small-lora_merged
+
+ This model is a Q8_0 GGUF merge of [WlappaAI/dracor-ru-small-lora](https://huggingface.co/WlappaAI/dracor-ru-small-lora) with [WlappaAI/Mistral-7B-wikipedia_ru_pruned-0.1_merged](https://huggingface.co/WlappaAI/Mistral-7B-wikipedia_ru_pruned-0.1_merged). It was trained on the Russian DraCor dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.1876
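+
+ The merged weights ship in this repository as `dracor-ru-split-small-lora_merged.Q8_0.gguf`. As a hedged sketch (llama-cpp-python is not mentioned by the authors; the prompt, context size, and sampling values below are purely illustrative), the GGUF can be run locally like this:
+
+ ```python
+ from llama_cpp import Llama
+
+ # Load the quantized GGUF; the path assumes the file was downloaded locally.
+ llm = Llama(
+     model_path="dracor-ru-split-small-lora_merged.Q8_0.gguf",
+     n_ctx=1024,  # matches the training sequence_len; adjust as needed
+ )
+
+ # Plain completion-style generation (the model was trained with type: completion).
+ out = llm("ГАМЛЕТ.", max_tokens=128, temperature=0.8)  # illustrative prompt
+ print(out["choices"][0]["text"])
+ ```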
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (see the sketch after this list for a rough PyTorch equivalent of the optimizer and scheduler setup):
+ - learning_rate: 0.0002
+ - train_batch_size: 6
+ - eval_batch_size: 6
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1
+
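+ As a rough sketch only (the actual run was driven by axolotl, not this code), the optimizer and scheduler settings above correspond approximately to the following standard PyTorch/Transformers setup; the `model` here is a tiny placeholder standing in for the causal LM:
+
+ ```python
+ import torch
+ from transformers import get_cosine_schedule_with_warmup
+
+ model = torch.nn.Linear(8, 8)  # placeholder; stands in for the loaded causal LM
+
+ # AdamW with the betas/epsilon listed above and no weight decay.
+ optimizer = torch.optim.AdamW(
+     model.parameters(), lr=2e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0
+ )
+
+ # Cosine schedule with 10 warmup steps; 1056 steps matches the 1-epoch run below.
+ scheduler = get_cosine_schedule_with_warmup(
+     optimizer, num_warmup_steps=10, num_training_steps=1056
+ )
+ ```
+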
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 1.7921        | 1.0   | 1056 | 1.6606          |
+
+
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.18.0
+ - Tokenizers 0.15.0
+ - GGUF 0.9.0
config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "model_type": "mistral"
+ }
dracor-ru-split-small-lora_merged.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a330a42e5d55f1e85a248bb3936e576a8b71336fd84ca2c2e1ac4d83eebeddcc
+ size 7695857344
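
The entry above is a Git LFS pointer, not the weights themselves: the actual ~7.7 GB GGUF is stored in LFS under the listed SHA-256 oid and size. A minimal sketch for checking a locally downloaded copy against that pointer (the file name is assumed to match the repository entry):

```python
import hashlib
import os

path = "dracor-ru-split-small-lora_merged.Q8_0.gguf"

# Recompute the SHA-256 digest in chunks to avoid loading ~7.7 GB into memory.
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert os.path.getsize(path) == 7695857344, "size mismatch"
assert h.hexdigest() == "a330a42e5d55f1e85a248bb3936e576a8b71336fd84ca2c2e1ac4d83eebeddcc", "hash mismatch"
print("GGUF matches the LFS pointer")
```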