---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-70b-bnb-4bit
datasets:
- lightblue/tagengo-gpt4
---

# Uploaded model

- **Developed by:** ruslandev
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-70b-bnb-4bit

This model is fine-tuned on the [Tagengo dataset](https://huggingface.co/datasets/lightblue/tagengo-gpt4).
Please note: this model was created for educational purposes and needs further training/fine-tuning.

# How to use

The easiest way to run this model on your own computer is the GGUF version ([ruslandev/llama-3-70b-tagengo-GGUF](https://huggingface.co/ruslandev/llama-3-70b-tagengo-GGUF)) with a program such as [llama.cpp](https://github.com/ggerganov/llama.cpp).
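
A minimal llama.cpp invocation might look like the sketch below. The binary name and GGUF filename are placeholders, not files shipped with this repo; substitute whichever quantized file you download. The prompt is Russian for "What does a neural network consist of?".

```bash
# Sketch only: pick the quantization you actually downloaded from the GGUF repo.
./llama-cli -m llama-3-70b-tagengo.Q4_K_M.gguf \
    -p "Из чего состоит нейронная сеть?" \
    -n 256
```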

If you want to use this model directly with the Hugging Face Transformers stack, I recommend my framework [gptchain](https://github.com/RuslanPeresy/gptchain):

```bash
git clone https://github.com/RuslanPeresy/gptchain.git
cd gptchain
pip install -r requirements-train.txt
python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
    --chatml true \
    -q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
```

# Training

The [gptchain](https://github.com/RuslanPeresy/gptchain) framework was used for training:

```bash
python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
    -dn tagengo_gpt4 \
    -sp checkpoints/llama-3-70b-tagengo \
    -hf llama-3-70b-tagengo \
    --max-steps 2400
```

# Training hyperparameters

The run used the following settings (a sketch of the equivalent raw Unsloth + TRL call follows the list):

- learning_rate: 2e-4
- seed: 3407
- gradient_accumulation_steps: 4
- per_device_train_batch_size: 2
- optimizer: adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 5
- max_steps: 2400
- weight_decay: 0.01
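
This is a minimal sketch, not the actual gptchain internals: the LoRA configuration, sequence length, and ChatML formatting step are assumptions, while the trainer arguments are taken from the list above.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

MAX_SEQ_LENGTH = 2048  # assumption; the actual value is not stated in this card

# Load the 4-bit base model named in this card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-70b-bnb-4bit",
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

# Assumed LoRA setup (rank and target modules follow common Unsloth examples).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tagengo stores ShareGPT-style "conversations"; render each one to text via
# the tokenizer's chat template (assumed to be ChatML, per --chatml above).
def to_text(example):
    messages = [
        {"role": "user" if turn["from"] == "human" else "assistant",
         "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = load_dataset("lightblue/tagengo-gpt4", split="train").map(to_text)

# Trainer arguments mirror the hyperparameter list above.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=2400,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="checkpoints/llama-3-70b-tagengo",
    ),
)
trainer.train()
```

With per_device_train_batch_size=2 and gradient_accumulation_steps=4 on a single GPU, the effective batch size is 8, so 2400 steps cover roughly 19,200 training examples.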

# Training results

[wandb report](https://api.wandb.ai/links/ruslandev/rilj60ra)

2400 steps took 7 hours on a single H100 GPU.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)