Awan LLM committed · Commit f261431 · 1 Parent(s): 680d906
Update README.md

README.md CHANGED
@@ -2,7 +2,7 @@
license: apache-2.0
---
Based on Meta-Llama-3-8B-Instruct, and governed by the Meta Llama 3 License agreement:
-https://huggingface.co/
+https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct


We don't know exactly how well this model does in benchmarks, since we have not benchmarked it yet, but we think real prompts and usage are more telling anyway.
@@ -19,19 +19,20 @@ We are happy for anyone to try it out and give some feedback.

You can also try this model on our API at https://www.awanllm.com/

-Trained using Cognitive Computations Eric Hartford's https://huggingface.co/datasets/cognitivecomputations/dolphin dataset as we've found great results from their dolphin models in previous Llama models.
+Training:
+- 2048 sequence length, while the base model is 8192 sequence length; from testing, it still handles the full 8192 context just fine.
+- Trained on a modified and improved version of Cognitive Computations Eric Hartford's Dolphin dataset: https://huggingface.co/datasets/cognitivecomputations/dolphin
+- Training took around 2 days on 2x RTX 3090 on our own machine, using 4-bit loading and QLoRA (rank 64, alpha 128), resulting in ~2% trainable weights; a rough sketch of this setup follows below.
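For illustration, here is a minimal sketch of that setup using transformers, bitsandbytes, and peft. This is not our actual training script: everything not stated in the bullets above (dropout, target modules) is an assumption, and the dataset and trainer code are omitted.

```python
# Minimal sketch of the stated setup: Llama 3 8B Instruct loaded in 4-bit with
# a rank-64 / alpha-128 QLoRA adapter. Dropout and target modules below are
# assumptions, not our exact configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # 4-bit loading, as stated above
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,               # QLoRA rank, as stated above
    lora_alpha=128,     # QLoRA alpha, as stated above
    lora_dropout=0.05,  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the trainable-weight percentage
```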

-The goal for this model is to have the model less-censored and great at general tasks like the previous dolphin models by Eric Hartford.
-We started training this BEFORE they launched their own full weight trained Llama-3-8B-Dolphin-2.9 with their own curated datasets and the newer "Dolphin 2.9" dataset.
+The goal for this model is to be less censored and great at general tasks, like the previous Dolphin-based models by Eric Hartford.
+We started training this BEFORE they launched their own full-weight trained Llama-3-8B-Dolphin-2.9 with their own curated datasets and the newer "Dolphin 2.9" dataset, but we think this model is still a unique take on Llama 3 8B Instruct and the Dolphin dataset.
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b


-The difference is that we train this using Meta's new Llama 3 instruct format and not the regular ChatML format that Dolphin models are usually trained on.
+The difference from their Dolphin 2.9 model is that we train this one using Meta's new Llama 3 instruct format rather than the regular ChatML format that Dolphin models are usually trained on.
+This is because we think the model performs better when using the format it was originally trained on.
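As a quick illustration, the instruct format shown below can also be produced with the tokenizer's built-in chat template. Loading the base Meta-Llama-3-8B-Instruct tokenizer here is an assumption on our part, since this model keeps the same format.

```python
# Sketch: building a Llama 3 instruct prompt from chat messages. We assume the
# base Meta-Llama-3-8B-Instruct tokenizer, as this model keeps its format.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# tokenize=False returns the raw prompt string, which begins with the
# <|begin_of_text|><|start_header_id|>system<|end_header_id|> tokens shown below.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```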
+
Instruct format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>