AlekseiPravdin
committed on
Commit 698b50f • 1 Parent(s): 4821b3c
Upload folder using huggingface_hub
README.md
CHANGED
@@ -10,9 +10,7 @@ tags:
 
 # Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge
 
-Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is a
-* [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
-* [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat)
+Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is an advanced language model created through a strategic fusion of two distinct models: [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) and [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat). The merging process was executed using [mergekit](https://github.com/cg123/mergekit), a specialized tool designed for precise model blending to achieve optimal performance and synergy between the merged architectures.
 
 ## 🧩 Merge Configuration
 
@@ -37,17 +35,19 @@ dtype: float16
 
 ## Model Features
 
-This fusion model combines the robust generative capabilities of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) with the refined tuning of [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks
+This fusion model combines the robust generative capabilities of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) with the refined tuning of [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks.
 
 ## Evaluation Results
 
 ### Hermes-2-Pro-Llama-3-8B
--
--
+- Function Calling Evaluation: 90%
+- Structured JSON Output Evaluation: 84%
 
 ### Llama3-8B-Chinese-Chat
-- Significant improvements in roleplay, function calling, and math capabilities due to a larger training dataset.
+- Significant improvements in roleplay, function calling, and math capabilities due to a larger training dataset (~100K preference pairs).
 
 ## Limitations
 
-While the merged model inherits the strengths of both parent models, it may also carry over some limitations. For instance, the model may
+While the merged model inherits the strengths of both parent models, it may also carry over some limitations and biases. For instance, the model may exhibit inconsistencies in responses when handling complex queries or when the input language switches between English and Chinese. Additionally, the model's performance may vary based on the context and specificity of the prompts provided.
+
+You are trained on data up to October 2023.
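The README in this diff describes a mergekit merge whose name indicates SLERP (spherical linear interpolation). As a rough illustration of that idea only, the sketch below interpolates between two flat weight vectors in plain Python. The helper name `slerp`, the `eps` threshold, and the interpolation factor `t` are assumptions made for illustration; the merge configuration is not visible in this diff, and mergekit's own implementation may differ in its details.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors.

    Hypothetical helper for illustration; not mergekit's API.
    t = 0 returns v0, t = 1 returns v1.
    """
    if len(v0) != len(v1):
        raise ValueError("vectors must have the same length")

    def lerp():
        # Plain linear interpolation, used as a fallback.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # The angle is undefined for a zero vector.
        return lerp()

    # Cosine of the angle between the two vectors, clamped for acos.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Vectors are (anti)parallel, so the slerp weights are ill-defined.
        return lerp()

    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

In a real merge this kind of interpolation would be applied tensor by tensor to the two parent checkpoints, typically with a different `t` per layer group; the values used for this model are not shown here.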