AlekseiPravdin committed on
Commit 698b50f
1 Parent(s): 4821b3c

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +8 -8
README.md CHANGED
```diff
@@ -10,9 +10,7 @@ tags:
 
 # Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge
 
-Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
-* [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
-* [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat)
+Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is an advanced language model created through a strategic fusion of two distinct models: [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) and [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat). The merging process was executed using [mergekit](https://github.com/cg123/mergekit), a specialized tool designed for precise model blending to achieve optimal performance and synergy between the merged architectures.
 
 ## 🧩 Merge Configuration
 
@@ -37,17 +35,19 @@ dtype: float16
 
 ## Model Features
 
-This fusion model combines the robust generative capabilities of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) with the refined tuning of [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks, including function calling and structured outputs.
+This fusion model combines the robust generative capabilities of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) with the refined tuning of [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks.
 
 ## Evaluation Results
 
 ### Hermes-2-Pro-Llama-3-8B
-- Scored 90% on function calling evaluation.
-- Scored 84% on structured JSON output evaluation.
+- Function Calling Evaluation: 90%
+- Structured JSON Output Evaluation: 84%
 
 ### Llama3-8B-Chinese-Chat
-- Significant improvements in roleplay, function calling, and math capabilities due to a larger training dataset.
+- Significant improvements in roleplay, function calling, and math capabilities due to a larger training dataset (~100K preference pairs).
 
 ## Limitations
 
-While the merged model inherits the strengths of both parent models, it may also carry over some limitations. For instance, the model may still exhibit biases present in the training data of both parent models, and its performance may vary based on the complexity of the input queries. Additionally, the model's identity may not be finely tuned, leading to potentially inconsistent responses regarding its capabilities or origins. Users should be aware of these factors when deploying the model in real-world applications.
+While the merged model inherits the strengths of both parent models, it may also carry over some limitations and biases. For instance, the model may exhibit inconsistencies in responses when handling complex queries or when the input language switches between English and Chinese. Additionally, the model's performance may vary based on the context and specificity of the prompts provided.
+
+You are trained on data up to October 2023.
```
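The README's "🧩 Merge Configuration" section falls outside the displayed hunks; only the trailing `dtype: float16` line is visible as hunk context. For orientation, a typical mergekit SLERP configuration for two Llama-3-8B variants follows the shape below — the `layer_range` and `t` values here are illustrative placeholders, not this model's actual settings:

```yaml
slices:
  - sources:
      - model: NousResearch/Hermes-2-Pro-Llama-3-8B
        layer_range: [0, 32]
      - model: shenzhi-wang/Llama3-8B-Chinese-Chat
        layer_range: [0, 32]
merge_method: slerp
base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```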
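The commit describes a SLERP merge performed with mergekit. For readers unfamiliar with the technique, here is a minimal NumPy sketch of spherical linear interpolation applied to two flat weight vectors — an illustration of the idea only, not mergekit's actual implementation (the `slerp` helper and the toy vectors are hypothetical):

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    Interpolates along the great-circle arc between a and b, falling
    back to plain linear interpolation when they are nearly collinear.
    """
    a_unit = a / (np.linalg.norm(a) + eps)
    b_unit = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two vectors
    so = np.sin(omega)
    if abs(so) < eps:               # nearly collinear -> lerp
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# Interpolate halfway between two toy "layer weight" vectors
w1 = np.array([1.0, 0.0])
w2 = np.array([0.0, 1.0])
merged = slerp(0.5, w1, w2)
```

In a real merge, each corresponding tensor pair from the two checkpoints is interpolated this way, with the interpolation factor `t` typically varying per layer and per parameter group.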