AlekseiPravdin committed
Commit 0cce991 • 1 Parent(s): cb0bb03

Upload folder using huggingface_hub

README.md CHANGED
@@ -10,7 +10,9 @@ tags:
# Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge

- Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is

## 🧩 Merge Configuration
@@ -33,21 +35,38 @@ parameters:
dtype: float16
```
## Model Features

-

## Evaluation Results

-
- -
- - Structured JSON Output Evaluation: 84%
-
- ### Llama3-8B-Chinese-Chat
- - Performance surpasses ChatGPT in Chinese tasks, matching GPT-4 in C-Eval and CMMLU results.

## Limitations

- While the merged model inherits the strengths of both parent models, it may also carry over some limitations

-
# Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge

+ Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
+ * [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
+ * [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat)

## 🧩 Merge Configuration
dtype: float16
```
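The model name and the mergekit reference indicate a SLERP (spherical linear interpolation) merge, but only the tail of the configuration is visible in this diff. As a rough illustration of the idea, the following is a minimal sketch of SLERP between two flattened weight vectors. It is not mergekit's implementation; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors.

    t=0 returns v0 and t=1 returns v1. Illustrative only: real merges
    operate on tensors, layer by layer.
    """
    if len(v0) != len(v1):
        raise ValueError("vectors must have the same length")

    def lerp():
        # Plain linear interpolation, used as a numerically safe fallback.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so the angle is undefined.
        return lerp()

    # Angle between the two vectors, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel or anti-parallel vectors: the formula is unstable.
        return lerp()

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

Calling `slerp(0.5, [1.0, 0.0], [0.0, 1.0])` yields a vector halfway along the arc between the two inputs, rather than on the straight line between them, which is the property that distinguishes SLERP from simple weight averaging.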
+ ## Model Details
+
+ Hermes-2-Pro is an upgraded version of the Nous Hermes model, designed for general task and conversation capabilities, with a focus on function calling and structured outputs. It has been fine-tuned on a cleaned version of the OpenHermes 2.5 dataset, achieving high scores in function calling evaluations. Llama3-8B-Chinese-Chat is an instruction-tuned model specifically for Chinese and English users, excelling in roleplaying and tool-using tasks.
+
+ ## Description
+
+ The merged model combines the advanced generative capabilities of Hermes-2-Pro with the specialized tuning of Llama3-8B-Chinese-Chat. This results in a versatile model that excels in both English and Chinese text generation, providing enhanced context understanding and nuanced responses across various NLP tasks.
+
+ ## Use Cases
+
+ - **Conversational AI**: Engage users in natural dialogue in both English and Chinese.
+ - **Function Calling**: Execute predefined functions based on user queries, enhancing interactivity.
+ - **Roleplaying**: Simulate characters or scenarios in a conversational context.
+ - **Text Generation**: Generate creative content, including stories, poems, and structured outputs.
+
## Model Features

+ - **Bilingual Capabilities**: Supports both English and Chinese, making it suitable for diverse user bases.
+ - **Function Calling**: Enhanced ability to perform actions based on user input, improving user experience.
+ - **Structured Outputs**: Capable of generating outputs in specific formats, such as JSON, for easier integration into applications.
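The function calling and structured output features listed above imply that an application has to parse the model's text and route it to real code. The sketch below shows one way that could look. It assumes the merged model keeps the Hermes-2-Pro convention of wrapping a JSON object with `name` and `arguments` keys in `<tool_call>` tags, which this card does not confirm. The `get_weather` tool and the `dispatch_tool_calls` helper are hypothetical names invented for this example.

```python
import json
import re

# Assumption: tool calls arrive as <tool_call>{...}</tool_call> spans.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def get_weather(city):
    """Hypothetical tool used only for illustration."""
    return f"Sunny in {city}"


def dispatch_tool_calls(model_output, tools):
    """Run every well-formed tool call found in the model output.

    Returns a list of (tool_name, result) tuples. Malformed payloads and
    unknown tool names are skipped rather than raising.
    """
    results = []
    for raw in TOOL_CALL_RE.findall(model_output):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # payload is not valid JSON
        if not isinstance(call, dict):
            continue
        name = call.get("name")
        args = call.get("arguments", {})
        if not isinstance(name, str) or not isinstance(args, dict):
            continue
        func = tools.get(name)
        if func is None:
            continue  # the model asked for a tool that is not registered
        results.append((name, func(**args)))
    return results
```

A caller would pass the generated text together with a registry such as `{"get_weather": get_weather}`. Keeping the registry explicit means the model can only trigger functions the application has deliberately exposed. Argument mismatches still raise `TypeError` from the tool itself, so production code would likely validate arguments against a schema first.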
## Evaluation Results

+ - **Hermes-2-Pro**: Achieved a 90% score on function calling evaluations and an 84% on structured JSON output evaluations.
+ - **Llama3-8B-Chinese-Chat**: Demonstrated superior performance in Chinese language tasks, surpassing previous models in roleplay and function calling capabilities.
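The card reports percentage scores for structured JSON output but does not describe how they were computed, and the figures apply to the parent model rather than to this merge. For readers who want to measure the merged model themselves, here is a minimal sketch of one possible pass-rate metric: the share of outputs that parse as a JSON object containing a set of required keys. The function name `json_pass_rate` and the scoring rule are assumptions for illustration, not the benchmark behind the numbers above.

```python
import json


def json_pass_rate(outputs, required_keys=()):
    """Return the fraction of outputs that are valid JSON objects
    containing every key in required_keys.

    An empty list of outputs scores 0.0 by convention.
    """
    if not outputs:
        return 0.0
    passed = 0
    for text in outputs:
        try:
            obj = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue  # not parseable as JSON at all
        # Only JSON objects count; arrays and bare scalars are rejected.
        if isinstance(obj, dict) and all(key in obj for key in required_keys):
            passed += 1
    return passed / len(outputs)
```

Running this over a fixed prompt set for each parent model and for the merge would give a like-for-like comparison, which is more informative than inheriting the parents' published scores.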
## Limitations

+ While the merged model inherits the strengths of both parent models, it may also carry over some limitations, including:
+
+ - **Biases**: Potential biases present in the training data of both models may affect the outputs.
+ - **Contextual Understanding**: Although improved, the model may still struggle with highly nuanced or context-specific queries.
+ - **Performance Variability**: Performance may vary based on the complexity of the task and the language used.

+ This model represents a significant advancement in bilingual conversational AI, combining the best features of its predecessors to deliver a powerful tool for various applications.