AlekseiPravdin
committed
Commit 4821b3c
Parent(s): b449d2b
Upload folder using huggingface_hub
README.md CHANGED
@@ -35,44 +35,19 @@ parameters:
   dtype: float16
 ```
 
-## Model Details
-
-Hermes-2-Pro-Llama-3-8B is an upgraded version of the original Hermes model, designed for enhanced conversational capabilities and function calling. It excels in generating structured outputs and has been fine-tuned on a diverse dataset, including the OpenHermes 2.5 dataset. The model is particularly adept at handling complex queries and providing coherent responses.
-
-Llama3-8B-Chinese-Chat, on the other hand, is specifically fine-tuned for Chinese and English users, focusing on roleplaying and tool-using capabilities. It has been trained on a significantly larger dataset, improving its performance in various tasks, including math and function calling.
-
-## Description
-
-The merged model combines the strengths of both parent models, providing a robust solution for multilingual text generation and understanding. It leverages the advanced generative capabilities of Hermes-2-Pro while incorporating the specialized training of Llama3-8B-Chinese-Chat, making it suitable for a wide range of applications, from casual conversation to structured data generation.
-
-## Use Cases
-
-- **Conversational AI**: Engage users in natural dialogues across multiple languages.
-- **Function Calling**: Execute predefined functions based on user queries, enhancing interactivity.
-- **Structured Outputs**: Generate JSON or other structured formats for data processing tasks.
-- **Roleplaying**: Simulate characters or scenarios in both English and Chinese.
-
 ## Model Features
 
-
-- **Enhanced Context Understanding**: Improved ability to maintain context over longer conversations.
-- **Function Calling**: Supports advanced function calling capabilities for dynamic interactions.
-- **Structured Output Generation**: Can produce outputs in structured formats like JSON.
+This fusion model combines the robust generative capabilities of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) with the refined tuning of [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks, including function calling and structured outputs.
 
 ## Evaluation Results
 
 ### Hermes-2-Pro-Llama-3-8B
 - Scored 90% on function calling evaluation.
-
+- Scored 84% on structured JSON output evaluation.
 
 ### Llama3-8B-Chinese-Chat
-
+- Significant improvements in roleplay, function calling, and math capabilities due to a larger training dataset.
 
 ## Limitations
 
-While the merged model
-- Potential biases present in the training data.
-- Challenges in handling highly specialized or niche topics.
-- Variability in performance based on the complexity of user queries.
-
-Users are encouraged to provide feedback and report any issues encountered during usage to facilitate ongoing improvements.
+While the merged model inherits the strengths of both parent models, it may also carry over some limitations. For instance, the model may still exhibit biases present in the training data of both parent models, and its performance may vary based on the complexity of the input queries. Additionally, the model's identity may not be finely tuned, leading to potentially inconsistent responses regarding its capabilities or origins. Users should be aware of these factors when deploying the model in real-world applications.
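The README now describes the model as a slerp merge of the two parents. For readers unfamiliar with the technique, here is a minimal numpy sketch of spherical linear interpolation applied to a flattened weight tensor. This is illustrative only, not the merge toolkit's actual implementation, and the `eps` fallback threshold is an assumption:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    # Angle between the two weight vectors, measured on the unit sphere.
    dot = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1) + eps)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    if theta < eps:
        # Nearly colinear weights: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Interpolate along the arc, preserving the angular geometry of the weights.
    return (np.sin((1.0 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# Endpoints recover the parent weights; t = 0.5 is an even blend.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(slerp(0.5, a, b))  # → [0.70710678 0.70710678]
```

Unlike plain averaging, slerp follows the arc between the two weight vectors, which is why it is a popular choice for merging models whose weights differ in direction rather than just magnitude.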
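The README highlights function calling and structured JSON output. A minimal sketch of extracting tool-call payloads from generated text follows; the `<tool_call>` tag convention is an assumption based on the Hermes 2 Pro tool-use format, so verify it against the parent model card before relying on it:

```python
import json
import re

def extract_tool_calls(text: str) -> list:
    """Pull JSON payloads out of <tool_call>...</tool_call> spans.

    Assumes the Hermes 2 Pro-style tag convention; check the model card.
    """
    calls = []
    for payload in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip malformed payloads instead of failing the whole parse.
            continue
    return calls

# Hypothetical model output used only for illustration.
sample = (
    "Checking the weather now. "
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Beijing"}}</tool_call>'
)
print(extract_tool_calls(sample))
```

Parsing defensively like this matters in practice: even models tuned for structured output occasionally emit malformed JSON, and skipping a bad payload degrades more gracefully than raising mid-conversation.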