license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- NousResearch/Hermes-2-Pro-Llama-3-8B
- shenzhi-wang/Llama3-8B-Chinese-Chat
Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge
Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is a merge of the following models using mergekit:
- NousResearch/Hermes-2-Pro-Llama-3-8B
- shenzhi-wang/Llama3-8B-Chinese-Chat
🧩 Merge Configuration
slices:
  - sources:
      - model: NousResearch/Hermes-2-Pro-Llama-3-8B
        layer_range: [0, 31]
      - model: shenzhi-wang/Llama3-8B-Chinese-Chat
        layer_range: [0, 31]
merge_method: slerp
base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
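For readers unfamiliar with SLERP, the following is a minimal Python sketch of what the configuration above describes. It is not mergekit's actual implementation. It assumes that each `value` list under `t` is interpolated linearly across the merged layers, that `t = 0` keeps the `base_model` weights and `t = 1` takes the other model's weights, and that each weight tensor can be treated as a flat vector. The helper names `layer_t` and `slerp`, and the layer count of 31, are illustrative assumptions.

```python
import math


def layer_t(gradient, layer_idx, num_layers):
    """Linearly interpolate a gradient list such as [0, 0.5, 0.3, 0.7, 1] at one layer.

    Assumption: the list is spread evenly from the first layer to the last.
    """
    if num_layers <= 1 or len(gradient) == 1:
        return float(gradient[0])
    # Position of this layer along the gradient, measured in gradient segments.
    pos = layer_idx / (num_layers - 1) * (len(gradient) - 1)
    lo = min(int(math.floor(pos)), len(gradient) - 2)
    frac = pos - lo
    return (1 - frac) * gradient[lo] + frac * gradient[lo + 1]


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    The angle is measured between the normalized vectors, and the
    interpolation weights are applied to the original vectors.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction; fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # Guard acos against rounding error.
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Schedule for self-attention tensors, taken from the config above.
self_attn_t = [0, 0.5, 0.3, 0.7, 1]
# 31 layers is an assumption based on layer_range: [0, 31].
print([round(layer_t(self_attn_t, i, 31), 2) for i in (0, 15, 30)])
# prints [0.0, 0.3, 1.0]
```

Under these assumptions, the `self_attn` schedule starts at the Hermes-2-Pro weights in the first layer and ends at the Llama3-8B-Chinese-Chat weights in the last layer, while the `mlp` schedule runs in the opposite direction. All remaining tensors use a constant `t` of 0.5.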
Model Details
Hermes-2-Pro is an upgraded version of the Nous Hermes model, designed for general tasks and conversation, with a focus on function calling and structured outputs. It was fine-tuned on a cleaned version of the OpenHermes 2.5 dataset and scores well in function calling evaluations. Llama3-8B-Chinese-Chat is an instruction-tuned model aimed at Chinese and English users, with an emphasis on roleplay and tool use.
Description
The merged model combines the general-purpose and function calling abilities of Hermes-2-Pro with the Chinese and English instruction tuning of Llama3-8B-Chinese-Chat. The merge is intended to produce a single model that handles both English and Chinese text generation across a range of NLP tasks.
Use Cases
- Conversational AI: Engage users in natural dialogue in both English and Chinese.
- Function Calling: Emit calls to predefined functions based on user queries, which the host application can then execute.
- Roleplaying: Simulate characters or scenarios in a conversational context.
- Text Generation: Generate creative content, including stories, poems, and structured outputs.
Model Features
- Bilingual Capabilities: Supports both English and Chinese, making it suitable for diverse user bases.
- Function Calling: Enhanced ability to perform actions based on user input, improving user experience.
- Structured Outputs: Capable of generating outputs in specific formats, such as JSON, for easier integration into applications.
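As a rough illustration of how an application might consume the function calling and structured output features listed above, here is a minimal Python sketch. It assumes the merged model emits tool calls in the same style as its Hermes-2-Pro parent, that is, a JSON object with `name` and `arguments` keys wrapped in `<tool_call>` tags. That behavior is not confirmed for the merged model. The names `get_weather`, `TOOLS`, and `run_tool_calls` are hypothetical and exist only for this example.

```python
import json
import re

# Assumed output format: a JSON object wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def get_weather(city):
    """Hypothetical tool; a real one would query a weather service."""
    return f"Weather lookup requested for {city}"


# Registry of the functions the application allows the model to call.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(model_output):
    """Parse tool calls from generated text and run the registered ones."""
    results = []
    for match in TOOL_CALL_RE.finditer(model_output):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # Skip malformed JSON instead of crashing.
        if not isinstance(call, dict):
            continue
        name = call.get("name")
        args = call.get("arguments", {})
        if not isinstance(name, str) or not isinstance(args, dict):
            continue  # Skip calls that do not match the assumed schema.
        func = TOOLS.get(name)
        if func is None:
            continue  # Never run a function that is not registered.
        results.append((name, func(**args)))
    return results


sample = (
    "Checking the forecast.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Beijing"}}\n</tool_call>'
)
print(run_tool_calls(sample))
# prints [('get_weather', 'Weather lookup requested for Beijing')]
```

Restricting execution to an explicit registry keeps the model from triggering arbitrary code, and skipping malformed calls is a conservative default. A production system would likely also validate argument names and types before calling the function.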
Evaluation Results
- Hermes-2-Pro (parent model): Scored 90% on function calling evaluations and 84% on structured JSON output evaluations.
- Llama3-8B-Chinese-Chat (parent model): Reported strong performance on Chinese language tasks, surpassing previous models in roleplay and function calling.
Limitations
While the merged model inherits the strengths of both parent models, it may also carry over some limitations, including:
- Biases: Potential biases present in the training data of both models may affect the outputs.
- Contextual Understanding: Although improved, the model may still struggle with highly nuanced or context-specific queries.
- Performance Variability: Performance may vary based on the complexity of the task and the language used.
This merge aims to combine the strengths of both parent models into a single bilingual conversational model for the use cases listed above.