---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- NousResearch/Hermes-2-Pro-Llama-3-8B
- shenzhi-wang/Llama3-8B-Chinese-Chat
---

# Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge

Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge is a merge of two models: [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) and [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat). The merge was performed with [mergekit](https://github.com/cg123/mergekit), a toolkit for merging pre-trained language models, using the SLERP (spherical linear interpolation) method.

## 🧩 Merge Configuration

```yaml
slices:
  - sources:
      - model: NousResearch/Hermes-2-Pro-Llama-3-8B
        layer_range: [0, 31]
      - model: shenzhi-wang/Llama3-8B-Chinese-Chat
        layer_range: [0, 31]
merge_method: slerp
base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```

A sketch for reproducing this merge, along with a usage example, is given at the end of this card.

## Model Features

This merge combines the generative capabilities of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B), which excels at function calling and structured (JSON) outputs, with [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), which is fine-tuned specifically for Chinese and English users. The merged model is intended to inherit both: instruction following, roleplaying, and function calling from Hermes 2 Pro, and bilingual Chinese/English chat ability from Llama3-8B-Chinese-Chat.

## Evaluation Results

The figures below are reported by the parent models' own model cards; no separate benchmarks have been published for the merge itself.

### Hermes-2-Pro-Llama-3-8B

- Function Calling Evaluation: 90%
- Structured JSON Output Evaluation: 84%

### Llama3-8B-Chinese-Chat

- According to its model card, performance on Chinese tasks surpasses ChatGPT and matches GPT-4 on C-Eval and CMMLU.

## Limitations

While the merged model inherits the strengths of both parent models, it may also carry over their limitations. Biases in the parents' training data can surface in the outputs, particularly in nuanced cultural contexts or language-specific tasks. In addition, because the parent models were not fine-tuned on identity-related data, the merged model may not respond accurately to questions about its own identity.
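
## 🔁 Reproducing the Merge

The configuration above can be run with mergekit's CLI (`mergekit-yaml config.yaml ./output-model-directory`). The Python sketch below is an alternative route, assuming mergekit's documented library entry points (`MergeConfiguration`, `run_merge`, `MergeOptions`); exact option names may differ between mergekit versions, so treat this as a sketch rather than a pinned recipe.

```python
# pip install -qU mergekit
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the SLERP configuration from the Merge Configuration section,
# saved locally as config.yaml.
with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Download the parent models as needed, merge them layer by layer,
# and write the result (weights plus tokenizer) to ./merged.
run_merge(
    merge_config,
    out_path="./merged",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU for the merge if available
        copy_tokenizer=True,             # copy the base model's tokenizer
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```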
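
## 💻 Usage

A minimal inference sketch in the usual lazymergekit style follows. The repository id is a placeholder for wherever this merge is hosted, and the generation parameters are illustrative defaults, not tuned values.

```python
# pip install -qU transformers accelerate
import torch
from transformers import AutoTokenizer, pipeline

# Placeholder: replace with the actual repository id of this merge.
model_id = "your-username/Hermes-2-Pro-Llama-3-8B-Llama3-8B-Chinese-Chat-slerp-merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a chat prompt using the model's chat template.
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = generator(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```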