pravdin committed on
Commit
3846a26
1 Parent(s): 66cc33c

Upload folder using huggingface_hub

Browse files
Files changed (3) hide show
  1. README.md +4 -4
  2. config.json +1 -1
  3. tokenizer_config.json +5 -0
README.md CHANGED
@@ -10,11 +10,11 @@ tags:
10
 
11
  # Mistral-Nemo-Instruct-2407-Mistral-Nemo-Base-2407-linear-merge
12
 
13
- Mistral-Nemo-Instruct-2407-Mistral-Nemo-Base-2407-linear-merge is a sophisticated language model resulting from the strategic merging of two distinct models: [Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) and [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). This merging process was executed using [mergekit](https://github.com/cg123/mergekit), a specialized tool designed for precise model blending, ensuring optimal performance and synergy between the merged architectures.
14
 
15
  ## 🧩 Merge Configuration
16
 
17
- The models were merged using a linear interpolation method, which allows for a balanced integration of the two models' capabilities. The configuration details are as follows:
18
 
19
  ```yaml
20
  models:
@@ -32,8 +32,8 @@ dtype: float16
32
 
33
  ## Model Features
34
 
35
- This merged model combines the instructive capabilities of Mistral-Nemo-Instruct-2407 with the foundational strengths of Mistral-Nemo-Base-2407. The result is a versatile model that excels in various text generation tasks, offering enhanced context understanding and nuanced text generation. By leveraging the strengths of both parent models, this linear merge provides improved performance across diverse NLP applications.
36
 
37
  ## Limitations
38
 
39
- While the Mistral-Nemo-Instruct-2407-Mistral-Nemo-Base-2407-linear-merge inherits many strengths from its parent models, it may also carry over certain limitations or biases present in those models. Users should be aware of potential biases in generated outputs and the need for careful evaluation in sensitive applications.
 
10
 
11
  # Mistral-Nemo-Instruct-2407-Mistral-Nemo-Base-2407-linear-merge
12
 
13
+ Mistral-Nemo-Instruct-2407-Mistral-Nemo-Base-2407-linear-merge is a sophisticated language model created by merging two distinct models: [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) and [mistralai/Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). This merging process was executed using [mergekit](https://github.com/cg123/mergekit), a specialized tool designed for precise model blending, ensuring optimal performance and synergy between the merged architectures.
14
 
15
  ## 🧩 Merge Configuration
16
 
17
+ The models were merged using a linear interpolation method, which allows for a balanced integration of the two models. The configuration details are as follows:
18
 
19
  ```yaml
20
  models:
 
32
 
33
  ## Model Features
34
 
35
+ This merged model combines the instructive capabilities of Mistral-Nemo-Instruct-2407 with the foundational strengths of Mistral-Nemo-Base-2407. The result is a versatile model that excels in various text generation tasks, offering enhanced context understanding and nuanced text generation. By leveraging the strengths of both parent models, this linear merge aims to provide improved performance across diverse NLP applications.
36
 
37
  ## Limitations
38
 
39
+ While the merged model benefits from the strengths of both parent models, it may also inherit certain limitations or biases present in them. Users should be aware that the performance can vary depending on the specific task and context, and it is advisable to evaluate the model's outputs critically.
config.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "_name_or_path": "mistralai/Mistral-Nemo-Base-2407",
3
  "architectures": [
4
  "MistralForCausalLM"
5
  ],
 
1
  {
2
+ "_name_or_path": "mistralai/Mistral-Nemo-Instruct-2407",
3
  "architectures": [
4
  "MistralForCausalLM"
5
  ],
tokenizer_config.json CHANGED
@@ -8005,8 +8005,13 @@
8005
  }
8006
  },
8007
  "bos_token": "<s>",
 
8008
  "clean_up_tokenization_spaces": false,
8009
  "eos_token": "</s>",
 
 
 
 
8010
  "model_max_length": 1000000000000000019884624838656,
8011
  "tokenizer_class": "PreTrainedTokenizerFast",
8012
  "unk_token": "<unk>"
 
8005
  }
8006
  },
8007
  "bos_token": "<s>",
8008
+ "chat_template": "{%- if messages[0][\"role\"] == \"system\" %}\n {%- set system_message = messages[0][\"content\"] %}\n {%- set loop_messages = messages[1:] %}\n{%- else %}\n {%- set loop_messages = messages %}\n{%- endif %}\n{%- if not tools is defined %}\n {%- set tools = none %}\n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %}\n\n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}\n{%- set ns = namespace() %}\n{%- set ns.index = 0 %}\n{%- for message in loop_messages %}\n {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %}\n {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %}\n {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}\n {%- endif %}\n {%- set ns.index = ns.index + 1 %}\n {%- endif %}\n{%- endfor %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n {%- if message[\"role\"] == \"user\" %}\n {%- if tools is not none and (message == user_messages[-1]) %}\n {{- \"[AVAILABLE_TOOLS][\" }}\n {%- for tool in tools %}\n {%- set tool = tool.function %}\n {{- '{\"type\": \"function\", \"function\": {' }}\n {%- for key, val in tool.items() if key != \"return\" %}\n {%- if val is string %}\n {{- '\"' + key + '\": \"' + val + '\"' }}\n {%- else %}\n {{- '\"' + key + '\": ' + val|tojson }}\n {%- endif %}\n {%- if not loop.last %}\n {{- \", \" }}\n {%- endif %}\n {%- endfor %}\n {{- \"}}\" }}\n {%- if not loop.last %}\n {{- \", \" }}\n {%- else %}\n {{- \"]\" }}\n {%- endif %}\n {%- endfor %}\n {{- \"[/AVAILABLE_TOOLS]\" }}\n {%- endif %}\n {%- if loop.last and system_message is defined %}\n {{- \"[INST]\" + system_message + \"\\n\\n\" + message[\"content\"] + \"[/INST]\" }}\n {%- else %}\n {{- \"[INST]\" + message[\"content\"] + \"[/INST]\" }}\n {%- endif %}\n {%- elif 
(message.tool_calls is defined and message.tool_calls is not none) %}\n {{- \"[TOOL_CALLS][\" }}\n {%- for tool_call in message.tool_calls %}\n {%- set out = tool_call.function|tojson %}\n {{- out[:-1] }}\n {%- if not tool_call.id is defined or tool_call.id|length != 9 %}\n {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n {%- endif %}\n {{- ', \"id\": \"' + tool_call.id + '\"}' }}\n {%- if not loop.last %}\n {{- \", \" }}\n {%- else %}\n {{- \"]\" + eos_token }}\n {%- endif %}\n {%- endfor %}\n {%- elif message[\"role\"] == \"assistant\" %}\n {{- message[\"content\"] + eos_token}}\n {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %}\n {%- if message.content is defined and message.content.content is defined %}\n {%- set content = message.content.content %}\n {%- else %}\n {%- set content = message.content %}\n {%- endif %}\n {{- '[TOOL_RESULTS]{\"content\": ' + content|string + \", \" }}\n {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}\n {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n {%- endif %}\n {{- '\"call_id\": \"' + message.tool_call_id + '\"}[/TOOL_RESULTS]' }}\n {%- else %}\n {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }}\n {%- endif %}\n{%- endfor %}\n",
8009
  "clean_up_tokenization_spaces": false,
8010
  "eos_token": "</s>",
8011
+ "model_input_names": [
8012
+ "input_ids",
8013
+ "attention_mask"
8014
+ ],
8015
  "model_max_length": 1000000000000000019884624838656,
8016
  "tokenizer_class": "PreTrainedTokenizerFast",
8017
  "unk_token": "<unk>"