Upload folder using huggingface_hub
- .gitattributes +2 -0
- README.md +277 -0
- chat_template.jinja +22 -0
- config.json +73 -0
- generation_config.json +13 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +452 -0
- next_rosetta.png +3 -0
- special_tokens_map.json +30 -0
- tokenizer.json +3 -0
- tokenizer.model +3 -0
- tokenizer_config.json +0 -0
- wmt24pp_12b.md +37 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+next_rosetta.png filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,277 @@
---
library_name: transformers
tags:
- translation
license: gemma
language:
- ar
- bg
- zh
- cs
- da
- nl
- en
- fi
- fr
- de
- el
- gu
- he
- hi
- hu
- id
- it
- ja
- ko
- fa
- pl
- pt
- ro
- ru
- sk
- es
- sv
- tl
- th
- tr
- uk
- vi
---

# YanoljaNEXT-Rosetta-4B-2511

<p style="text-align: center; margin: 0 auto 64px">
  <img src="next_rosetta.png" style="width: 1096px">
</p>

This model is a fine-tuned version of [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt). As it is intended solely for text generation, we have extracted and utilized only the `Gemma3ForCausalLM` component from the original architecture.

Unlike our previous EEVE models, this model does not feature an expanded tokenizer.

- **Model Name:** `yanolja/YanoljaNEXT-Rosetta-4B-2511`
- **Base Model:** `google/gemma-3-4b-pt`

## Model Description

This model is a 4-billion-parameter, decoder-only language model built on the Gemma3 architecture and fine-tuned by Yanolja NEXT. It is specifically designed to translate structured data (JSON, YAML, XML formats) while preserving the original data structure.

The model was trained on a multilingual dataset covering the following languages equally:
- Arabic
- Bulgarian
- Chinese
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Gujarati
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Japanese
- Korean
- Persian
- Polish
- Portuguese
- Romanian
- Russian
- Slovak
- Spanish
- Swedish
- Tagalog
- Thai
- Turkish
- Ukrainian
- Vietnamese

While optimized for these languages, it may also perform effectively on other languages supported by the base Gemma3 model.

## How to use

You can use this model with the `transformers` library as follows:

```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yanolja/YanoljaNEXT-Rosetta-4B-2511"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "23GB"},
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

target_language = "Korean"
context = {
    "context": "Simple introduction about a tech company.",
    "tone": "Informative and helpful",
    "glossary": {
        "Yanolja NEXT": "야놀자넥스트",
        "travel industry": "여행 산업",
    },
    "output format": "JSON",
}

system = [f"Translate the user's text to {target_language}."]
for key, value in context.items():
    key_pascal = key.capitalize()
    if isinstance(value, dict):
        system.append(f"{key_pascal}:")
        for f, t in value.items():
            system.append(f"- {f} -> {t}")
    else:
        system.append(f"{key_pascal}: {value}")

system.append("Provide the final translation immediately without any other text.")

source = {
    "company_name": "Yanolja NEXT",
    "description": "Yanolja NEXT is a company that provides cutting-edge "
                   "technology for the global travel industry.",
}

messages = [
    {"role": "system", "content": "\n".join(system)},
    {"role": "user", "content": json.dumps(source, ensure_ascii=False)},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <bos><start_of_turn>instruction
# Translate the user's text to Korean.
# Context: Simple introduction about a tech company.
# Tone: Informative and helpful
# Glossary:
# - Yanolja NEXT -> 야놀자넥스트
# - travel industry -> 여행 산업
# Output format: JSON
# Provide the final translation immediately without any other text.<end_of_turn>
# <start_of_turn>source
# {"company_name": "Yanolja NEXT", "description": "Yanolja NEXT is a company that provides cutting-edge technology for the global travel industry."}<end_of_turn>
# <start_of_turn>translation

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
input_length = inputs["input_ids"].shape[1]

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
    )

generated_tokens = outputs[0][input_length:]
translation = tokenizer.decode(generated_tokens, skip_special_tokens=True)

print(json.dumps(json.loads(translation), indent=2, ensure_ascii=False))
# {
#   "company_name": "야놀자넥스트",
#   "description": "야놀자넥스트는 글로벌 여행 산업에 최첨단 기술을 제공하는 회사입니다."
# }
```

The model outputs the final translation in the same structured format as the input (JSON, YAML, XML) when appropriate, or plain text for simple translations.
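Plain-text input works the same way; the following is a minimal sketch reusing the `model` and `tokenizer` objects from the example above (the sample sentence is illustrative only):

```python
# Minimal sketch: translate unstructured text with the same chat template.
messages = [
    {"role": "system", "content": "Translate the user's text to Korean."},
    {"role": "user", "content": "Welcome to our hotel."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```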

## Training Procedure

### Training Data
The translation datasets were synthesized from the FineWeb corpora:
- [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
- [FineWeb2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2)

The model was fine-tuned on this synthetic multilingual translation data to optimize performance across the supported language pairs.

## Performance

### Translation Quality Benchmarks

The following chrF++ scores on the WMT24++ benchmark show the model's competitive performance against other state-of-the-art translation models on English-to-Korean translation:

| Model | chrF++ Score (WMT24++) |
|------------------------------------|--------------|
| openai/gpt-4o | 36.08 |
| **yanolja/YanoljaNEXT-Rosetta-4B-2511** | **35.64** |
| google/gemini-2.5-flash | 35.25 |
| **yanolja/YanoljaNEXT-Rosetta-4B-2510** | **35.09** |
| tencent/Hunyuan-MT-7B | 34.76 |
| yanolja/YanoljaNEXT-Rosetta-20B | 33.87 |
| AIDC-AI/Marco-MT-Algharb | 33.40 |
| openai/gpt-oss-120b | 31.51 |
| **yanolja/YanoljaNEXT-Rosetta-4B** | **31.31** |
| ByteDance-Seed/Seed-X-PPO-7B | 30.48 |
| google/gemma-3-27b-it | 30.05 |
| google/gemma-3-12b-it | 29.31 |
| google/gemma-3-4b-it | 27.53 |

YanoljaNEXT-Rosetta-4B-2511 achieves competitive translation quality while maintaining the efficiency of a 4B-parameter model. Scores for the other language pairs can be found in the [WMT24++ Evaluation Results](wmt24pp_12b.md).

## Intended Uses & Limitations

This model is intended for translating structured data (JSON, YAML, XML formats) while preserving the original structure. It is particularly well suited for tasks such as localizing product catalogs, translating hotel reviews, or handling any other structured content that requires accurate translation.

### Limitations
The model is primarily optimized for processing structured data (JSON, YAML, XML). Its performance on unstructured text or other data formats may vary. In some cases, the model may produce invalid JSON, repetitive output, or inaccurate translations.
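When valid JSON matters downstream, it can help to guard generation with a parse-and-retry loop; below is a minimal sketch (the helper name and retry policy are illustrative, not part of this repository):

```python
import json

def translate_with_retry(generate_fn, max_attempts=3):
    """Call generate_fn() until its output parses as JSON, up to max_attempts."""
    for _ in range(max_attempts):
        text = generate_fn()
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # sampling is enabled by default, so a retry may parse
    raise ValueError(f"no valid JSON after {max_attempts} attempts")
```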

### License
This model is released under the Gemma license, inherited from its base model, [`google/gemma-3-4b-pt`](https://huggingface.co/google/gemma-3-4b-pt). Please consult the official [Gemma license terms](https://ai.google.dev/gemma/terms) for detailed usage guidelines.

## Acknowledgments
This work was supported by the Korea Creative Content Agency (KOCCA) grant, funded by the Ministry of Culture, Sports and Tourism (MCST) in 2025 (Project Name: _Cultivating Masters and Doctoral Experts to Lead Digital-Tech Tourism_, Project Number: RS-2024-00442006, Contribution Rate: 100%).

## Citation

If you use this model, please consider citing:

```
@misc{yanolja2025yanoljanextrosetta,
  author       = {Yanolja NEXT Co., Ltd.},
  title        = {YanoljaNEXT-Rosetta-4B-2511},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-4B-2511}}
}
```

## References

This work utilizes several models and datasets. We would like to acknowledge the original authors for their valuable contributions to the field.

```
@misc{gemma3,
  author       = {Google},
  title        = {Gemma 3},
  year         = {2024},
  publisher    = {Google DeepMind},
  howpublished = {\url{https://deepmind.google/models/gemma/gemma-3/}}
}

@misc{penedo2025fineweb2pipelinescale,
  title         = {FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language},
  author        = {Guilherme Penedo and Hynek Kydlíček and Vinko Sabolčec and Bettina Messmer and Negar Foroutan and Amir Hossein Kargaran and Colin Raffel and Martin Jaggi and Leandro Von Werra and Thomas Wolf},
  year          = {2025},
  eprint        = {2506.20920},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2506.20920}
}

@misc{lozhkov2024fineweb-edu,
  author    = {Lozhkov, Anton and Ben Allal, Loubna and von Werra, Leandro and Wolf, Thomas},
  title     = {FineWeb-Edu: the Finest Collection of Educational Content},
  year      = {2024},
  url       = {https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu},
  doi       = {10.57967/hf/2497},
  publisher = {Hugging Face}
}
```
chat_template.jinja ADDED
@@ -0,0 +1,22 @@
{%- set system_msg = messages | selectattr('role', 'eq', 'system') | list | first | default(none) -%}
{%- set last_user = messages | selectattr('role', 'eq', 'user') | list | last | default(none) -%}
{%- set last_assistant = messages | selectattr('role', 'eq', 'assistant') | list | last | default(none) -%}

{{- bos_token -}}

{%- if system_msg -%}
<start_of_turn>instruction{{ '\n' }}
{{- system_msg['content'] | trim -}}<end_of_turn>{{ '\n' }}
{%- endif -%}

{%- if last_user -%}
<start_of_turn>source{{ '\n' }}
{{- last_user['content'] | trim -}}<end_of_turn>{{ '\n' }}
{%- endif -%}

{%- if add_generation_prompt -%}
<start_of_turn>translation{{ '\n' }}
{%- elif last_assistant -%}
<start_of_turn>translation{{ '\n' }}
{{- last_assistant['content'] | trim -}}<end_of_turn>{{ '\n' }}
{%- endif -%}
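For reference, the template above can also be rendered outside `transformers`; a minimal sketch with plain `jinja2` (the `bos_token` binding and the sample messages are assumptions for illustration, not part of this repository):

```python
from jinja2 import Template

# Minimal sketch: render chat_template.jinja without a tokenizer.
with open("chat_template.jinja") as f:
    template = Template(f.read())

prompt = template.render(
    messages=[
        {"role": "system", "content": "Translate the user's text to Korean."},
        {"role": "user", "content": '{"greeting": "Hello"}'},
    ],
    bos_token="<bos>",           # assumed to match the tokenizer's BOS string
    add_generation_prompt=True,  # opens the final "translation" turn
)
print(prompt)
# <bos><start_of_turn>instruction
# Translate the user's text to Korean.<end_of_turn>
# <start_of_turn>source
# {"greeting": "Hello"}<end_of_turn>
# <start_of_turn>translation
```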
config.json ADDED
@@ -0,0 +1,73 @@
{
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "dtype": "bfloat16",
  "eos_token_id": 106,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 10240,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention"
  ],
  "max_position_embeddings": 131072,
  "model_type": "gemma3_text",
  "num_attention_heads": 8,
  "num_hidden_layers": 34,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000.0,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 1024,
  "transformers_version": "4.56.1",
  "use_cache": false,
  "vocab_size": 262208
}
generation_config.json ADDED
@@ -0,0 +1,13 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.56.1"
}
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22d0def2f3981d4ecf169f9fc0f1fd86915db7d65262e3c5a458dbf8ce334179
size 1342505104

model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7916de5608ed8ab0aca2c05328177dc8d37b77f94c48b43a645f8ad250e30d89
size 4991821976

model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e5be2b95a648f126239778e8496d743b0979096058a9c87192b41c071a987b58
size 1426250824
model.safetensors.index.json ADDED
@@ -0,0 +1,452 @@
{
  "metadata": {
    "total_parameters": 3880263168,
    "total_size": 14177811456
  },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.pre_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
|
| 422 |
+
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 423 |
+
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 424 |
+
"model.layers.8.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 425 |
+
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 426 |
+
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 427 |
+
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 428 |
+
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 429 |
+
"model.layers.8.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 430 |
+
"model.layers.8.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 431 |
+
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
|
| 432 |
+
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 433 |
+
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 434 |
+
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
|
| 435 |
+
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 436 |
+
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 437 |
+
"model.layers.9.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 438 |
+
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 439 |
+
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 440 |
+
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 441 |
+
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 442 |
+
"model.layers.9.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 443 |
+
"model.layers.9.pre_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 444 |
+
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
|
| 445 |
+
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 446 |
+
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 447 |
+
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
|
| 448 |
+
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 449 |
+
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 450 |
+
"model.norm.weight": "model-00003-of-00003.safetensors"
|
| 451 |
+
}
|
| 452 |
+
}
|
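The `weight_map` closed above is the part of `model.safetensors.index.json` that loaders consult to find which of the three shards stores each parameter. A minimal sketch of reading it with only the standard library (the path assumes the repo has been downloaded into the current directory):

```python
import json

# Read the sharded-checkpoint index from the downloaded repo root.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]  # parameter name -> shard filename

# Count how many tensors live in each shard.
per_shard: dict[str, int] = {}
for name, shard in weight_map.items():
    per_shard[shard] = per_shard.get(shard, 0) + 1
for shard, count in sorted(per_shard.items()):
    print(f"{shard}: {count} tensors")

# Single lookup, e.g. the final norm listed at the end of the map above.
print(weight_map["model.norm.weight"])  # model-00003-of-00003.safetensors
```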
next_rosetta.png
ADDED
Git LFS Details (binary image; no text diff to show)
special_tokens_map.json
ADDED
@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
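Once the repo is loaded, these four entries surface as attributes on the tokenizer. A hedged sketch (the repo id below is a placeholder; substitute the model's actual Hub path):

```python
from transformers import AutoTokenizer

# Placeholder repo id -- replace with this model's actual Hub path.
tok = AutoTokenizer.from_pretrained("org/YanoljaNEXT-Rosetta-4B-2511")

print(tok.bos_token)  # <bos>
print(tok.eos_token)  # <end_of_turn>, per the map above
print(tok.pad_token)  # <pad>
print(tok.unk_token)  # <unk>
```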
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6b7c9043ba3b559295e6032728ca44ba21879713a32d4a35240794b2ed66d78
size 33384556
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
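Both entries above are Git LFS pointer files rather than the binaries themselves; the `oid` is the SHA-256 of the real object. A small sketch for checking a pulled file against its pointer (standard library only):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# oid copied from the tokenizer.model pointer above.
expected = "1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c"
assert sha256_of("tokenizer.model") == expected, "LFS object does not match pointer"
```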
tokenizer_config.json
ADDED
The diff for this file is too large to render. See raw diff.
wmt24pp_12b.md
ADDED
@@ -0,0 +1,37 @@
WMT24++ ChrF++ Metrics - Updated with YanoljaNEXT-Rosetta-12B-2510

| Model | Avg | ar_EG | ar_SA | bg_BG | bn_IN | ca_ES | cs_CZ | da_DK | de_DE | el_GR | es_MX | et_EE | fa_IR | fi_FI | fil_PH | fr_CA | fr_FR | gu_IN | he_IL | hi_IN | hr_HR | hu_HU | id_ID | is_IS | it_IT | ja_JP | kn_IN |
| ----------------------- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| DeepL Translate | 54.9 | 33.4 | 37.8 | 61.8 | | | 55.7 | 62.1 | 60.5 | 61.5 | 67.7 | 57.5 | | 62.0 | | 64.2 | 61.5 | | | 53.3 | 60.9 | | 66.0 | 34.9 | | | |
| YanoljaNEXT-Rosetta-12B | 54.4 | 32.8 | 37.9 | 62.0 | 40.7 | 60.3 | 55.3 | 65.9 | 58.1 | 62.3 | 67.5 | 50.2 | 54.0 | 59.6 | 61.5 | 68.3 | 62.7 | 51.8 | 54.1 | 39.4 | | 53.2 | 61.5 | | 65.0 | 32.1 | |
| GPT-4o | 54.4 | 35.4 | 38.4 | 62.7 | 45.0 | 64.2 | 55.8 | 63.0 | 60.8 | 62.5 | 65.7 | 60.4 | 53.0 | 63.1 | 57.8 | 69.1 | 63.7 | 47.7 | 54.9 | 41.5 | 59.4 | 54.9 | 61.9 | 49.9 | 65.0 | 36.1 | 54.8 |
| Google Translate | 53.9 | 35.0 | 39.3 | 62.2 | 43.8 | 62.4 | 53.7 | 59.9 | 60.9 | 60.5 | 66.8 | 58.6 | 48.5 | 60.3 | 59.0 | 66.5 | 62.0 | 49.8 | 53.1 | 41.1 | 57.1 | 52.2 | 61.5 | 47.2 | 64.9 | 35.3 | 59.0 |
| YanoljaNEXT-Rosetta-4B | 53.6 | 32.4 | 37.3 | 61.2 | | | 54.3 | 65.1 | 56.9 | 61.6 | 66.4 | | 53.3 | 58.5 | 61.2 | 66.3 | 61.4 | 51.2 | 53.2 | 39.1 | | 52.1 | 60.9 | | 63.7 | 31.4 | |
| Claude-3.5 | 52.7 | 38.9 | 36.5 | 59.4 | 42.6 | 63.8 | 53.8 | 60.7 | 58.7 | 60.8 | 64.3 | 57.4 | 49.3 | 60.4 | 55.2 | 65.9 | 62.4 | 47.5 | 52.6 | 40.9 | 57.5 | 52.7 | 60.0 | 48.8 | 63.4 | 32.8 | 53.9 |
| Microsoft Translator | 52.6 | 33.4 | 37.3 | 61.6 | 43.7 | 63.0 | 54.5 | 61.1 | 59.0 | 59.8 | 65.2 | 57.6 | 49.3 | 61.0 | 55.1 | 61.8 | 58.2 | 49.1 | 54.0 | 37.5 | 57.3 | 51.5 | 59.0 | 47.7 | 64.7 | 32.8 | 60.9 |
| OpenAI o1 | 52.4 | 35.2 | 36.5 | 60.3 | 41.9 | 62.3 | 54.1 | 61.3 | 59.5 | 60.9 | 65.0 | 58.7 | 51.1 | 61.7 | 56.2 | 65.7 | 60.4 | 40.4 | 53.8 | 41.1 | 58.4 | 53.5 | 59.9 | 49.5 | 63.2 | 34.9 | 45.6 |
| Unbabel Tower | 52.0 | | | | | | 53.3 | | 59.4 | | 65.3 | | | | | 66.0 | 61.8 | | | 40.3 | | | | 47.9 | 64.8 | 33.1 | |
| Gemini-1.5-Pro | 52.0 | 40.0 | 36.7 | 58.6 | 41.7 | 63.4 | 53.4 | 62.6 | 58.8 | 58.3 | 63.8 | 56.7 | 47.4 | 58.1 | 53.4 | 64.0 | 60.9 | 46.8 | 52.3 | 41.3 | 56.4 | 51.5 | 58.7 | 47.6 | 62.9 | 33.8 | 52.4 |
| OpenAI o1-mini | 50.9 | 36.2 | 36.9 | 58.1 | 41.5 | 61.2 | 51.5 | 59.4 | 58.9 | 56.5 | 64.6 | 54.3 | 50.7 | 58.4 | 55.7 | 65.5 | 61.2 | 43.2 | 49.3 | 39.0 | 55.1 | 49.9 | 59.4 | 41.8 | 63.0 | 33.8 | 49.9 |
| Gemini-1.5-Flash | 50.4 | 33.4 | 35.7 | 56.8 | 42.1 | 60.6 | 51.4 | 59.2 | 57.8 | 56.4 | 64.2 | 53.9 | 47.5 | 56.5 | 54.6 | 62.9 | 59.4 | 45.1 | 49.4 | 39.9 | 54.4 | 49.8 | 58.6 | 43.4 | 61.8 | 32.6 | 53.0 |
| Yandex Translate | 49.2 | 30.3 | 34.0 | 57.4 | 38.8 | 58.8 | 51.2 | 55.9 | 57.1 | 58.2 | 64.5 | 54.9 | 42.5 | 59.0 | 56.9 | 64.1 | 60.2 | 44.4 | 49.5 | 35.3 | 50.0 | 49.6 | 59.2 | 43.5 | 61.6 | 25.5 | 55.4 |
| CommandR-plus | 48.7 | 32.1 | 35.7 | | | 49.9 | | 56.7 | 56.1 | 62.2 | | 46.0 | | | 63.0 | 59.4 | | 49.2 | 37.6 | | | 57.3 | | 61.5 | 29.8 | | |
| Aya23 | 48.1 | 31.3 | 35.2 | | | 48.7 | | 55.4 | 55.1 | 62.4 | | 47.4 | | | 62.8 | 58.3 | | 47.6 | 38.2 | | | 57.4 | | 60.7 | 28.9 | | |

| Model | ko_KR | lt_LT | lv_LV | ml_IN | mr_IN | nl_NL | no_NO | pa_IN | pl_PL | pt_BR | pt_PT | ro_RO | ru_RU | sk_SK | sl_SI | sr_RS | sv_SE | sw_KE | sw_TZ | ta_IN | te_IN | th_TH | tr_TR | uk_UA | ur_PK | vi_VN | zh_CN | zh_TW | zu_ZA |
| ----------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| DeepL Translate | 37.0 | 53.0 | 55.2 | | | 60.6 | 67.8 | | 50.4 | 64.7 | 62.2 | 61.1 | 48.4 | 52.2 | 56.4 | | 65.6 | | | | | 60.3 | 61.5 | | | 37.4 | 18.3 | | |
| YanoljaNEXT-Rosetta-12B | 37.1 | | 51.4 | | | 60.1 | 65.2 | | 49.1 | 63.4 | 57.1 | 63.2 | 50.1 | 51.2 | | | 64.7 | | | | | 44.6 | 57.7 | 56.6 | | 57.0 | 35.5 | | |
| GPT-4o | 36.4 | 52.5 | 56.8 | 47.7 | 44.9 | 61.3 | 69.0 | 48.6 | 51.0 | 65.7 | 61.6 | 62.8 | 50.3 | 53.0 | 57.0 | 54.3 | 66.3 | 54.7 | 58.6 | 47.3 | 47.0 | 50.5 | 59.3 | 54.5 | 50.6 | 54.4 | 41.0 | 33.6 | 56.1 |
| Google Translate | 36.5 | 50.6 | 56.1 | 52.9 | 47.8 | 60.5 | 66.7 | 56.0 | 50.9 | 65.7 | 55.7 | 60.6 | 48.9 | 50.7 | 57.7 | 50.6 | 63.4 | 51.6 | 56.3 | 50.1 | 49.1 | 50.4 | 58.9 | 54.1 | 51.8 | 54.2 | 39.9 | 37.4 | 59.5 |
| YanoljaNEXT-Rosetta-4B | 35.1 | | | | | 59.3 | | | 48.3 | 62.4 | 56.6 | 62.9 | 49.4 | 50.9 | | | 64.4 | | | | | 44.2 | 56.5 | 55.5 | | 56.2 | 33.7 | | |
| Claude-3.5 | 33.7 | 50.4 | 55.9 | 49.6 | 43.9 | 60.0 | 65.2 | 50.7 | 49.6 | 64.4 | 61.3 | 60.8 | 50.0 | 51.6 | 55.0 | 53.2 | 64.3 | 52.9 | 55.1 | 47.8 | 46.0 | 48.4 | 57.8 | 53.0 | 47.1 | 51.0 | 36.1 | 28.9 | 52.1 |
| Microsoft Translator | 35.9 | 49.7 | 54.1 | 53.0 | 45.8 | 59.8 | 68.8 | 53.6 | 51.4 | 64.0 | 56.0 | 61.3 | 47.8 | 51.2 | 57.2 | 49.7 | 64.4 | 48.4 | 53.1 | 50.1 | 48.1 | 47.9 | 58.8 | 53.2 | 51.4 | 50.5 | 34.8 | 21.2 | 57.5 |
| OpenAI o1 | 33.6 | 51.5 | 55.4 | 43.9 | 44.3 | 60.8 | 67.1 | 44.2 | 49.7 | 64.2 | 60.6 | 61.5 | 48.8 | 51.9 | 55.7 | 53.6 | 64.3 | 53.1 | 56.3 | 45.2 | 43.4 | 49.5 | 56.8 | 54.0 | 48.4 | 52.2 | 37.6 | 32.1 | 53.1 |
| Unbabel Tower | 34.6 | | | | | 60.3 | | | | 64.8 | 59.8 | | 48.4 | | | | | | | | | | 53.6 | | | 38.4 | 32.4 | | |
| Gemini-1.5-Pro | 33.3 | 50.7 | 54.1 | 48.8 | 43.0 | 60.1 | 64.0 | 48.8 | 48.6 | 63.3 | 60.5 | 59.2 | 48.4 | 51.6 | 54.1 | 53.1 | 61.7 | 51.5 | 54.7 | 42.4 | 45.0 | 46.8 | 57.7 | 53.0 | 45.6 | 49.7 | 43.0 | 29.2 | 54.2 |
| OpenAI o1-mini | 34.1 | 47.1 | 50.3 | 43.0 | 42.2 | 58.6 | 65.8 | 46.3 | 48.8 | 64.0 | 58.4 | 59.6 | 47.7 | 49.0 | 53.2 | 48.8 | 63.2 | 49.2 | 52.7 | 44.2 | 43.7 | 46.3 | 55.9 | 51.7 | 48.4 | 50.9 | 36.6 | 32.1 | 51.6 |
| Gemini-1.5-Flash | 32.2 | 48.9 | 52.7 | 45.6 | 42.2 | 58.0 | 63.2 | 46.2 | 47.5 | 62.5 | 58.8 | 59.1 | 47.0 | 49.7 | 51.5 | 50.0 | 60.7 | 50.2 | 53.3 | 43.8 | 44.6 | 45.9 | 55.7 | 51.7 | 45.8 | 49.7 | 38.2 | 29.3 | 49.7 |
| Yandex Translate | 26.9 | 47.7 | 51.2 | 44.7 | 42.7 | 56.6 | 63.7 | 46.7 | 47.1 | | 56.2 | 58.4 | 48.4 | 48.4 | 54.3 | 46.3 | 60.9 | 47.4 | 52.3 | 49.1 | 42.3 | 35.7 | 56.6 | 54.1 | 48.5 | 47.3 | 29.3 | 18.7 | 57.0 |
| CommandR-plus | 30.3 | | | | | 57.3 | | | 46.7 | 62.5 | 59.0 | 57.6 | 45.7 | | | | | | | | | | 51.7 | 49.9 | | 48.5 | 33.5 | 27.3 | |
| Aya23 | 29.2 | | | | | 56.1 | | | 45.4 | 62.3 | 56.8 | 56.9 | 45.0 | | | | | | | | | | 51.7 | 48.8 | | 48.9 | 31.7 | 28.3 | |
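For context, ChrF++ is chrF extended with word unigrams and bigrams (`word_order=2` in sacrebleu). A minimal sketch of scoring with sacrebleu; the sentences below are placeholders rather than WMT24++ data, so the printed number is illustrative only:

```python
from sacrebleu.metrics import CHRF

# word_order=2 turns chrF into chrF++ (character n-grams + word 1/2-grams).
chrfpp = CHRF(word_order=2)

hypotheses = ["The hotel is near the train station."]
references = [["The hotel is close to the train station."]]  # one reference stream

print(chrfpp.corpus_score(hypotheses, references))  # e.g. "chrF2++ = ..."
```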