Commit 33e5a3a by pom
Parent(s): 4f64cdc

update XVERSE-13B-Chat model
Browse files
- MODEL_LICENSE.pdf +0 -0
- README.md +40 -98
- config.json +2 -2
- configuration_xverse.py +2 -0
- modeling_xverse.py +25 -14
- pytorch_model-00001-of-00015.bin → pytorch_model-00001-of-00010.bin +2 -2
- pytorch_model-00002-of-00015.bin → pytorch_model-00002-of-00010.bin +2 -2
- pytorch_model-00003-of-00015.bin → pytorch_model-00003-of-00010.bin +2 -2
- pytorch_model-00004-of-00015.bin → pytorch_model-00004-of-00010.bin +2 -2
- pytorch_model-00005-of-00010.bin +3 -0
- pytorch_model-00005-of-00015.bin +0 -3
- pytorch_model-00006-of-00010.bin +3 -0
- pytorch_model-00006-of-00015.bin +0 -3
- pytorch_model-00007-of-00010.bin +3 -0
- pytorch_model-00007-of-00015.bin +0 -3
- pytorch_model-00008-of-00010.bin +3 -0
- pytorch_model-00008-of-00015.bin +0 -3
- pytorch_model-00014-of-00015.bin → pytorch_model-00009-of-00010.bin +1 -1
- pytorch_model-00009-of-00015.bin +0 -3
- pytorch_model-00010-of-00010.bin +3 -0
- pytorch_model-00010-of-00015.bin +0 -3
- pytorch_model-00011-of-00015.bin +0 -3
- pytorch_model-00012-of-00015.bin +0 -3
- pytorch_model-00013-of-00015.bin +0 -3
- pytorch_model-00015-of-00015.bin +0 -3
- pytorch_model.bin.index.json +404 -404
- tokenizer.json +269 -13
MODEL_LICENSE.pdf CHANGED

Binary files a/MODEL_LICENSE.pdf and b/MODEL_LICENSE.pdf differ
README.md CHANGED

@@ -14,8 +14,8 @@ inference: false
 **XVERSE-13B** 是由深圳元象科技自主研发的支持多语言的大语言模型(Large Language Model),主要特点如下:

 - **模型结构**:XVERSE-13B 使用主流 Decoder-only 的标准 Transformer 网络结构,支持 8K 的上下文长度(Context Length),为同尺寸模型中最长,能满足更长的多轮对话、知识问答与摘要等需求,模型应用场景更广泛。
-- **训练数据**:构建了
-- **分词**:基于 BPE(Byte-Pair Encoding)算法,使用上百 GB 语料训练了一个词表大小为 100,
+- **训练数据**:构建了 3.2 万亿 token 的高质量、多样化的数据对模型进行充分训练,包含中、英、俄、西等 40 多种语言,通过精细化设置不同类型数据的采样比例,使得中英两种语言表现优异,也能兼顾其他语言效果。
+- **分词**:基于 BPE(Byte-Pair Encoding)算法,使用上百 GB 语料训练了一个词表大小为 100,534 的分词器,能够同时支持多语言,而无需额外扩展词表。
 - **训练框架**:自主研发多项关键技术,包括高效算子、显存优化、并行调度策略、数据-计算-通信重叠、平台和框架协同等,让训练效率更高,模型稳定性强,在千卡集群上的峰值算力利用率可达到 58.5%,位居业界前列。

 ## Model Introduction

@@ -25,113 +25,55 @@
 **XVERSE-13B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. Its key features are as follows:

 - **Model Structure**: XVERSE-13B uses the mainstream Decoder-only Transformer network structure, supports 8k context length, the longest one among models of the same size, which can meet the need of longer multi-round dialogues, knowledge question-answering, and summarization. This makes the model more versatile in application scenarios.
-- **Training Data**: The model has been thoroughly trained on a diversified and high-quality dataset consisting of
-- **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,
+- **Training Data**: The model has been thoroughly trained on a diversified and high-quality dataset consisting of 3.2 trillion tokens, covering more than 40 languages such as Chinese, English, Russian, and Spanish. The sampling ratios of the different data types are finely tuned, which makes Chinese and English performance excellent while also taking other languages into account.
+- **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,534 has been trained using hundreds of gigabytes of language data. This tokenizer is capable of supporting multiple languages without the need for additional vocabulary expansion.
 - **Training Framework**: Several key technologies have also been independently developed, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data-computation-communication, and synergy between platforms and frameworks. These advancements enhance training efficiency and model stability. With these technologies, the peak computational power utilization rate on a thousand-card cluster can reach 58.5%, ranking at the forefront of the industry.

 ## 评测结果

-| :------------------------: |
-| Ziya-LLaMA-13B-Pretrain-v1| 底座 | 43.9 | 30.2 | 27.2 | 26.4 | 37.6 |
-| Ziya-LLaMA-13B-v1.1 | 对话 | 50.6 | 29.3 | 23.6 | 26.7 | 27.3 |
-| **XVERSE-13B** | 底座 | **55.1** | **54.7** | **41.4** | **53.9** | **66.5** |
-| **XVERSE-13B-Chat** | 对话 | **60.2** | **53.1** | **48.3** | **50.7** | **80.6** |
+为了综合评估模型的性能,我们在一系列标准数据集上进行了全面测试,包括C-Eval、CMMLU、Gaokao-Bench、MMLU、GAOKAO-English、AGIEval、RACE-M、CommonSenseQA、PIQA、GSM8K和HumanEval。这些评估覆盖了模型在多个领域的能力,具体包括中文问答、英文问答、语言理解、常识问答、逻辑推理、数学问题解答以及编程能力。评估结果如下:
+
+| 能力维度 | 数据集 | | XVERSE-13B-2 | XVERSE-13B | Baichuan2-13B | Llama1-13B | Llama2-13B |
+| :--------: | :------------------------: | :----: | :----------: | :--------: | :-----------: | :--------: | :--------: |
+| 中文问答 | C-Eval | 5-shot | 63.5 | 54.7 | 58.1 | 28.8 | 35.6 |
+| | CMMLU | 5-shot | 66.2 | 59.1 | 62.0 | 31.5 | 38.4 |
+| | Gaokao-Bench<sup>1</sup> | 5-shot | 67.5 | 53.9 | 54.3 | 26.4 | 35.4 |
+| 英文问答 | MMLU | 5-shot | 61.2 | 55.1 | 59.2 | 46.9 | 54.8 |
+| | GAOKAO-English<sup>1</sup> | 5-shot | 73.7 | 66.5 | 67.7 | 38.1 | 60.6 |
+| 中英文问答 | AGIEval<sup>1</sup> | 5-shot | 54.5 | 41.4 | 48.2 | 27.3 | 33.4 |
+| 语言理解 | RACE-M | 0-shot | 84.6 | 74.2 | 68.9 | 61.6 | 63.0 |
+| 常识问答 | CommonSenseQA | 7-shot | 74.0 | 69.5 | 65.6 | 62.0 | 67.3 |
+| 推理 | PIQA | 0-shot | 80.8 | 79.0 | 78.5 | 80.1 | 80.5 |
+| 数学 | GSM8K | 4-shot | 54.9 | 18.4 | 52.7 | 17.8 | 28.7 |
+| 代码 | HumanEval | 0-shot | 39.6 | 15.9 | 17.1 | 15.8 | 18.3 |

 > <sup>1:只针对其中的单项选择题进行测试,即排除了填空题、开放性问题和多项选择题</sup>
-> <sup>2:来源于 [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) 的汇报结果</sup>
-> <sup>3:来源于 [C-Eval](https://cevalbenchmark.com/) 的汇报结果</sup>
-> <sup>4:来源于[Llama 2 论文](https://arxiv.org/abs/2307.09288)的汇报结果</sup>
->
-> 对于 MMLU ,我们采用作者提供的[评测工具](https://github.com/hendrycks/test),C-Eval、AGIEval、GAOKAO-Bench、GAOKAO-English 与 MMLU 的评测方式相同,且统一采用 **5-shot** 构造测试样本。
-
-## Model Evaluation
-
-In order to validate the various abilities of the model, we have chosen several comprehensive capability benchmarks across multiple disciplines, including [MMLU](https://arxiv.org/abs/2009.03300) (English), [C-Eval](https://cevalbenchmark.com/) (Chinese), [AGIEval](https://arxiv.org/abs/2304.06364) (Chinese and English), [GAOKAO-Bench](https://github.com/OpenLMLab/GAOKAO-Bench) (Chinese and English), [GAOKAO-English](https://github.com/ExpressAI/AI-Gaokao) (English), the evaluation results are as follows:
-
-| :------------------------: | :--------------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
-| Baichuan-13B | pretrained | 51.6<sup>2</sup> | 53.6<sup>3</sup> | 40.5 | 45.9 | 56.9 |
-| Baichuan-13B-Chat | fine-tuned | 52.1<sup>2</sup> | 51.5<sup>2</sup> | 34.6 | 46.7 | 63.8 |
-| Chinese-Alpaca-2-13B | fine-tuned | 53.2 | 41.3 | 36.6 | 38.4 | 65.1 |
-| Llama-1-13B | pretrained | 46.9<sup>4</sup> | 28.8 | 27.3 | 26.4 | 38.1 |
-| Llama-2-13B | pretrained | 54.8<sup>4</sup> | 35.6 | 33.4 | 35.4 | 60.6 |
-| moss-moon-003-base (16B) | pretrained | 24.7 | 33.1<sup>3</sup> | 26.8 | 28.5 | 34.7 |
-| moss-moon-003-sft (16B) | fine-tuned | 25.5 | 33.6 | 27.6 | 28.8 | 29.2 |
-| OpenLLaMA-13B | pretrained | 42.4 | 24.7 | 24.0 | 25.6 | 33.3 |
-| OPT-13B | pretrained | 25.2 | 25.0 | 24.2 | 24.4 | 31.1 |
-| Pythia-12B | pretrained | 25.1 | 26.2 | 25.3 | 25.3 | 26.8 |
-| Vicuna-13B-v1.5 | fine-tuned | 53.5 | 27.9 | 29.7 | 31.6 | 52.9 |
-| Ziya-LLaMA-13B-Pretrain-v1| pretrained | 43.9 | 30.2 | 27.2 | 26.4 | 37.6 |
-| Ziya-LLaMA-13B-v1.1 | fine-tuned | 50.6 | 29.3 | 23.6 | 26.7 | 27.3 |
-| **XVERSE-13B** | pretrained | **55.1** | **54.7** | **41.4** | **53.9** | **66.5** |
-| **XVERSE-13B-Chat** | fine-tuned | **60.2** | **53.1** | **48.3** | **50.7** | **80.6** |
+
+对于上述所有比较模型,我们优先汇报其官方公布的结果。在缺少官方结果的情况下,我们采用了 [OpenCompass 榜单](https://opencompass.org.cn/leaderboard-llm)的报告结果。其他结果则来自于我们自行执行的评估流程所获得的数据。
+对于 MMLU ,我们采用作者提供的[评测工具](https://github.com/hendrycks/test),C-Eval、AGIEval、GAOKAO-Bench、GAOKAO-English 与 MMLU 的评测方式相同,其余评测数据集使用 [OpenCompass 评估框架](https://github.com/open-compass/OpenCompass/)进行评估。
+
+## Model Evaluation
+
+To comprehensively assess the performance of the model, we conducted extensive testing across a range of standard datasets, including C-Eval, CMMLU, Gaokao-Bench, MMLU, GAOKAO-English, AGIEval, RACE-M, CommonSenseQA, PIQA, GSM8K and HumanEval. These evaluations spanned multiple capabilities of the model, specifically including Chinese question answering, English question answering, language comprehension, common-sense question answering, logical reasoning, mathematical problem-solving, and coding ability. The results of the evaluations are as follows:
+
+| Capability Dimension | Dataset | | XVERSE-13B-2 | XVERSE-13B | Baichuan2-13B | Llama1-13B | Llama2-13B |
+| :--------------------: | :------------------------: | :----: | :----------: | :--------: | :-----------: | :--------: | :--------: |
+| Chinese QA | C-Eval | 5-shot | 63.5 | 54.7 | 58.1 | 28.8 | 35.6 |
+| | CMMLU | 5-shot | 66.2 | 59.1 | 62.0 | 31.5 | 38.4 |
+| | Gaokao-Bench<sup>1</sup> | 5-shot | 67.5 | 53.9 | 54.3 | 26.4 | 35.4 |
+| English QA | MMLU | 5-shot | 61.2 | 55.1 | 59.2 | 46.9 | 54.8 |
+| | GAOKAO-English<sup>1</sup> | 5-shot | 73.7 | 66.5 | 67.7 | 38.1 | 60.6 |
+| Chinese & English QA | AGIEval<sup>1</sup> | 5-shot | 54.5 | 41.4 | 48.2 | 27.3 | 33.4 |
+| Language Understanding | RACE-M | 0-shot | 84.6 | 74.2 | 68.9 | 61.6 | 63.0 |
+| Common Sense QA | CommonSenseQA | 7-shot | 74.0 | 69.5 | 65.6 | 62.0 | 67.3 |
+| Reasoning | PIQA | 0-shot | 80.8 | 79.0 | 78.5 | 80.1 | 80.5 |
+| Math | GSM8K | 4-shot | 54.9 | 18.4 | 52.7 | 17.8 | 28.7 |
+| Coding | HumanEval | 0-shot | 39.6 | 15.9 | 17.1 | 15.8 | 18.3 |

 > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
->
-> For MMLU, we adopt the [evaluation tools](https://github.com/hendrycks/test) provided by the authors, C-Eval, AGIEval, GAOKAO-Bench, GAOKAO-English are the same as MMLU, and uniformly use **5-shot** to construct the test samples.
-
-### MMLU 各类别指标
-MMLU Category Results
-
-| Models | Type | Average | STEM | Social Science | Humanities | Others |
-| :------------------------: | :------------------------: | :------: | :------: | :------------: | :--------: | :------: |
-| Baichuan-13B | pretrained | 51.6 | 41.6 | 60.9 | 47.4 | 58.5 |
-| Baichuan-13B-Chat | fine-tuned | 52.1 | 40.9 | 60.9 | 48.8 | 59.0 |
-| Chinese-Alpaca-2-13B | fine-tuned | 53.2 | 41.8 | 61.2 | 51.3 | 59.2 |
-| Llama-1-13B | pretrained | 46.9 | 35.8 | 53.8 | 45.0 | 53.3 |
-| Llama-2-13B | pretrained | 54.8 | 44.1 | 62.6 | 52.8 | 61.1 |
-| moss-moon-003-base (16B) | pretrained | 24.7 | 23.0 | 24.0 | 25.2 | 26.3 |
-| moss-moon-003-sft (16B) | fine-tuned | 25.5 | 25.9 | 23.8 | 27.1 | 24.4 |
-| OpenLLaMA-13B | pretrained | 42.4 | 34.7 | 48.6 | 40.0 | 47.1 |
-| OPT-13B | pretrained | 25.2 | 23.9 | 24.1 | 25.9 | 26.3 |
-| Pythia-12B | pretrained | 25.1 | 24.8 | 23.0 | 26.1 | 26.0 |
-| Vicuna-13B-v1.5 | fine-tuned | 53.5 | 42.3 | 61.3 | 50.3 | 60.9 |
-| Ziya-LLaMA-13B-Pretrain-v1 | pretrained | 43.9 | 36.3 | 48.8 | 41.1 | 50.3 |
-| Ziya-LLaMA-13B-v1.1 | fine-tuned | 50.6 | 40.7 | 57.8 | 48.1 | 56.7 |
-| **XVERSE-13B** | pretrained | **55.1** | **44.5** | **64.4** | **50.5** | **62.9** |
-| **XVERSE-13B-Chat** | fine-tuned | **60.2** | **48.1** | **67.7** | **56.4** | **68.0** |
-
-### C-Eval 各类别指标
-C-Eval Category Results
-
-| Models | Type | Average | STEM | Social Science | Humanities | Others |
-| :------------------------: | :------------------------: | :------: | :------: | :------------: | :--------: | :------: |
-| Baichuan-13B | pretrained | 53.6 | 47.0 | 66.8 | 57.3 | 49.8 |
-| Baichuan-13B-Chat | fine-tuned | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
-| Chinese-Alpaca-2-13B | fine-tuned | 41.3 | 37.8 | 51.1 | 42.4 | 37.8 |
-| Llama-1-13B | pretrained | 28.8 | 27.5 | 33.9 | 27.7 | 27.7 |
-| Llama-2-13B | pretrained | 35.6 | 34.5 | 39.8 | 36.2 | 33.2 |
-| moss-moon-003-base (16B) | pretrained | 33.1 | 31.6 | 37.0 | 33.4 | 32.1 |
-| moss-moon-003-sft (16B) | fine-tuned | 33.6 | 31.4 | 38.6 | 33.8 | 32.9 |
-| OpenLLaMA-13B | pretrained | 24.7 | 25.5 | 23.5 | 24.2 | 24.7 |
-| OPT-13B | pretrained | 25.0 | 24.4 | 24.6 | 25.9 | 25.4 |
-| Pythia-12B | pretrained | 26.2 | 26.8 | 25.1 | 26.7 | 25.4 |
-| Vicuna-13B-v1.5 | fine-tuned | 27.9 | 25.4 | 33.2 | 29.3 | 26.2 |
-| Ziya-LLaMA-13B-Pretrain-v1 | pretrained | 30.2 | 27.8 | 34.3 | 32.0 | 29.0 |
-| Ziya-LLaMA-13B-v1.1 | fine-tuned | 29.3 | 27.5 | 32.8 | 29.7 | 29.0 |
-| **XVERSE-13B** | pretrained | **54.7** | **45.6** | **66.2** | **58.3** | **56.9** |
-| **XVERSE-13B-Chat** | fine-tuned | **53.1** | **44.5** | **65.3** | **56.5** | **54.3** |
+
+For all the comparison models mentioned above, we prioritize their officially published results. In the absence of official data, we refer to the reported outcomes from the [OpenCompass Leaderboard](https://opencompass.org.cn/leaderboard-llm). Results not covered by the aforementioned sources are derived from our own evaluation pipeline.
+For MMLU, we adopt the [evaluation tools](https://github.com/hendrycks/test) provided by the authors; C-Eval, AGIEval, GAOKAO-Bench, and GAOKAO-English are evaluated in the same way as MMLU. For the remaining evaluation datasets, the [OpenCompass](https://github.com/open-compass/OpenCompass/) evaluation framework is employed.

 ### Loading with Transformers
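The README excerpt above ends at the "### Loading with Transformers" heading. As a minimal sketch of that loading path (assuming the standard `trust_remote_code` pattern that this repository's custom `modeling_xverse.py` requires, and the `chat()` helper whose input-building logic appears in the modeling_xverse.py diff below; the model card remains the authoritative snippet):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Minimal usage sketch; the repo ships custom code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained("xverse/XVERSE-13B-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "xverse/XVERSE-13B-Chat", torch_dtype="auto", trust_remote_code=True
).eval()

# history uses the role/content message format handled by the chat-input
# builder in modeling_xverse.py ('user', 'assistant', 'system', 'exec').
history = [{"role": "user", "content": "Introduce XVERSE-13B in one sentence."}]
response = model.chat(tokenizer, history)
print(response)
```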
config.json CHANGED

@@ -14,6 +14,7 @@
   "initializer_range": 0.02,
   "intermediate_size": 13824,
   "max_position_embeddings": 8192,
+  "max_tokenizer_truncation": 6144,
   "model_type": "xverse",
   "num_attention_heads": 40,
   "num_hidden_layers": 40,

@@ -22,6 +23,5 @@
   "torch_dtype": "bfloat16",
   "transformers_version": "4.28.1",
   "use_cache": true,
-  "vocab_size":
+  "vocab_size": 100534
 }
-
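The new `max_tokenizer_truncation` field caps how much chat history survives truncation, independently of the 8192-token context window. A worked example of the clamp this value feeds (mirroring the arithmetic added in the modeling_xverse.py diff below; the `max_new_tokens` value here is an illustrative assumption):

```python
# Values from this config.json; max_new_tokens is chosen for illustration.
max_position_embeddings = 8192
max_tokenizer_truncation = 6144
max_new_tokens = 1024

max_input_tokens = max_position_embeddings - max_new_tokens             # 7168
max_input_tokens = max(max_position_embeddings // 2, max_input_tokens)  # max(4096, 7168) -> 7168
max_input_tokens = min(max_tokenizer_truncation, max_input_tokens)      # new clamp -> 6144
print(max_input_tokens)  # at most 6144 history tokens reach the model
```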
configuration_xverse.py CHANGED

@@ -91,6 +91,7 @@ class XverseConfig(PretrainedConfig):
         num_attention_heads=40,
         hidden_act="silu",
         max_position_embeddings=8192,
+        max_tokenizer_truncation=8192,
         initializer_range=0.02,
         rms_norm_eps=1e-6,
         use_cache=True,

@@ -111,6 +112,7 @@ class XverseConfig(PretrainedConfig):
         self.initializer_range = initializer_range
         self.rms_norm_eps = rms_norm_eps
         self.use_cache = use_cache
+        self.max_tokenizer_truncation = max_tokenizer_truncation

         super().__init__(
             pad_token_id=pad_token_id,
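Because the constructor supplies a default of 8192, an older config.json that lacks the key keeps working: it falls back to the full context length, so the new clamp changes nothing for it. A hypothetical check (assuming the file above is importable locally):

```python
# Hypothetical local import of the configuration file shown above.
from configuration_xverse import XverseConfig

cfg = XverseConfig()  # no "max_tokenizer_truncation" supplied, as in an older config.json
assert cfg.max_tokenizer_truncation == 8192  # default equals max_position_embeddings,
                                             # i.e. no extra truncation below the context window
```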
modeling_xverse.py CHANGED

@@ -611,8 +611,6 @@ class XverseModel(XversePreTrainedModel):


 class XverseForCausalLM(XversePreTrainedModel):
-    _tied_weights_keys = ["lm_head.weight"]
-
     def __init__(self, config):
         super().__init__(config)
         self.model = XverseModel(config)

@@ -732,15 +730,22 @@ class XverseForCausalLM(XversePreTrainedModel):
         max_new_tokens = max_new_tokens or self.generation_config.max_new_tokens
         max_input_tokens = self.config.max_position_embeddings - max_new_tokens
         max_input_tokens = max(self.config.max_position_embeddings // 2, max_input_tokens)
+        max_input_tokens = min(self.config.max_tokenizer_truncation, max_input_tokens)

         total_input, round_input = [], []
+        user_prompt_tokens = tokenizer.encode("Human: ", return_token_type_ids=False)
+        exec_prompt_tokens = tokenizer.encode("Exec: ", return_token_type_ids=False)
+        assist_prompt_tokens = tokenizer.encode("Assistant: ", return_token_type_ids=False)
+        assist_prompt_len = len(assist_prompt_tokens)
+
         for i, message in enumerate(messages[::-1]):
-            if message['role'] == 'user':
-                user_content = f"{
+            if message['role'] == 'user' or message['role'] == 'exec':
+                user_content = f"{message['content']}\n\n"
+                content_tokens = user_prompt_tokens + tokenizer.encode(user_content, return_token_type_ids=False) if message['role'] == 'user' else \
+                    exec_prompt_tokens + tokenizer.encode(user_content, return_token_type_ids=False)
                 if i == 0:
+                    content_tokens = content_tokens[:max_input_tokens-assist_prompt_len]
+                    content_tokens += assist_prompt_tokens
                 round_input = content_tokens + round_input

                 if i != 0:

@@ -754,12 +759,20 @@ class XverseForCausalLM(XversePreTrainedModel):
                    break
                round_input = []
            elif message['role'] == 'assistant':
-                assist_content = f"{
-                content_tokens = tokenizer.encode(assist_content, return_token_type_ids=False)
+                assist_content = f"{message['content']}"
+                content_tokens = assist_prompt_tokens + tokenizer.encode(assist_content, return_token_type_ids=False)
                 round_input = content_tokens + [self.generation_config.eos_token_id] + round_input
+            elif message['role'] == 'system':
+                assert i == len(messages) - 1
+                user_content = f"{message['content']}\n"
+                content_tokens = tokenizer.encode(user_content, return_token_type_ids=False)
+                round_input = user_prompt_tokens + content_tokens + round_input
+                if len(total_input) + len(round_input) > max_input_tokens:
+                    break
+                else:
+                    total_input = round_input + total_input
            else:
                raise ValueError(f"message role not supported yet: {message['role']}")
-        total_input = total_input[-max_input_tokens:]  # truncate left
         total_input = torch.LongTensor([total_input]).to(self.device)
         return total_input

@@ -779,7 +792,7 @@ class XverseForCausalLM(XversePreTrainedModel):
             thread = Thread(target=self.generate, kwargs=generation_kwargs)
             thread.start()
             for next_text in streamer:
-                yield next_text.
+                yield next_text.replace(tokenizer.eos_token, "")

         return stream_generator()
     else:

@@ -822,9 +835,7 @@ class XverseForCausalLM(XversePreTrainedModel):
     def _reorder_cache(past_key_values, beam_idx):
         reordered_past = ()
         for layer_past in past_key_values:
-            reordered_past += (
-                tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
-            )
+            reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
         return reordered_past

     def quantize(self, bit_length: int):
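The rewritten input builder hard-codes plain-text role prefixes and assembles the prompt right-to-left so the newest turns survive truncation. A string-level sketch of the prompt layout the token-level code above produces (an illustrative reconstruction, not the shipped implementation; `render_prompt` is a hypothetical helper):

```python
# Hypothetical string-level rendition of the token-level prompt assembly above.
def render_prompt(messages):
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(f"Human: {m['content']}\n")    # system text rides under the Human prefix
        elif m["role"] == "user":
            parts.append(f"Human: {m['content']}\n\n")
        elif m["role"] == "exec":
            parts.append(f"Exec: {m['content']}\n\n")   # new role for tool/execution output
        elif m["role"] == "assistant":
            parts.append(f"Assistant: {m['content']}")  # real code appends an EOS token here
        else:
            raise ValueError(f"message role not supported yet: {m['role']}")
    # the real code appends the "Assistant: " prefix to the newest message (i == 0)
    return "".join(parts) + "Assistant: "

print(render_prompt([{"role": "user", "content": "Hello"}]))
# Human: Hello
#
# Assistant:
```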
pytorch_model-00001-of-00015.bin → pytorch_model-00001-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7ac6f98cae6a0b3768822474284d619beda358b68304a8bde5f1e493a694ef4e
+size 2508131049
pytorch_model-00002-of-00015.bin → pytorch_model-00002-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7715c66734a8871bbd764528ac509caa22b9c7a44b3e2b50ceb5bde1b237f6d5
+size 3172057468

pytorch_model-00003-of-00015.bin → pytorch_model-00003-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:33b1866910aeadb014e0462c828e5d03d8a52044a3405253ab2f786c8c17279e
+size 3172057468

pytorch_model-00004-of-00015.bin → pytorch_model-00004-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b1ad2384bf041f4b3eb0dfa9cd2fd36ca9dea9761504c3b945cfb8302c7449a9
+size 3172057532

pytorch_model-00005-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ebb74708adecad7f7ddf3a5d7ab327a4fcb61c8f0dfb6d66e31e82475a914af7
+size 3172057532

pytorch_model-00005-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:4bbd7ff1159cb112d72a0bd0256783478d4cbccb684c87ec26992b1dfa952996
-size 1903234214

pytorch_model-00006-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2df8350a00f3c5e7e1cf65ea7731c69343df05d5e52205b3284bb1dc43d0edfb
+size 3172057532

pytorch_model-00006-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:d018135a9b98f873303c8fb6fd1e110fcab63aa63617a03bb17c62e535d5bfa4
-size 1903234214

pytorch_model-00007-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:74cb38f652c76808a77e695afa4157924b1d0ce21db6bc18d1faa6fd7a842aff
+size 3172057532

pytorch_model-00007-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1c843829a5757d6fdc506471bae3516cf18aa512ba3ffd7ade30c353606b9427
-size 1903234214

pytorch_model-00008-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e75e08a58078d4aa1302feaf449241b4c298344b9451b0bcbcb460677e0a7718
+size 3172057532

pytorch_model-00008-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e774213e567df646459b9db90ad70bab09f39a9eca6a894487e7c4aebbaf32a1
-size 1903234214

pytorch_model-00014-of-00015.bin → pytorch_model-00009-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fa3de49840e4b259ff458a38e5d5a1720a6d1daca77b071a3774600effc16ca2
 size 1693507250

pytorch_model-00009-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8e2ee37b9e7e30f6bd914c219550a26eb2e364ff55a1089de1114242f4dc742f
-size 1903234214

pytorch_model-00010-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:17d1b0fa1afc4439ac6d633fe77a4eebd93e0a23f2433e7c39058b9e5ea31a7b
+size 1029571307

pytorch_model-00010-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8b519216f13ae9d3db29e3bf3baa61051038ab686fc5bcc5a056d54215dce66c
-size 1903234214

pytorch_model-00011-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3f9af8f0d46f93b2dd86f40ec115ddcedf42dff498d8481c96bcfc47d655b7f7
-size 1903234214

pytorch_model-00012-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:dc8045ea179095d6961c8bb579b00cb58a7c65eb17fca343397f55acb30d8fe4
-size 1903234214

pytorch_model-00013-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:92a9f68f76d1b0542c441154a5f05049cf5ca3aaa82166e101a7a4018b44ea37
-size 1903234214

pytorch_model-00015-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:c395ce9729e0cc6a3824c1645838ebc96a4f232ac848e69ecf092fdfcf4380bc
-size 1026867947
pytorch_model.bin.index.json
CHANGED
@@ -1,410 +1,410 @@
|
|
1 |
{
|
2 |
"metadata": {
|
3 |
-
"total_size":
|
4 |
},
|
5 |
"weight_map": {
|
6 |
-
"lm_head.weight": "pytorch_model-
|
7 |
-
"model.embed_tokens.weight": "pytorch_model-00001-of-
|
8 |
-
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-
|
9 |
-
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-
|
10 |
-
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-
|
11 |
-
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-
|
12 |
-
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-
|
13 |
-
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-
|
14 |
-
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-
|
15 |
-
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-
|
16 |
-
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-
|
17 |
-
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-
|
18 |
-
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-
|
19 |
-
"model.layers.1.mlp.down_proj.weight": "pytorch_model-
|
20 |
-
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-
|
21 |
-
"model.layers.1.mlp.up_proj.weight": "pytorch_model-
|
22 |
-
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-
|
23 |
-
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-
|
24 |
-
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-
|
25 |
-
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-
|
26 |
-
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-
|
27 |
-
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-
|
28 |
-
"model.layers.10.input_layernorm.weight": "pytorch_model-
|
29 |
-
"model.layers.10.mlp.down_proj.weight": "pytorch_model-
|
30 |
-
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-
|
31 |
-
"model.layers.10.mlp.up_proj.weight": "pytorch_model-
|
32 |
-
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-
|
33 |
-
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-
|
34 |
-
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-
|
35 |
-
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-
|
36 |
-
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
37 |
-
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-
|
38 |
-
"model.layers.11.input_layernorm.weight": "pytorch_model-
|
39 |
-
"model.layers.11.mlp.down_proj.weight": "pytorch_model-
|
40 |
-
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-
|
41 |
-
"model.layers.11.mlp.up_proj.weight": "pytorch_model-
|
42 |
-
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-
|
43 |
-
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-
|
44 |
-
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-
|
45 |
-
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-
|
46 |
-
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
47 |
-
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-
|
48 |
-
"model.layers.12.input_layernorm.weight": "pytorch_model-
|
49 |
-
"model.layers.12.mlp.down_proj.weight": "pytorch_model-
|
50 |
-
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-
|
51 |
-
"model.layers.12.mlp.up_proj.weight": "pytorch_model-
|
52 |
-
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-
|
53 |
-
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-
|
54 |
-
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-
|
55 |
-
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-
|
56 |
-
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
57 |
-
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-
|
58 |
-
"model.layers.13.input_layernorm.weight": "pytorch_model-
|
59 |
-
"model.layers.13.mlp.down_proj.weight": "pytorch_model-
|
60 |
-
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-
|
61 |
-
"model.layers.13.mlp.up_proj.weight": "pytorch_model-
|
62 |
-
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-
|
63 |
-
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-
|
64 |
-
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-
|
65 |
-
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-
|
66 |
-
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
67 |
-
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-
|
68 |
-
"model.layers.14.input_layernorm.weight": "pytorch_model-
|
69 |
-
"model.layers.14.mlp.down_proj.weight": "pytorch_model-
|
70 |
-
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-
|
71 |
-
"model.layers.14.mlp.up_proj.weight": "pytorch_model-
|
72 |
-
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-
|
73 |
-
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-
|
74 |
-
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-
|
75 |
-
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-
|
76 |
-
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
77 |
-
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-
|
78 |
-
"model.layers.15.input_layernorm.weight": "pytorch_model-
|
79 |
-
"model.layers.15.mlp.down_proj.weight": "pytorch_model-
|
80 |
-
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-
|
81 |
-
"model.layers.15.mlp.up_proj.weight": "pytorch_model-
|
82 |
-
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-
|
83 |
-
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-
|
84 |
-
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-
|
85 |
-
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-
|
86 |
-
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
87 |
-
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-
|
88 |
-
"model.layers.16.input_layernorm.weight": "pytorch_model-
|
89 |
-
"model.layers.16.mlp.down_proj.weight": "pytorch_model-
|
90 |
-
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-
|
91 |
-
"model.layers.16.mlp.up_proj.weight": "pytorch_model-
|
92 |
-
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-
|
93 |
-
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-
|
94 |
-
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-
|
95 |
-
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-
|
96 |
-
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
97 |
-
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-
|
98 |
-
"model.layers.17.input_layernorm.weight": "pytorch_model-
|
99 |
-
"model.layers.17.mlp.down_proj.weight": "pytorch_model-
|
100 |
-
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-
|
101 |
-
"model.layers.17.mlp.up_proj.weight": "pytorch_model-
|
102 |
-
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-
|
103 |
-
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-
|
104 |
-
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-
|
105 |
-
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-
|
106 |
-
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
107 |
-
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-
|
108 |
-
"model.layers.18.input_layernorm.weight": "pytorch_model-
|
109 |
-
"model.layers.18.mlp.down_proj.weight": "pytorch_model-
|
110 |
-
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-
|
111 |
-
"model.layers.18.mlp.up_proj.weight": "pytorch_model-
|
112 |
-
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-
|
113 |
-
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-
|
114 |
-
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-
|
115 |
-
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-
|
116 |
-
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
117 |
-
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-
|
118 |
-
"model.layers.19.input_layernorm.weight": "pytorch_model-
|
119 |
-
"model.layers.19.mlp.down_proj.weight": "pytorch_model-
|
120 |
-
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-
|
121 |
-
"model.layers.19.mlp.up_proj.weight": "pytorch_model-
|
122 |
-
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-
|
123 |
-
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-
|
124 |
-
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-
|
125 |
-
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-
|
126 |
-
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
127 |
-
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-
|
128 |
-
"model.layers.2.input_layernorm.weight": "pytorch_model-
|
129 |
-
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-
|
130 |
-
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-
|
131 |
-
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-
|
132 |
-
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-
|
133 |
-
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-
|
134 |
-
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-
|
135 |
-
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-
|
136 |
-
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
137 |
-
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-
|
138 |
-
"model.layers.20.input_layernorm.weight": "pytorch_model-
|
139 |
-
"model.layers.20.mlp.down_proj.weight": "pytorch_model-
|
140 |
-
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-
|
141 |
-
"model.layers.20.mlp.up_proj.weight": "pytorch_model-
|
142 |
-
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-
|
143 |
-
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-
|
144 |
-
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-
|
145 |
-
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-
|
146 |
-
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
147 |
-
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-
|
148 |
-
"model.layers.21.input_layernorm.weight": "pytorch_model-
|
149 |
-
"model.layers.21.mlp.down_proj.weight": "pytorch_model-
|
150 |
-
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-
|
151 |
-
"model.layers.21.mlp.up_proj.weight": "pytorch_model-
|
152 |
-
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-
|
153 |
-
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-
|
154 |
-
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-
|
155 |
-
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-
|
156 |
-
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
157 |
-
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-
|
158 |
-
"model.layers.22.input_layernorm.weight": "pytorch_model-
|
159 |
-
"model.layers.22.mlp.down_proj.weight": "pytorch_model-
|
160 |
-
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-
|
161 |
-
"model.layers.22.mlp.up_proj.weight": "pytorch_model-
|
162 |
-
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-
|
163 |
-
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-
|
164 |
-
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-
|
165 |
-
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-
|
166 |
-
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
167 |
-
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-
|
168 |
-
"model.layers.23.input_layernorm.weight": "pytorch_model-
|
169 |
-
"model.layers.23.mlp.down_proj.weight": "pytorch_model-
|
170 |
-
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-
|
171 |
-
"model.layers.23.mlp.up_proj.weight": "pytorch_model-
|
172 |
-
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-
|
173 |
-
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-
|
174 |
-
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-
|
175 |
-
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-
|
176 |
-
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
177 |
-
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-
|
178 |
-
"model.layers.24.input_layernorm.weight": "pytorch_model-
|
179 |
-
"model.layers.24.mlp.down_proj.weight": "pytorch_model-
|
180 |
-
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-
|
181 |
-
"model.layers.24.mlp.up_proj.weight": "pytorch_model-
|
182 |
-
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-
|
183 |
-
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-
|
184 |
-
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-
|
185 |
-
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-
|
186 |
-
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
187 |
-
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-
|
188 |
-
"model.layers.25.input_layernorm.weight": "pytorch_model-
|
189 |
-
"model.layers.25.mlp.down_proj.weight": "pytorch_model-
|
190 |
-
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-
|
191 |
-
"model.layers.25.mlp.up_proj.weight": "pytorch_model-
|
192 |
-
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-
|
193 |
-
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-
|
194 |
-
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-
|
195 |
-
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-
|
196 |
-
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
197 |
-
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-
|
198 |
-
"model.layers.26.input_layernorm.weight": "pytorch_model-
|
199 |
-
"model.layers.26.mlp.down_proj.weight": "pytorch_model-
|
200 |
-
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-
|
201 |
-
"model.layers.26.mlp.up_proj.weight": "pytorch_model-
|
202 |
-
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-
|
203 |
-
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-
|
204 |
-
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-
|
205 |
-
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-
|
206 |
-
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
207 |
-
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-
|
208 |
-
"model.layers.27.input_layernorm.weight": "pytorch_model-
|
209 |
-
"model.layers.27.mlp.down_proj.weight": "pytorch_model-
|
210 |
-
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-
|
211 |
-
"model.layers.27.mlp.up_proj.weight": "pytorch_model-
|
212 |
-
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-
|
213 |
-
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-
|
214 |
-
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-
|
215 |
-
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-
|
216 |
-
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
217 |
-
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-
|
218 |
-
"model.layers.28.input_layernorm.weight": "pytorch_model-
|
219 |
-
"model.layers.28.mlp.down_proj.weight": "pytorch_model-
|
220 |
-
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-
|
221 |
-
"model.layers.28.mlp.up_proj.weight": "pytorch_model-
|
222 |
-
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-
|
223 |
-
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-
|
224 |
-
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-
|
225 |
-
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-
|
226 |
-
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
227 |
-
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-
|
228 |
-
"model.layers.29.input_layernorm.weight": "pytorch_model-
|
229 |
-
"model.layers.29.mlp.down_proj.weight": "pytorch_model-
|
230 |
-
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-
|
231 |
-
"model.layers.29.mlp.up_proj.weight": "pytorch_model-
|
232 |
-
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-
|
233 |
-
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-
|
234 |
-
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-
|
235 |
-
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-
|
236 |
-
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
237 |
-
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-
|
238 |
-
"model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-
|
239 |
-
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-
|
240 |
-
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-
|
241 |
-
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-
|
242 |
-
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-
|
243 |
-
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-
|
244 |
-
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-
|
245 |
-
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-
|
246 |
-
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-
|
247 |
-
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-
|
248 |
-
"model.layers.30.input_layernorm.weight": "pytorch_model-
|
249 |
-
"model.layers.30.mlp.down_proj.weight": "pytorch_model-
|
250 |
-
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-
|
251 |
-
"model.layers.30.mlp.up_proj.weight": "pytorch_model-
|
252 |
-
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-
|
253 |
-
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-
|
254 |
-
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-
|
255 |
-
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-
|
256 |
-
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
257 |
-
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-
|
258 |
-
"model.layers.31.input_layernorm.weight": "pytorch_model-
|
259 |
-
"model.layers.31.mlp.down_proj.weight": "pytorch_model-
|
260 |
-
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-
|
261 |
-
"model.layers.31.mlp.up_proj.weight": "pytorch_model-
|
262 |
-
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-
|
263 |
-
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-
|
264 |
-
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-
|
265 |
-
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-
|
266 |
-
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
267 |
-
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-
|
268 |
-
"model.layers.32.input_layernorm.weight": "pytorch_model-
|
269 |
-
"model.layers.32.mlp.down_proj.weight": "pytorch_model-
|
270 |
-
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-
|
271 |
-
"model.layers.32.mlp.up_proj.weight": "pytorch_model-
|
272 |
-
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-
|
273 |
-
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-
|
274 |
-
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-
|
275 |
-
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-
|
276 |
-
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
277 |
-
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-
|
278 |
-
"model.layers.33.input_layernorm.weight": "pytorch_model-
|
279 |
-
"model.layers.33.mlp.down_proj.weight": "pytorch_model-
|
280 |
-
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-
|
281 |
-
"model.layers.33.mlp.up_proj.weight": "pytorch_model-
|
282 |
-
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-
|
283 |
-
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-
|
284 |
-
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-
|
285 |
-
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-
|
286 |
-
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
287 |
-
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-
|
288 |
-
"model.layers.34.input_layernorm.weight": "pytorch_model-
|
289 |
-
"model.layers.34.mlp.down_proj.weight": "pytorch_model-
|
290 |
-
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-
|
291 |
-
"model.layers.34.mlp.up_proj.weight": "pytorch_model-
|
292 |
-
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-
|
293 |
-
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-
|
294 |
-
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-
|
295 |
-
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-
|
296 |
-
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
297 |
-
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-
|
298 |
-
"model.layers.35.input_layernorm.weight": "pytorch_model-
|
299 |
-
"model.layers.35.mlp.down_proj.weight": "pytorch_model-
|
300 |
-
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-
|
301 |
-
"model.layers.35.mlp.up_proj.weight": "pytorch_model-
|
302 |
-
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-
|
303 |
-
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-
|
304 |
-
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-
|
305 |
-
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-
|
306 |
-
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
307 |
-
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-
|
308 |
-
"model.layers.36.input_layernorm.weight": "pytorch_model-
|
309 |
-
"model.layers.36.mlp.down_proj.weight": "pytorch_model-
|
310 |
-
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-
|
311 |
-
"model.layers.36.mlp.up_proj.weight": "pytorch_model-
|
312 |
-
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-
|
313 |
-
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-
|
314 |
-
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-
|
315 |
-
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-
|
316 |
-
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
317 |
-
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-
|
318 |
-
"model.layers.37.input_layernorm.weight": "pytorch_model-
|
319 |
-
"model.layers.37.mlp.down_proj.weight": "pytorch_model-
|
320 |
-
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-
|
321 |
-
"model.layers.37.mlp.up_proj.weight": "pytorch_model-
|
322 |
-
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-
|
323 |
-
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-
|
324 |
-
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-
|
325 |
-
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-
|
326 |
-
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
327 |
-
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-
|
328 |
-
"model.layers.38.input_layernorm.weight": "pytorch_model-
|
329 |
-
"model.layers.38.mlp.down_proj.weight": "pytorch_model-
|
330 |
-
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-
|
331 |
-
"model.layers.38.mlp.up_proj.weight": "pytorch_model-
|
332 |
-
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-
|
333 |
-
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-
|
334 |
-
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-
|
335 |
-
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-
|
336 |
-
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
337 |
-
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-
|
338 |
-
"model.layers.39.input_layernorm.weight": "pytorch_model-
|
339 |
-
"model.layers.39.mlp.down_proj.weight": "pytorch_model-
|
340 |
-
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-
|
341 |
-
"model.layers.39.mlp.up_proj.weight": "pytorch_model-
|
342 |
-
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-
|
343 |
-
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-
|
344 |
-
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-
|
345 |
-
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-
|
346 |
-
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
347 |
-
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-
|
348 |
-
"model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-
|
349 |
-
"model.layers.4.mlp.down_proj.weight": "pytorch_model-
|
350 |
-
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-
|
351 |
-
"model.layers.4.mlp.up_proj.weight": "pytorch_model-
|
352 |
-
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-
|
353 |
-
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-
|
354 |
-
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-
|
355 |
-
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-
|
356 |
-
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-
|
357 |
-
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-
|
358 |
-
"model.layers.5.input_layernorm.weight": "pytorch_model-
|
359 |
-
"model.layers.5.mlp.down_proj.weight": "pytorch_model-
|
360 |
-
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-
|
361 |
-
"model.layers.5.mlp.up_proj.weight": "pytorch_model-
|
362 |
-
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-
|
363 |
-
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-
|
364 |
-
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-
|
365 |
-
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-
|
366 |
-
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
367 |
-
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-
|
368 |
-
"model.layers.6.input_layernorm.weight": "pytorch_model-
|
369 |
-
"model.layers.6.mlp.down_proj.weight": "pytorch_model-
|
370 |
-
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-
|
371 |
-
"model.layers.6.mlp.up_proj.weight": "pytorch_model-
|
372 |
-
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-
|
373 |
-
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-
|
374 |
-
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-
|
375 |
-
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-
|
376 |
-
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
377 |
-
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-
|
378 |
-
"model.layers.7.input_layernorm.weight": "pytorch_model-
|
379 |
-
"model.layers.7.mlp.down_proj.weight": "pytorch_model-
|
380 |
-
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-
|
381 |
-
"model.layers.7.mlp.up_proj.weight": "pytorch_model-
|
382 |
-
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-
|
383 |
-
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-
|
384 |
-
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-
|
385 |
-
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-
|
386 |
-
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
387 |
-
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-
|
388 |
-
"model.layers.8.input_layernorm.weight": "pytorch_model-
|
389 |
-
"model.layers.8.mlp.down_proj.weight": "pytorch_model-
|
390 |
-
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-
|
391 |
-
"model.layers.8.mlp.up_proj.weight": "pytorch_model-
|
392 |
-
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-
|
393 |
-
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-
|
394 |
-
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-
|
395 |
-
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-
|
396 |
-
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
397 |
-
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-
|
398 |
-
"model.layers.9.input_layernorm.weight": "pytorch_model-
|
399 |
-
"model.layers.9.mlp.down_proj.weight": "pytorch_model-
|
400 |
-
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-
|
401 |
-
"model.layers.9.mlp.up_proj.weight": "pytorch_model-
|
402 |
-
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-
|
403 |
-
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-
|
404 |
-
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-
|
405 |
-
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-
|
406 |
-
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
407 |
-
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-
|
408 |
-
"model.norm.weight": "pytorch_model-
|
409 |
}
|
410 |
}
|
|
|
  {
    "metadata": {
+     "total_size": 17578695680
    },
    "weight_map": {
+     "lm_head.weight": "pytorch_model-00010-of-00010.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.12.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.17.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.20.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.22.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.30.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.32.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.33.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.37.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.38.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.norm.weight": "pytorch_model-00009-of-00010.bin"
    }
  }
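For orientation, here is a minimal, illustrative Python sketch (not part of this commit) of how a sharded-checkpoint index like the new 10-shard pytorch_model.bin.index.json above is consumed: weight_map assigns every parameter name to one shard file, so a loader only needs to open the shards it actually uses. The parameter and file names are taken from the diff; the loading code itself is an assumption, not this repository's loader.

import json
import torch

# Load the index that maps each parameter name to its .bin shard.
with open("pytorch_model.bin.index.json") as f:
    index = json.load(f)

# total_size is the byte count across all ten shards (17578695680 above).
print(index["metadata"]["total_size"])

# Find the shard that holds one parameter, then load only that shard.
param = "model.layers.39.self_attn.v_proj.weight"
shard = index["weight_map"][param]  # "pytorch_model-00009-of-00010.bin"
state_dict = torch.load(shard, map_location="cpu")
print(state_dict[param].shape)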
tokenizer.json
CHANGED
@@ -58,14 +58,6 @@
        "special": true
      }
    ],
-   "normalizer": {
-     "type": "Sequence",
-     "normalizers": [
-       {
-         "type": "NFKC"
-       }
-     ]
-   },
    "pre_tokenizer": {
      "type": "Sequence",
      "pretokenizers": [
@@ -86,9 +78,17 @@
    },
    "post_processor": null,
    "decoder": {
-     "type": "
-     "
-
+     "type": "Sequence",
+     "decoders": [
+       {
+         "type": "Metaspace",
+         "replacement": "▁",
+         "add_prefix_space": false
+       },
+       {
+         "type": "ByteFallback"
+       }
+     ]
    },
    "model": {
      "type": "BPE",
@@ -100376,7 +100376,263 @@
      "nj": 100274,
      "iful": 100275,
      "▁solution": 100276,
-     "\n": 100277
+     "\n": 100277,
+     "<0x00>": 100278,
+     "<0x01>": 100279,
+     "<0x02>": 100280,
+     "<0x03>": 100281,
+     "<0x04>": 100282,
+     "<0x05>": 100283,
+     "<0x06>": 100284,
+     "<0x07>": 100285,
+     "<0x08>": 100286,
+     "<0x09>": 100287,
+     "<0x0A>": 100288,
+     "<0x0B>": 100289,
+     "<0x0C>": 100290,
+     "<0x0D>": 100291,
+     "<0x0E>": 100292,
+     "<0x0F>": 100293,
+     "<0x10>": 100294,
+     "<0x11>": 100295,
+     "<0x12>": 100296,
+     "<0x13>": 100297,
+     "<0x14>": 100298,
+     "<0x15>": 100299,
+     "<0x16>": 100300,
+     "<0x17>": 100301,
+     "<0x18>": 100302,
+     "<0x19>": 100303,
+     "<0x1A>": 100304,
+     "<0x1B>": 100305,
+     "<0x1C>": 100306,
+     "<0x1D>": 100307,
+     "<0x1E>": 100308,
+     "<0x1F>": 100309,
+     "<0x20>": 100310,
+     "<0x21>": 100311,
+     "<0x22>": 100312,
+     "<0x23>": 100313,
+     "<0x24>": 100314,
+     "<0x25>": 100315,
+     "<0x26>": 100316,
+     "<0x27>": 100317,
+     "<0x28>": 100318,
+     "<0x29>": 100319,
+     "<0x2A>": 100320,
+     "<0x2B>": 100321,
+     "<0x2C>": 100322,
+     "<0x2D>": 100323,
+     "<0x2E>": 100324,
+     "<0x2F>": 100325,
+     "<0x30>": 100326,
+     "<0x31>": 100327,
+     "<0x32>": 100328,
+     "<0x33>": 100329,
+     "<0x34>": 100330,
+     "<0x35>": 100331,
+     "<0x36>": 100332,
+     "<0x37>": 100333,
+     "<0x38>": 100334,
+     "<0x39>": 100335,
+     "<0x3A>": 100336,
+     "<0x3B>": 100337,
+     "<0x3C>": 100338,
+     "<0x3D>": 100339,
+     "<0x3E>": 100340,
+     "<0x3F>": 100341,
+     "<0x40>": 100342,
+     "<0x41>": 100343,
+     "<0x42>": 100344,
+     "<0x43>": 100345,
+     "<0x44>": 100346,
+     "<0x45>": 100347,
+     "<0x46>": 100348,
+     "<0x47>": 100349,
+     "<0x48>": 100350,
+     "<0x49>": 100351,
+     "<0x4A>": 100352,
+     "<0x4B>": 100353,
+     "<0x4C>": 100354,
+     "<0x4D>": 100355,
+     "<0x4E>": 100356,
+     "<0x4F>": 100357,
+     "<0x50>": 100358,
+     "<0x51>": 100359,
+     "<0x52>": 100360,
+     "<0x53>": 100361,
+     "<0x54>": 100362,
+     "<0x55>": 100363,
+     "<0x56>": 100364,
+     "<0x57>": 100365,
+     "<0x58>": 100366,
+     "<0x59>": 100367,
+     "<0x5A>": 100368,
+     "<0x5B>": 100369,
+     "<0x5C>": 100370,
+     "<0x5D>": 100371,
+     "<0x5E>": 100372,
+     "<0x5F>": 100373,
+     "<0x60>": 100374,
+     "<0x61>": 100375,
+     "<0x62>": 100376,
+     "<0x63>": 100377,
+     "<0x64>": 100378,
+     "<0x65>": 100379,
+     "<0x66>": 100380,
+     "<0x67>": 100381,
+     "<0x68>": 100382,
+     "<0x69>": 100383,
+     "<0x6A>": 100384,
+     "<0x6B>": 100385,
+     "<0x6C>": 100386,
+     "<0x6D>": 100387,
+     "<0x6E>": 100388,
+     "<0x6F>": 100389,
+     "<0x70>": 100390,
+     "<0x71>": 100391,
+     "<0x72>": 100392,
+     "<0x73>": 100393,
+     "<0x74>": 100394,
+     "<0x75>": 100395,
+     "<0x76>": 100396,
+     "<0x77>": 100397,
+     "<0x78>": 100398,
+     "<0x79>": 100399,
+     "<0x7A>": 100400,
+     "<0x7B>": 100401,
+     "<0x7C>": 100402,
+     "<0x7D>": 100403,
+     "<0x7E>": 100404,
+     "<0x7F>": 100405,
+     "<0x80>": 100406,
+     "<0x81>": 100407,
+     "<0x82>": 100408,
+     "<0x83>": 100409,
+     "<0x84>": 100410,
+     "<0x85>": 100411,
+     "<0x86>": 100412,
+     "<0x87>": 100413,
+     "<0x88>": 100414,
+     "<0x89>": 100415,
+     "<0x8A>": 100416,
+     "<0x8B>": 100417,
+     "<0x8C>": 100418,
+     "<0x8D>": 100419,
+     "<0x8E>": 100420,
+     "<0x8F>": 100421,
+     "<0x90>": 100422,
+     "<0x91>": 100423,
+     "<0x92>": 100424,
+     "<0x93>": 100425,
+     "<0x94>": 100426,
+     "<0x95>": 100427,
+     "<0x96>": 100428,
+     "<0x97>": 100429,
+     "<0x98>": 100430,
+     "<0x99>": 100431,
+     "<0x9A>": 100432,
+     "<0x9B>": 100433,
+     "<0x9C>": 100434,
+     "<0x9D>": 100435,
+     "<0x9E>": 100436,
+     "<0x9F>": 100437,
+     "<0xA0>": 100438,
+     "<0xA1>": 100439,
+     "<0xA2>": 100440,
+     "<0xA3>": 100441,
+     "<0xA4>": 100442,
+     "<0xA5>": 100443,
+     "<0xA6>": 100444,
+     "<0xA7>": 100445,
+     "<0xA8>": 100446,
+     "<0xA9>": 100447,
+     "<0xAA>": 100448,
+     "<0xAB>": 100449,
+     "<0xAC>": 100450,
+     "<0xAD>": 100451,
+     "<0xAE>": 100452,
+     "<0xAF>": 100453,
+     "<0xB0>": 100454,
+     "<0xB1>": 100455,
+     "<0xB2>": 100456,
+     "<0xB3>": 100457,
+     "<0xB4>": 100458,
+     "<0xB5>": 100459,
+     "<0xB6>": 100460,
+     "<0xB7>": 100461,
+     "<0xB8>": 100462,
+     "<0xB9>": 100463,
+     "<0xBA>": 100464,
+     "<0xBB>": 100465,
+     "<0xBC>": 100466,
+     "<0xBD>": 100467,
+     "<0xBE>": 100468,
+     "<0xBF>": 100469,
+     "<0xC0>": 100470,
+     "<0xC1>": 100471,
+     "<0xC2>": 100472,
+     "<0xC3>": 100473,
+     "<0xC4>": 100474,
+     "<0xC5>": 100475,
+     "<0xC6>": 100476,
+     "<0xC7>": 100477,
+     "<0xC8>": 100478,
+     "<0xC9>": 100479,
+     "<0xCA>": 100480,
+     "<0xCB>": 100481,
+     "<0xCC>": 100482,
+     "<0xCD>": 100483,
+     "<0xCE>": 100484,
+     "<0xCF>": 100485,
+     "<0xD0>": 100486,
+     "<0xD1>": 100487,
+     "<0xD2>": 100488,
+     "<0xD3>": 100489,
+     "<0xD4>": 100490,
+     "<0xD5>": 100491,
+     "<0xD6>": 100492,
+     "<0xD7>": 100493,
+     "<0xD8>": 100494,
+     "<0xD9>": 100495,
+     "<0xDA>": 100496,
+     "<0xDB>": 100497,
+     "<0xDC>": 100498,
+     "<0xDD>": 100499,
+     "<0xDE>": 100500,
+     "<0xDF>": 100501,
+     "<0xE0>": 100502,
+     "<0xE1>": 100503,
+     "<0xE2>": 100504,
+     "<0xE3>": 100505,
+     "<0xE4>": 100506,
+     "<0xE5>": 100507,
+     "<0xE6>": 100508,
+     "<0xE7>": 100509,
+     "<0xE8>": 100510,
+     "<0xE9>": 100511,
+     "<0xEA>": 100512,
+     "<0xEB>": 100513,
+     "<0xEC>": 100514,
+     "<0xED>": 100515,
+     "<0xEE>": 100516,
+     "<0xEF>": 100517,
+     "<0xF0>": 100518,
+     "<0xF1>": 100519,
+     "<0xF2>": 100520,
+     "<0xF3>": 100521,
+     "<0xF4>": 100522,
+     "<0xF5>": 100523,
+     "<0xF6>": 100524,
+     "<0xF7>": 100525,
+     "<0xF8>": 100526,
+     "<0xF9>": 100527,
+     "<0xFA>": 100528,
+     "<0xFB>": 100529,
+     "<0xFC>": 100530,
+     "<0xFD>": 100531,
+     "<0xFE>": 100532,
+     "<0xFF>": 100533
    },
    "merges": [
      "▁ t",
@@ -104090,4 +104346,4 @@
      "▁sol ution"
    ]
  }
- }
+ }
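The 256 tokens added above give the tokenizer a byte fallback: any input byte the BPE vocabulary cannot match is emitted as a <0xNN> token, and the new ByteFallback decoder turns those tokens back into raw bytes. As an illustrative sanity check (not code from this repository), the ids follow a fixed offset: byte value b maps to id 100278 + b.

# Illustrative check of the byte-fallback ids added in this commit:
# byte value b is encoded as the token "<0xNN>" with id 100278 + b.
for b in (0x00, 0x0A, 0xFF):
    token = "<0x%02X>" % b
    print(token, 100278 + b)  # <0x00> 100278, <0x0A> 100288, <0xFF> 100533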