Commit 33e5a3a by pom
Parent(s): 4f64cdc

update XVERSE-13B-Chat model
Browse files
- MODEL_LICENSE.pdf +0 -0
- README.md +40 -98
- config.json +2 -2
- configuration_xverse.py +2 -0
- modeling_xverse.py +25 -14
- pytorch_model-00001-of-00015.bin → pytorch_model-00001-of-00010.bin +2 -2
- pytorch_model-00002-of-00015.bin → pytorch_model-00002-of-00010.bin +2 -2
- pytorch_model-00003-of-00015.bin → pytorch_model-00003-of-00010.bin +2 -2
- pytorch_model-00004-of-00015.bin → pytorch_model-00004-of-00010.bin +2 -2
- pytorch_model-00005-of-00010.bin +3 -0
- pytorch_model-00005-of-00015.bin +0 -3
- pytorch_model-00006-of-00010.bin +3 -0
- pytorch_model-00006-of-00015.bin +0 -3
- pytorch_model-00007-of-00010.bin +3 -0
- pytorch_model-00007-of-00015.bin +0 -3
- pytorch_model-00008-of-00010.bin +3 -0
- pytorch_model-00008-of-00015.bin +0 -3
- pytorch_model-00014-of-00015.bin → pytorch_model-00009-of-00010.bin +1 -1
- pytorch_model-00009-of-00015.bin +0 -3
- pytorch_model-00010-of-00010.bin +3 -0
- pytorch_model-00010-of-00015.bin +0 -3
- pytorch_model-00011-of-00015.bin +0 -3
- pytorch_model-00012-of-00015.bin +0 -3
- pytorch_model-00013-of-00015.bin +0 -3
- pytorch_model-00015-of-00015.bin +0 -3
- pytorch_model.bin.index.json +404 -404
- tokenizer.json +269 -13
MODEL_LICENSE.pdf CHANGED

Binary files a/MODEL_LICENSE.pdf and b/MODEL_LICENSE.pdf differ
README.md CHANGED

@@ -14,8 +14,8 @@ inference: false
 **XVERSE-13B** 是由深圳元象科技自主研发的支持多语言的大语言模型(Large Language Model),主要特点如下:

 - **模型结构**:XVERSE-13B 使用主流 Decoder-only 的标准 Transformer 网络结构,支持 8K 的上下文长度(Context Length),为同尺寸模型中最长,能满足更长的多轮对话、知识问答与摘要等需求,模型应用场景更广泛。
-- **训练数据**:构建了
-- **分词**:基于 BPE(Byte-Pair Encoding)算法,使用上百 GB 语料训练了一个词表大小为 100,
+- **训练数据**:构建了 3.2 万亿 token 的高质量、多样化的数据对模型进行充分训练,包含中、英、俄、西等 40 多种语言,通过精细化设置不同类型数据的采样比例,使得中英两种语言表现优异,也能兼顾其他语言效果。
+- **分词**:基于 BPE(Byte-Pair Encoding)算法,使用上百 GB 语料训练了一个词表大小为 100,534 的分词器,能够同时支持多语言,而无需额外扩展词表。
 - **训练框架**:自主研发多项关键技术,包括高效算子、显存优化、并行调度策略、数据-计算-通信重叠、平台和框架协同等,让训练效率更高,模型稳定性强,在千卡集群上的峰值算力利用率可达到 58.5%,位居业界前列。

 ## Model Introduction

@@ -25,113 +25,55 @@
 **XVERSE-13B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. Its key features are as follows:

 - **Model Structure**: XVERSE-13B uses the mainstream Decoder-only Transformer network structure, supports 8k context length, the longest one among models of the same size, which can meet the need of longer multi-round dialogues, knowledge question-answering, and summarization. This makes the model more versatile in application scenarios.
-- **Training Data**: The model has been thoroughly trained on a diversified and high-quality dataset consisting of
-- **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,
+- **Training Data**: The model has been thoroughly trained on a diversified and high-quality dataset consisting of 3.2 trillion tokens, covering more than 40 languages such as Chinese, English, Russian, and Spanish. The sampling ratios of the different data types are finely tuned, which makes Chinese and English performance excellent while also taking other languages into account.
+- **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,534 has been trained using hundreds of gigabytes of language data. This tokenizer is capable of supporting multiple languages without the need for additional vocabulary expansion.
 - **Training Framework**: Several key technologies have also been independently developed, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data-computation-communication, and synergy between platforms and frameworks. These advancements enhance training efficiency and model stability. With these technologies, the peak computational power utilization rate on a thousand-card cluster can reach 58.5%, ranking at the forefront of the industry.

 ## 评测结果

-| :------------------------: |
-| Ziya-LLaMA-13B-Pretrain-v1| 底座 | 43.9 | 30.2 | 27.2 | 26.4 | 37.6 |
-| Ziya-LLaMA-13B-v1.1 | 对话 | 50.6 | 29.3 | 23.6 | 26.7 | 27.3 |
-| **XVERSE-13B** | 底座 | **55.1** | **54.7** | **41.4** | **53.9** | **66.5** |
-| **XVERSE-13B-Chat** | 对话 | **60.2** | **53.1** | **48.3** | **50.7** | **80.6** |
+为了综合评估模型的性能,我们在一系列标准数据集上进行了全面测试,包括C-Eval、CMMLU、Gaokao-Bench、MMLU、GAOKAO-English、AGIEval、RACE-M、CommonSenseQA、PIQA、GSM8K和HumanEval。这些评估覆盖了模型在多个领域的能力,具体包括中文问答、英文问答、语言理解、常识问答、逻辑推理、数学问题解答以及编程能力。评估结果如下:
+
+| 能力维度 | 数据集 | | XVERSE-13B-2 | XVERSE-13B | Baichuan2-13B | Llama1-13B | Llama2-13B |
+| :--------: | :------------------------: | :----: | :----------: | :--------: | :-----------: | :--------: | :--------: |
+| 中文问答 | C-Eval | 5-shot | 63.5 | 54.7 | 58.1 | 28.8 | 35.6 |
+| | CMMLU | 5-shot | 66.2 | 59.1 | 62.0 | 31.5 | 38.4 |
+| | Gaokao-Bench<sup>1</sup> | 5-shot | 67.5 | 53.9 | 54.3 | 26.4 | 35.4 |
+| 英文问答 | MMLU | 5-shot | 61.2 | 55.1 | 59.2 | 46.9 | 54.8 |
+| | GAOKAO-English<sup>1</sup> | 5-shot | 73.7 | 66.5 | 67.7 | 38.1 | 60.6 |
+| 中英文问答 | AGIEval<sup>1</sup> | 5-shot | 54.5 | 41.4 | 48.2 | 27.3 | 33.4 |
+| 语言理解 | RACE-M | 0-shot | 84.6 | 74.2 | 68.9 | 61.6 | 63.0 |
+| 常识问答 | CommonSenseQA | 7-shot | 74.0 | 69.5 | 65.6 | 62.0 | 67.3 |
+| 推理 | PIQA | 0-shot | 80.8 | 79.0 | 78.5 | 80.1 | 80.5 |
+| 数学 | GSM8K | 4-shot | 54.9 | 18.4 | 52.7 | 17.8 | 28.7 |
+| 代码 | HumanEval | 0-shot | 39.6 | 15.9 | 17.1 | 15.8 | 18.3 |

 > <sup>1:只针对其中的单项选择题进行测试,即排除了填空题、开放性问题和多项选择题</sup>
-> <sup>2:来源于 [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) 的汇报结果</sup>
-> <sup>3:来源于 [C-Eval](https://cevalbenchmark.com/) 的汇报结果</sup>
-> <sup>4:来源于[Llama 2 论文](https://arxiv.org/abs/2307.09288)的汇报结果</sup>
->
-> 对于 MMLU ,我们采用作者提供的[评测工具](https://github.com/hendrycks/test),C-Eval、AGIEval、GAOKAO-Bench、GAOKAO-English 与 MMLU 的评测方式相同,且统一采用 **5-shot** 构造测试样本。
-
-## Model Evaluation
-
-In order to validate the various abilities of the model, we have chosen several comprehensive capability benchmarks across multiple disciplines, including [MMLU](https://arxiv.org/abs/2009.03300) (English), [C-Eval](https://cevalbenchmark.com/) (Chinese), [AGIEval](https://arxiv.org/abs/2304.06364) (Chinese and English), [GAOKAO-Bench](https://github.com/OpenLMLab/GAOKAO-Bench) (Chinese and English), [GAOKAO-English](https://github.com/ExpressAI/AI-Gaokao) (English), the evaluation results are as follows:
-
-| :------------------------: | :--------------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
-| Baichuan-13B | pretrained | 51.6<sup>2</sup> | 53.6<sup>3</sup> | 40.5 | 45.9 | 56.9 |
-| Baichuan-13B-Chat | fine-tuned | 52.1<sup>2</sup> | 51.5<sup>2</sup> | 34.6 | 46.7 | 63.8 |
-| Chinese-Alpaca-2-13B | fine-tuned | 53.2 | 41.3 | 36.6 | 38.4 | 65.1 |
-| Llama-1-13B | pretrained | 46.9<sup>4</sup> | 28.8 | 27.3 | 26.4 | 38.1 |
-| Llama-2-13B | pretrained | 54.8<sup>4</sup> | 35.6 | 33.4 | 35.4 | 60.6 |
-| moss-moon-003-base (16B) | pretrained | 24.7 | 33.1<sup>3</sup> | 26.8 | 28.5 | 34.7 |
-| moss-moon-003-sft (16B) | fine-tuned | 25.5 | 33.6 | 27.6 | 28.8 | 29.2 |
-| OpenLLaMA-13B | pretrained | 42.4 | 24.7 | 24.0 | 25.6 | 33.3 |
-| OPT-13B | pretrained | 25.2 | 25.0 | 24.2 | 24.4 | 31.1 |
-| Pythia-12B | pretrained | 25.1 | 26.2 | 25.3 | 25.3 | 26.8 |
-| Vicuna-13B-v1.5 | fine-tuned | 53.5 | 27.9 | 29.7 | 31.6 | 52.9 |
-| Ziya-LLaMA-13B-Pretrain-v1| pretrained | 43.9 | 30.2 | 27.2 | 26.4 | 37.6 |
-| Ziya-LLaMA-13B-v1.1 | fine-tuned | 50.6 | 29.3 | 23.6 | 26.7 | 27.3 |
-| **XVERSE-13B** | pretrained | **55.1** | **54.7** | **41.4** | **53.9** | **66.5** |
-| **XVERSE-13B-Chat** | fine-tuned | **60.2** | **53.1** | **48.3** | **50.7** | **80.6** |
+
+对于上述所有比较模型,我们优先汇报其官方公布的结果。在缺少官方结果的情况下,我们采用了 [OpenCompass 榜单](https://opencompass.org.cn/leaderboard-llm)的报告结果。其他结果则来自于我们自行执行的评估流程所获得的数据。
+对于 MMLU ,我们采用作者提供的[评测工具](https://github.com/hendrycks/test),C-Eval、AGIEval、GAOKAO-Bench、GAOKAO-English 与 MMLU 的评测方式相同,其余评测数据集使用 [OpenCompass 评估框架](https://github.com/open-compass/OpenCompass/)进行评估。
+
+## Model Evaluation
+
+To comprehensively assess the performance of the model, we conducted extensive testing across a range of standard datasets, including C-Eval, CMMLU, Gaokao-Bench, MMLU, GAOKAO-English, AGIEval, RACE-M, CommonSenseQA, PIQA, GSM8K and HumanEval. These evaluations spanned multiple capabilities of the model, specifically including Chinese question answering, English question answering, language comprehension, common-sense question answering, logical reasoning, mathematical problem-solving, and coding ability. The results of the evaluations are as follows:
+
+| Capability Dimension | Dataset | | XVERSE-13B-2 | XVERSE-13B | Baichuan2-13B | Llama1-13B | Llama2-13B |
+| :--------------------: | :------------------------: | :----: | :----------: | :--------: | :-----------: | :--------: | :--------: |
+| Chinese QA | C-Eval | 5-shot | 63.5 | 54.7 | 58.1 | 28.8 | 35.6 |
+| | CMMLU | 5-shot | 66.2 | 59.1 | 62.0 | 31.5 | 38.4 |
+| | Gaokao-Bench<sup>1</sup> | 5-shot | 67.5 | 53.9 | 54.3 | 26.4 | 35.4 |
+| English QA | MMLU | 5-shot | 61.2 | 55.1 | 59.2 | 46.9 | 54.8 |
+| | GAOKAO-English<sup>1</sup> | 5-shot | 73.7 | 66.5 | 67.7 | 38.1 | 60.6 |
+| Chinese & English QA | AGIEval<sup>1</sup> | 5-shot | 54.5 | 41.4 | 48.2 | 27.3 | 33.4 |
+| Language Understanding | RACE-M | 0-shot | 84.6 | 74.2 | 68.9 | 61.6 | 63.0 |
+| Common Sense QA | CommonSenseQA | 7-shot | 74.0 | 69.5 | 65.6 | 62.0 | 67.3 |
+| Reasoning | PIQA | 0-shot | 80.8 | 79.0 | 78.5 | 80.1 | 80.5 |
+| Math | GSM8K | 4-shot | 54.9 | 18.4 | 52.7 | 17.8 | 28.7 |
+| Coding | HumanEval | 0-shot | 39.6 | 15.9 | 17.1 | 15.8 | 18.3 |

 > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
->
-> For MMLU, we adopt the [evaluation tools](https://github.com/hendrycks/test) provided by the authors, C-Eval, AGIEval, GAOKAO-Bench, GAOKAO-English are the same as MMLU, and uniformly use **5-shot** to construct the test samples.
-
-### MMLU 各类别指标
-MMLU Category Results
-
-| Models | Type | Average | STEM | Social Science | Humanities | Others |
-| :------------------------: | :------------------------: | :------: | :------: | :------------: | :--------: | :------: |
-| Baichuan-13B | pretrained | 51.6 | 41.6 | 60.9 | 47.4 | 58.5 |
-| Baichuan-13B-Chat | fine-tuned | 52.1 | 40.9 | 60.9 | 48.8 | 59.0 |
-| Chinese-Alpaca-2-13B | fine-tuned | 53.2 | 41.8 | 61.2 | 51.3 | 59.2 |
-| Llama-1-13B | pretrained | 46.9 | 35.8 | 53.8 | 45.0 | 53.3 |
-| Llama-2-13B | pretrained | 54.8 | 44.1 | 62.6 | 52.8 | 61.1 |
-| moss-moon-003-base (16B) | pretrained | 24.7 | 23.0 | 24.0 | 25.2 | 26.3 |
-| moss-moon-003-sft (16B) | fine-tuned | 25.5 | 25.9 | 23.8 | 27.1 | 24.4 |
-| OpenLLaMA-13B | pretrained | 42.4 | 34.7 | 48.6 | 40.0 | 47.1 |
-| OPT-13B | pretrained | 25.2 | 23.9 | 24.1 | 25.9 | 26.3 |
-| Pythia-12B | pretrained | 25.1 | 24.8 | 23.0 | 26.1 | 26.0 |
-| Vicuna-13B-v1.5 | fine-tuned | 53.5 | 42.3 | 61.3 | 50.3 | 60.9 |
-| Ziya-LLaMA-13B-Pretrain-v1 | pretrained | 43.9 | 36.3 | 48.8 | 41.1 | 50.3 |
-| Ziya-LLaMA-13B-v1.1 | fine-tuned | 50.6 | 40.7 | 57.8 | 48.1 | 56.7 |
-| **XVERSE-13B** | pretrained | **55.1** | **44.5** | **64.4** | **50.5** | **62.9** |
-| **XVERSE-13B-Chat** | fine-tuned | **60.2** | **48.1** | **67.7** | **56.4** | **68.0** |
-
-### C-Eval 各类别指标
-C-Eval Category Results
-
-| Models | Type | Average | STEM | Social Science | Humanities | Others |
-| :------------------------: | :------------------------: | :------: | :------: | :------------: | :--------: | :------: |
-| Baichuan-13B | pretrained | 53.6 | 47.0 | 66.8 | 57.3 | 49.8 |
-| Baichuan-13B-Chat | fine-tuned | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
-| Chinese-Alpaca-2-13B | fine-tuned | 41.3 | 37.8 | 51.1 | 42.4 | 37.8 |
-| Llama-1-13B | pretrained | 28.8 | 27.5 | 33.9 | 27.7 | 27.7 |
-| Llama-2-13B | pretrained | 35.6 | 34.5 | 39.8 | 36.2 | 33.2 |
-| moss-moon-003-base (16B) | pretrained | 33.1 | 31.6 | 37.0 | 33.4 | 32.1 |
-| moss-moon-003-sft (16B) | fine-tuned | 33.6 | 31.4 | 38.6 | 33.8 | 32.9 |
-| OpenLLaMA-13B | pretrained | 24.7 | 25.5 | 23.5 | 24.2 | 24.7 |
-| OPT-13B | pretrained | 25.0 | 24.4 | 24.6 | 25.9 | 25.4 |
-| Pythia-12B | pretrained | 26.2 | 26.8 | 25.1 | 26.7 | 25.4 |
-| Vicuna-13B-v1.5 | fine-tuned | 27.9 | 25.4 | 33.2 | 29.3 | 26.2 |
-| Ziya-LLaMA-13B-Pretrain-v1 | pretrained | 30.2 | 27.8 | 34.3 | 32.0 | 29.0 |
-| Ziya-LLaMA-13B-v1.1 | fine-tuned | 29.3 | 27.5 | 32.8 | 29.7 | 29.0 |
-| **XVERSE-13B** | pretrained | **54.7** | **45.6** | **66.2** | **58.3** | **56.9** |
-| **XVERSE-13B-Chat** | fine-tuned | **53.1** | **44.5** | **65.3** | **56.5** | **54.3** |
+
+For all the comparison models mentioned above, we prioritize their officially published results. In the absence of official data, we refer to the reported outcomes from the [OpenCompass Leaderboard](https://opencompass.org.cn/leaderboard-llm). Results not covered by the aforementioned sources are derived from our own evaluation pipeline.
+For MMLU, we adopt the [evaluation tools](https://github.com/hendrycks/test) provided by the authors; C-Eval, AGIEval, GAOKAO-Bench, and GAOKAO-English are evaluated in the same way as MMLU. For the remaining evaluation datasets, the [OpenCompass](https://github.com/open-compass/OpenCompass/) evaluation framework is employed.

 ### Loading with Transformers
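The README excerpt above ends at the "### Loading with Transformers" heading. As a minimal sketch of that loading path (assuming the standard `trust_remote_code` pattern that this repository's custom `modeling_xverse.py` requires, and the `chat()` helper whose input-building logic appears in the modeling_xverse.py diff below; the model card remains the authoritative snippet):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Minimal usage sketch; the repo ships custom code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained("xverse/XVERSE-13B-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "xverse/XVERSE-13B-Chat", torch_dtype="auto", trust_remote_code=True
).eval()

# history uses the role/content message format handled by the chat-input
# builder in modeling_xverse.py ('user', 'assistant', 'system', 'exec').
history = [{"role": "user", "content": "Introduce XVERSE-13B in one sentence."}]
response = model.chat(tokenizer, history)
print(response)
```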
config.json CHANGED

@@ -14,6 +14,7 @@
   "initializer_range": 0.02,
   "intermediate_size": 13824,
   "max_position_embeddings": 8192,
+  "max_tokenizer_truncation": 6144,
   "model_type": "xverse",
   "num_attention_heads": 40,
   "num_hidden_layers": 40,

@@ -22,6 +23,5 @@
   "torch_dtype": "bfloat16",
   "transformers_version": "4.28.1",
   "use_cache": true,
-  "vocab_size":
+  "vocab_size": 100534
 }
-
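The new `max_tokenizer_truncation` field caps how much chat history survives truncation, independently of the 8192-token context window. A worked example of the clamp this value feeds (mirroring the arithmetic added in the modeling_xverse.py diff below; the `max_new_tokens` value here is an illustrative assumption):

```python
# Values from this config.json; max_new_tokens is chosen for illustration.
max_position_embeddings = 8192
max_tokenizer_truncation = 6144
max_new_tokens = 1024

max_input_tokens = max_position_embeddings - max_new_tokens             # 7168
max_input_tokens = max(max_position_embeddings // 2, max_input_tokens)  # max(4096, 7168) -> 7168
max_input_tokens = min(max_tokenizer_truncation, max_input_tokens)      # new clamp -> 6144
print(max_input_tokens)  # at most 6144 history tokens reach the model
```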
configuration_xverse.py CHANGED

@@ -91,6 +91,7 @@ class XverseConfig(PretrainedConfig):
         num_attention_heads=40,
         hidden_act="silu",
         max_position_embeddings=8192,
+        max_tokenizer_truncation=8192,
         initializer_range=0.02,
         rms_norm_eps=1e-6,
         use_cache=True,

@@ -111,6 +112,7 @@ class XverseConfig(PretrainedConfig):
         self.initializer_range = initializer_range
         self.rms_norm_eps = rms_norm_eps
         self.use_cache = use_cache
+        self.max_tokenizer_truncation = max_tokenizer_truncation

         super().__init__(
             pad_token_id=pad_token_id,
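Because the constructor supplies a default of 8192, an older config.json that lacks the key keeps working: it falls back to the full context length, so the new clamp changes nothing for it. A hypothetical check (assuming the file above is importable locally):

```python
# Hypothetical local import of the configuration file shown above.
from configuration_xverse import XverseConfig

cfg = XverseConfig()  # no "max_tokenizer_truncation" supplied, as in an older config.json
assert cfg.max_tokenizer_truncation == 8192  # default equals max_position_embeddings,
                                             # i.e. no extra truncation below the context window
```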
modeling_xverse.py CHANGED

@@ -611,8 +611,6 @@ class XverseModel(XversePreTrainedModel):


 class XverseForCausalLM(XversePreTrainedModel):
-    _tied_weights_keys = ["lm_head.weight"]
-
     def __init__(self, config):
         super().__init__(config)
         self.model = XverseModel(config)

@@ -732,15 +730,22 @@ class XverseForCausalLM(XversePreTrainedModel):
         max_new_tokens = max_new_tokens or self.generation_config.max_new_tokens
         max_input_tokens = self.config.max_position_embeddings - max_new_tokens
         max_input_tokens = max(self.config.max_position_embeddings // 2, max_input_tokens)
+        max_input_tokens = min(self.config.max_tokenizer_truncation, max_input_tokens)

         total_input, round_input = [], []
+        user_prompt_tokens = tokenizer.encode("Human: ", return_token_type_ids=False)
+        exec_prompt_tokens = tokenizer.encode("Exec: ", return_token_type_ids=False)
+        assist_prompt_tokens = tokenizer.encode("Assistant: ", return_token_type_ids=False)
+        assist_prompt_len = len(assist_prompt_tokens)
+
         for i, message in enumerate(messages[::-1]):
-            if message['role'] == 'user':
-                user_content = f"{
+            if message['role'] == 'user' or message['role'] == 'exec':
+                user_content = f"{message['content']}\n\n"
+                content_tokens = user_prompt_tokens + tokenizer.encode(user_content, return_token_type_ids=False) if message['role'] == 'user' else \
+                    exec_prompt_tokens + tokenizer.encode(user_content, return_token_type_ids=False)
                 if i == 0:
+                    content_tokens = content_tokens[:max_input_tokens-assist_prompt_len]
+                    content_tokens += assist_prompt_tokens
                 round_input = content_tokens + round_input

                 if i != 0:

@@ -754,12 +759,20 @@ class XverseForCausalLM(XversePreTrainedModel):
                    break
                round_input = []
            elif message['role'] == 'assistant':
-                assist_content = f"{
-                content_tokens = tokenizer.encode(assist_content, return_token_type_ids=False)
+                assist_content = f"{message['content']}"
+                content_tokens = assist_prompt_tokens + tokenizer.encode(assist_content, return_token_type_ids=False)
                 round_input = content_tokens + [self.generation_config.eos_token_id] + round_input
+            elif message['role'] == 'system':
+                assert i == len(messages) - 1
+                user_content = f"{message['content']}\n"
+                content_tokens = tokenizer.encode(user_content, return_token_type_ids=False)
+                round_input = user_prompt_tokens + content_tokens + round_input
+                if len(total_input) + len(round_input) > max_input_tokens:
+                    break
+                else:
+                    total_input = round_input + total_input
            else:
                raise ValueError(f"message role not supported yet: {message['role']}")
-        total_input = total_input[-max_input_tokens:]  # truncate left
         total_input = torch.LongTensor([total_input]).to(self.device)
         return total_input

@@ -779,7 +792,7 @@ class XverseForCausalLM(XversePreTrainedModel):
             thread = Thread(target=self.generate, kwargs=generation_kwargs)
             thread.start()
             for next_text in streamer:
-                yield next_text.
+                yield next_text.replace(tokenizer.eos_token, "")

         return stream_generator()
     else:

@@ -822,9 +835,7 @@ class XverseForCausalLM(XversePreTrainedModel):
     def _reorder_cache(past_key_values, beam_idx):
         reordered_past = ()
         for layer_past in past_key_values:
-            reordered_past += (
-                tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
-            )
+            reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
         return reordered_past

     def quantize(self, bit_length: int):
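The rewritten input builder hard-codes plain-text role prefixes and assembles the prompt right-to-left so the newest turns survive truncation. A string-level sketch of the prompt layout the token-level code above produces (an illustrative reconstruction, not the shipped implementation; `render_prompt` is a hypothetical helper):

```python
# Hypothetical string-level rendition of the token-level prompt assembly above.
def render_prompt(messages):
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(f"Human: {m['content']}\n")    # system text rides under the Human prefix
        elif m["role"] == "user":
            parts.append(f"Human: {m['content']}\n\n")
        elif m["role"] == "exec":
            parts.append(f"Exec: {m['content']}\n\n")   # new role for tool/execution output
        elif m["role"] == "assistant":
            parts.append(f"Assistant: {m['content']}")  # real code appends an EOS token here
        else:
            raise ValueError(f"message role not supported yet: {m['role']}")
    # the real code appends the "Assistant: " prefix to the newest message (i == 0)
    return "".join(parts) + "Assistant: "

print(render_prompt([{"role": "user", "content": "Hello"}]))
# Human: Hello
#
# Assistant:
```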
pytorch_model-00001-of-00015.bin → pytorch_model-00001-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7ac6f98cae6a0b3768822474284d619beda358b68304a8bde5f1e493a694ef4e
+size 2508131049
pytorch_model-00002-of-00015.bin → pytorch_model-00002-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7715c66734a8871bbd764528ac509caa22b9c7a44b3e2b50ceb5bde1b237f6d5
+size 3172057468

pytorch_model-00003-of-00015.bin → pytorch_model-00003-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:33b1866910aeadb014e0462c828e5d03d8a52044a3405253ab2f786c8c17279e
+size 3172057468

pytorch_model-00004-of-00015.bin → pytorch_model-00004-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b1ad2384bf041f4b3eb0dfa9cd2fd36ca9dea9761504c3b945cfb8302c7449a9
+size 3172057532

pytorch_model-00005-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ebb74708adecad7f7ddf3a5d7ab327a4fcb61c8f0dfb6d66e31e82475a914af7
+size 3172057532

pytorch_model-00005-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:4bbd7ff1159cb112d72a0bd0256783478d4cbccb684c87ec26992b1dfa952996
-size 1903234214

pytorch_model-00006-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2df8350a00f3c5e7e1cf65ea7731c69343df05d5e52205b3284bb1dc43d0edfb
+size 3172057532

pytorch_model-00006-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:d018135a9b98f873303c8fb6fd1e110fcab63aa63617a03bb17c62e535d5bfa4
-size 1903234214

pytorch_model-00007-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:74cb38f652c76808a77e695afa4157924b1d0ce21db6bc18d1faa6fd7a842aff
+size 3172057532

pytorch_model-00007-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1c843829a5757d6fdc506471bae3516cf18aa512ba3ffd7ade30c353606b9427
-size 1903234214

pytorch_model-00008-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e75e08a58078d4aa1302feaf449241b4c298344b9451b0bcbcb460677e0a7718
+size 3172057532

pytorch_model-00008-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e774213e567df646459b9db90ad70bab09f39a9eca6a894487e7c4aebbaf32a1
-size 1903234214

pytorch_model-00014-of-00015.bin → pytorch_model-00009-of-00010.bin RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fa3de49840e4b259ff458a38e5d5a1720a6d1daca77b071a3774600effc16ca2
 size 1693507250

pytorch_model-00009-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8e2ee37b9e7e30f6bd914c219550a26eb2e364ff55a1089de1114242f4dc742f
-size 1903234214

pytorch_model-00010-of-00010.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:17d1b0fa1afc4439ac6d633fe77a4eebd93e0a23f2433e7c39058b9e5ea31a7b
+size 1029571307

pytorch_model-00010-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8b519216f13ae9d3db29e3bf3baa61051038ab686fc5bcc5a056d54215dce66c
-size 1903234214

pytorch_model-00011-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3f9af8f0d46f93b2dd86f40ec115ddcedf42dff498d8481c96bcfc47d655b7f7
-size 1903234214

pytorch_model-00012-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:dc8045ea179095d6961c8bb579b00cb58a7c65eb17fca343397f55acb30d8fe4
-size 1903234214

pytorch_model-00013-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:92a9f68f76d1b0542c441154a5f05049cf5ca3aaa82166e101a7a4018b44ea37
-size 1903234214

pytorch_model-00015-of-00015.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:c395ce9729e0cc6a3824c1645838ebc96a4f232ac848e69ecf092fdfcf4380bc
-size 1026867947
pytorch_model.bin.index.json
CHANGED
@@ -1,410 +1,410 @@
|
|
1 |
{
|
2 |
"metadata": {
|
3 |
-
"total_size":
|
4 |
},
|
5 |
"weight_map": {
|
6 |
-
"lm_head.weight": "pytorch_model-
|
7 |
-
"model.embed_tokens.weight": "pytorch_model-00001-of-
|
8 |
-
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-
|
9 |
-
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-
|
10 |
-
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-
|
11 |
-
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-
|
12 |
-
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-
|
13 |
-
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-
|
14 |
-
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-
|
15 |
-
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-
|
16 |
-
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-
|
17 |
-
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-
|
18 |
-
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-
|
19 |
-
"model.layers.1.mlp.down_proj.weight": "pytorch_model-
|
20 |
-
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-
|
21 |
-
"model.layers.1.mlp.up_proj.weight": "pytorch_model-
|
22 |
-
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-
|
23 |
-
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-
|
24 |
-
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-
|
25 |
-
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-
|
26 |
-
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-
|
27 |
-
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-
|
28 |
-
"model.layers.10.input_layernorm.weight": "pytorch_model-
|
29 |
-
"model.layers.10.mlp.down_proj.weight": "pytorch_model-
|
30 |
-
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-
|
31 |
-
"model.layers.10.mlp.up_proj.weight": "pytorch_model-
|
32 |
-
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-
|
33 |
-
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-
|
34 |
-
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-
|
35 |
-
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-
|
36 |
-
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
37 |
-
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-
|
38 |
-
"model.layers.11.input_layernorm.weight": "pytorch_model-
|
39 |
-
"model.layers.11.mlp.down_proj.weight": "pytorch_model-
|
40 |
-
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-
|
41 |
-
"model.layers.11.mlp.up_proj.weight": "pytorch_model-
|
42 |
-
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-
|
43 |
-
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-
|
44 |
-
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-
|
45 |
-
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-
|
46 |
-
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
47 |
-
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-
|
48 |
-
"model.layers.12.input_layernorm.weight": "pytorch_model-
|
49 |
-
"model.layers.12.mlp.down_proj.weight": "pytorch_model-
|
50 |
-
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-
|
51 |
-
"model.layers.12.mlp.up_proj.weight": "pytorch_model-
|
52 |
-
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-
|
53 |
-
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-
|
54 |
-
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-
|
55 |
-
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-
|
56 |
-
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
57 |
-
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-
|
58 |
-
"model.layers.13.input_layernorm.weight": "pytorch_model-
|
59 |
-
"model.layers.13.mlp.down_proj.weight": "pytorch_model-
|
60 |
-
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-
|
61 |
-
"model.layers.13.mlp.up_proj.weight": "pytorch_model-
|
62 |
-
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-
|
63 |
-
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-
|
64 |
-
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-
|
65 |
-
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-
|
66 |
-
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
67 |
-
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-
|
68 |
-
"model.layers.14.input_layernorm.weight": "pytorch_model-
|
69 |
-
"model.layers.14.mlp.down_proj.weight": "pytorch_model-
|
70 |
-
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-
|
71 |
-
"model.layers.14.mlp.up_proj.weight": "pytorch_model-
|
72 |
-
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-
|
73 |
-
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-
|
74 |
-
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-
|
75 |
-
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-
|
76 |
-
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
77 |
-
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-
|
78 |
-
"model.layers.15.input_layernorm.weight": "pytorch_model-
|
79 |
-
"model.layers.15.mlp.down_proj.weight": "pytorch_model-
|
80 |
-
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-
|
81 |
-
"model.layers.15.mlp.up_proj.weight": "pytorch_model-
|
82 |
-
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-
|
83 |
-
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-
|
84 |
-
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-
|
85 |
-
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-
|
86 |
-
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
87 |
-
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-
|
88 |
-
"model.layers.16.input_layernorm.weight": "pytorch_model-
|
89 |
-
"model.layers.16.mlp.down_proj.weight": "pytorch_model-
|
90 |
-
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-
|
91 |
-
"model.layers.16.mlp.up_proj.weight": "pytorch_model-
|
92 |
-
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-
|
93 |
-
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-
|
94 |
-
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-
|
95 |
-
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-
|
96 |
-
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
97 |
-
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-
|
98 |
-
"model.layers.17.input_layernorm.weight": "pytorch_model-
|
99 |
-
"model.layers.17.mlp.down_proj.weight": "pytorch_model-
|
100 |
-
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-
|
101 |
-
"model.layers.17.mlp.up_proj.weight": "pytorch_model-
|
102 |
-
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-
|
103 |
-
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-
|
104 |
-
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-
|
105 |
-
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-
|
106 |
-
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
107 |
-
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-
|
108 |
-
"model.layers.18.input_layernorm.weight": "pytorch_model-
|
109 |
-
"model.layers.18.mlp.down_proj.weight": "pytorch_model-
|
110 |
-
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-
|
111 |
-
"model.layers.18.mlp.up_proj.weight": "pytorch_model-
|
112 |
-
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-
|
113 |
-
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-
|
114 |
-
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-
|
115 |
-
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-
|
116 |
-
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
117 |
-
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-
|
118 |
-
"model.layers.19.input_layernorm.weight": "pytorch_model-
|
119 |
-
"model.layers.19.mlp.down_proj.weight": "pytorch_model-
|
120 |
-
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-
|
121 |
-
"model.layers.19.mlp.up_proj.weight": "pytorch_model-
|
122 |
-
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-
|
123 |
-
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-
|
124 |
-
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-
|
125 |
-
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-
|
126 |
-
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
127 |
-
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-
|
128 |
-
"model.layers.2.input_layernorm.weight": "pytorch_model-
|
129 |
-
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-
|
130 |
-
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-
|
131 |
-
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-
|
132 |
-
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-
|
133 |
-
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-
|
134 |
-
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-
|
135 |
-
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-
|
136 |
-
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
137 |
-
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-
|
138 |
-
"model.layers.20.input_layernorm.weight": "pytorch_model-
|
139 |
-
"model.layers.20.mlp.down_proj.weight": "pytorch_model-
|
140 |
-
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-
|
141 |
-
"model.layers.20.mlp.up_proj.weight": "pytorch_model-
|
142 |
-
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-
|
143 |
-
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-
|
144 |
-
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-
|
145 |
-
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-
|
146 |
-
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
147 |
-
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-
|
148 |
-
"model.layers.21.input_layernorm.weight": "pytorch_model-
|
149 |
-
"model.layers.21.mlp.down_proj.weight": "pytorch_model-
|
150 |
-
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-
|
151 |
-
"model.layers.21.mlp.up_proj.weight": "pytorch_model-
|
152 |
-
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-
|
153 |
-
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-
|
154 |
-
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-
|
155 |
-
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-
|
156 |
-
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
157 |
-
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-
|
158 |
-
"model.layers.22.input_layernorm.weight": "pytorch_model-
|
159 |
-
"model.layers.22.mlp.down_proj.weight": "pytorch_model-
|
160 |
-
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-
|
161 |
-
"model.layers.22.mlp.up_proj.weight": "pytorch_model-
|
162 |
-
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-
|
163 |
-
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-
|
164 |
-
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-
|
165 |
-
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-
|
166 |
-
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
167 |
-
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-
|
168 |
-
"model.layers.23.input_layernorm.weight": "pytorch_model-
|
169 |
-
"model.layers.23.mlp.down_proj.weight": "pytorch_model-
|
170 |
-
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-
|
171 |
-
"model.layers.23.mlp.up_proj.weight": "pytorch_model-
|
172 |
-
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-
|
173 |
-
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-
|
174 |
-
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-
|
175 |
-
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-
|
176 |
-
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
177 |
-
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-
|
178 |
-
"model.layers.24.input_layernorm.weight": "pytorch_model-
|
179 |
-
"model.layers.24.mlp.down_proj.weight": "pytorch_model-
|
180 |
-
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-
|
181 |
-
"model.layers.24.mlp.up_proj.weight": "pytorch_model-
|
182 |
-
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-
|
183 |
-
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-
|
184 |
-
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-
|
185 |
-
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-
|
186 |
-
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
187 |
-
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-
|
188 |
-
"model.layers.25.input_layernorm.weight": "pytorch_model-
|
189 |
-
"model.layers.25.mlp.down_proj.weight": "pytorch_model-
|
190 |
-
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-
|
191 |
-
"model.layers.25.mlp.up_proj.weight": "pytorch_model-
|
192 |
-
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-
|
193 |
-
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-
|
194 |
-
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-
|
195 |
-
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-
|
196 |
-
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
197 |
-
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-
|
198 |
-
"model.layers.26.input_layernorm.weight": "pytorch_model-
|
199 |
-
"model.layers.26.mlp.down_proj.weight": "pytorch_model-
|
200 |
-
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-
|
201 |
-
"model.layers.26.mlp.up_proj.weight": "pytorch_model-
|
202 |
-
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-
|
203 |
-
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-
|
204 |
-
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-
|
205 |
-
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-
|
206 |
-
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
207 |
-
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-
|
208 |
-
"model.layers.27.input_layernorm.weight": "pytorch_model-
|
209 |
-
"model.layers.27.mlp.down_proj.weight": "pytorch_model-
|
210 |
-
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-
|
211 |
-
"model.layers.27.mlp.up_proj.weight": "pytorch_model-
|
212 |
-
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-
|
213 |
-
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-
|
214 |
-
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-
|
215 |
-
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-
|
216 |
-
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
217 |
-
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-
|
218 |
-
"model.layers.28.input_layernorm.weight": "pytorch_model-
|
219 |
-
"model.layers.28.mlp.down_proj.weight": "pytorch_model-
|
220 |
-
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-
|
221 |
-
"model.layers.28.mlp.up_proj.weight": "pytorch_model-
|
222 |
-
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-
|
223 |
-
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-
|
224 |
-
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-
|
225 |
-
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-
|
226 |
-
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
227 |
-
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-
|
228 |
-
"model.layers.29.input_layernorm.weight": "pytorch_model-
|
229 |
-
"model.layers.29.mlp.down_proj.weight": "pytorch_model-
|
230 |
-
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-
|
231 |
-
"model.layers.29.mlp.up_proj.weight": "pytorch_model-
|
232 |
-
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-
|
233 |
-
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-
|
234 |
-
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-
|
235 |
-
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-
|
236 |
-
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
237 |
-
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-
|
238 |
-
"model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-
|
239 |
-
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-
|
240 |
-
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-
|
241 |
-
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-
|
242 |
-
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-
|
243 |
-
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-
|
244 |
-
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-
|
245 |
-
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-
|
246 |
-
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-
|
247 |
-
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-
|
248 |
-
"model.layers.30.input_layernorm.weight": "pytorch_model-
|
249 |
-
"model.layers.30.mlp.down_proj.weight": "pytorch_model-
|
250 |
-
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-
|
251 |
-
"model.layers.30.mlp.up_proj.weight": "pytorch_model-
|
252 |
-
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-
|
253 |
-
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-
|
254 |
-
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-
|
255 |
-
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-
|
256 |
-
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
257 |
-
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-
|
258 |
-
"model.layers.31.input_layernorm.weight": "pytorch_model-
|
259 |
-
"model.layers.31.mlp.down_proj.weight": "pytorch_model-
|
260 |
-
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-
|
261 |
-
"model.layers.31.mlp.up_proj.weight": "pytorch_model-
|
262 |
-
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-
|
263 |
-
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-
|
264 |
-
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-
|
265 |
-
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-
|
266 |
-
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
267 |
-
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-
|
268 |
-
"model.layers.32.input_layernorm.weight": "pytorch_model-
|
269 |
-
"model.layers.32.mlp.down_proj.weight": "pytorch_model-
|
270 |
-
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-
|
271 |
-
"model.layers.32.mlp.up_proj.weight": "pytorch_model-
|
272 |
-
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-
|
273 |
-
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-
|
274 |
-
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-
|
275 |
-
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-
|
276 |
-
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
277 |
-
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-
|
278 |
-
"model.layers.33.input_layernorm.weight": "pytorch_model-
|
279 |
-
"model.layers.33.mlp.down_proj.weight": "pytorch_model-
|
280 |
-
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-
|
281 |
-
"model.layers.33.mlp.up_proj.weight": "pytorch_model-
|
282 |
-
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-
|
283 |
-
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-
|
284 |
-
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-
|
285 |
-
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-
|
286 |
-
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
287 |
-
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-
|
288 |
-
"model.layers.34.input_layernorm.weight": "pytorch_model-
|
289 |
-
"model.layers.34.mlp.down_proj.weight": "pytorch_model-
|
290 |
-
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-
|
291 |
-
"model.layers.34.mlp.up_proj.weight": "pytorch_model-
|
292 |
-
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-
|
293 |
-
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-
|
294 |
-
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-
|
295 |
-
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-
|
296 |
-
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
297 |
-
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-
|
298 |
-
"model.layers.35.input_layernorm.weight": "pytorch_model-
|
299 |
-
"model.layers.35.mlp.down_proj.weight": "pytorch_model-
|
300 |
-
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-
|
301 |
-
"model.layers.35.mlp.up_proj.weight": "pytorch_model-
|
302 |
-
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-
|
303 |
-
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-
|
304 |
-
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-
|
305 |
-
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-
|
306 |
-
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
307 |
-
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-
|
308 |
-
"model.layers.36.input_layernorm.weight": "pytorch_model-
|
309 |
-
"model.layers.36.mlp.down_proj.weight": "pytorch_model-
|
310 |
-
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-
|
311 |
-
"model.layers.36.mlp.up_proj.weight": "pytorch_model-
|
312 |
-
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-
|
313 |
-
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-
|
314 |
-
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-
|
315 |
-
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-
|
316 |
-
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
317 |
-
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-
|
318 |
-
"model.layers.37.input_layernorm.weight": "pytorch_model-
|
319 |
-
"model.layers.37.mlp.down_proj.weight": "pytorch_model-
|
320 |
-
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-
|
321 |
-
"model.layers.37.mlp.up_proj.weight": "pytorch_model-
|
322 |
-
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-
|
323 |
-
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-
|
324 |
-
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-
|
325 |
-
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-
|
326 |
-
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
327 |
-
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-
|
328 |
-
"model.layers.38.input_layernorm.weight": "pytorch_model-
|
329 |
-
"model.layers.38.mlp.down_proj.weight": "pytorch_model-
|
330 |
-
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-
|
331 |
-
"model.layers.38.mlp.up_proj.weight": "pytorch_model-
|
332 |
-
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-
|
333 |
-
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-
|
334 |
-
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-
|
335 |
-
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-
|
336 |
-
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
337 |
-
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-
|
338 |
-
"model.layers.39.input_layernorm.weight": "pytorch_model-
|
339 |
-
"model.layers.39.mlp.down_proj.weight": "pytorch_model-
|
340 |
-
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-
|
341 |
-
"model.layers.39.mlp.up_proj.weight": "pytorch_model-
|
342 |
-
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-
|
343 |
-
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-
|
344 |
-
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-
|
345 |
-
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-
|
346 |
-
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
347 |
-
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-
|
348 |
-
"model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-
|
349 |
-
"model.layers.4.mlp.down_proj.weight": "pytorch_model-
|
350 |
-
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-
|
351 |
-
"model.layers.4.mlp.up_proj.weight": "pytorch_model-
|
352 |
-
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-
|
353 |
-
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-
|
354 |
-
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-
|
355 |
-
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-
|
356 |
-
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-
|
357 |
-
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-
|
358 |
-
"model.layers.5.input_layernorm.weight": "pytorch_model-
|
359 |
-
"model.layers.5.mlp.down_proj.weight": "pytorch_model-
|
360 |
-
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-
|
361 |
-
"model.layers.5.mlp.up_proj.weight": "pytorch_model-
|
362 |
-
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-
|
363 |
-
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-
|
364 |
-
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-
|
365 |
-
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-
|
366 |
-
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
367 |
-
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-
|
368 |
-
"model.layers.6.input_layernorm.weight": "pytorch_model-
|
369 |
-
"model.layers.6.mlp.down_proj.weight": "pytorch_model-
|
370 |
-
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-
|
371 |
-
"model.layers.6.mlp.up_proj.weight": "pytorch_model-
|
372 |
-
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-
|
373 |
-
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-
|
374 |
-
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-
|
375 |
-
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-
|
376 |
-
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
377 |
-
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-
|
378 |
-
"model.layers.7.input_layernorm.weight": "pytorch_model-
|
379 |
-
"model.layers.7.mlp.down_proj.weight": "pytorch_model-
|
380 |
-
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-
|
381 |
-
"model.layers.7.mlp.up_proj.weight": "pytorch_model-
|
382 |
-
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-
|
383 |
-
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-
|
384 |
-
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-
|
385 |
-
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-
|
386 |
-
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
387 |
-
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-
|
388 |
-
"model.layers.8.input_layernorm.weight": "pytorch_model-
|
389 |
-
"model.layers.8.mlp.down_proj.weight": "pytorch_model-
|
390 |
-
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-
|
391 |
-
"model.layers.8.mlp.up_proj.weight": "pytorch_model-
|
392 |
-
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-
|
393 |
-
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-
|
394 |
-
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-
|
395 |
-
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-
|
396 |
-
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
397 |
-
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-
|
398 |
-
"model.layers.9.input_layernorm.weight": "pytorch_model-
|
399 |
-
"model.layers.9.mlp.down_proj.weight": "pytorch_model-
|
400 |
-
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-
|
401 |
-
"model.layers.9.mlp.up_proj.weight": "pytorch_model-
|
402 |
-
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-
|
403 |
-
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-
|
404 |
-
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-
|
405 |
-
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-
|
406 |
-
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-
|
407 |
-
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-
|
408 |
-
"model.norm.weight": "pytorch_model-
|
409 |
}
|
410 |
}
|
|
|
  {
    "metadata": {
+     "total_size": 17578695680
    },
    "weight_map": {
+     "lm_head.weight": "pytorch_model-00010-of-00010.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+     "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.12.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.17.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+     "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+     "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+     "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+     "model.layers.20.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.22.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+     "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+     "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+     "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.30.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.32.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+     "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+     "model.layers.33.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.37.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+     "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+     "model.layers.38.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00010.bin",
+     "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00010.bin",
+     "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
+     "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+     "model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+     "model.norm.weight": "pytorch_model-00009-of-00010.bin"
    }
  }
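For orientation, here is a minimal, illustrative Python sketch (not part of this commit) of how a sharded-checkpoint index like the new 10-shard pytorch_model.bin.index.json above is consumed: weight_map assigns every parameter name to one shard file, so a loader only needs to open the shards it actually uses. The parameter and file names are taken from the diff; the loading code itself is an assumption, not this repository's loader.

import json
import torch

# Load the index that maps each parameter name to its .bin shard.
with open("pytorch_model.bin.index.json") as f:
    index = json.load(f)

# total_size is the byte count across all ten shards (17578695680 above).
print(index["metadata"]["total_size"])

# Find the shard that holds one parameter, then load only that shard.
param = "model.layers.39.self_attn.v_proj.weight"
shard = index["weight_map"][param]  # "pytorch_model-00009-of-00010.bin"
state_dict = torch.load(shard, map_location="cpu")
print(state_dict[param].shape)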
tokenizer.json
CHANGED
@@ -58,14 +58,6 @@
        "special": true
      }
    ],
-   "normalizer": {
-     "type": "Sequence",
-     "normalizers": [
-       {
-         "type": "NFKC"
-       }
-     ]
-   },
    "pre_tokenizer": {
      "type": "Sequence",
      "pretokenizers": [
@@ -86,9 +78,17 @@
    },
    "post_processor": null,
    "decoder": {
-     "type": "
-     "
-
+     "type": "Sequence",
+     "decoders": [
+       {
+         "type": "Metaspace",
+         "replacement": "▁",
+         "add_prefix_space": false
+       },
+       {
+         "type": "ByteFallback"
+       }
+     ]
    },
    "model": {
      "type": "BPE",
@@ -100376,7 +100376,263 @@
      "nj": 100274,
      "iful": 100275,
      "▁solution": 100276,
-     "\n": 100277
+     "\n": 100277,
+     "<0x00>": 100278,
+     "<0x01>": 100279,
+     "<0x02>": 100280,
+     "<0x03>": 100281,
+     "<0x04>": 100282,
+     "<0x05>": 100283,
+     "<0x06>": 100284,
+     "<0x07>": 100285,
+     "<0x08>": 100286,
+     "<0x09>": 100287,
+     "<0x0A>": 100288,
+     "<0x0B>": 100289,
+     "<0x0C>": 100290,
+     "<0x0D>": 100291,
+     "<0x0E>": 100292,
+     "<0x0F>": 100293,
+     "<0x10>": 100294,
+     "<0x11>": 100295,
+     "<0x12>": 100296,
+     "<0x13>": 100297,
+     "<0x14>": 100298,
+     "<0x15>": 100299,
+     "<0x16>": 100300,
+     "<0x17>": 100301,
+     "<0x18>": 100302,
+     "<0x19>": 100303,
+     "<0x1A>": 100304,
+     "<0x1B>": 100305,
+     "<0x1C>": 100306,
+     "<0x1D>": 100307,
+     "<0x1E>": 100308,
+     "<0x1F>": 100309,
+     "<0x20>": 100310,
+     "<0x21>": 100311,
+     "<0x22>": 100312,
+     "<0x23>": 100313,
+     "<0x24>": 100314,
+     "<0x25>": 100315,
+     "<0x26>": 100316,
+     "<0x27>": 100317,
+     "<0x28>": 100318,
+     "<0x29>": 100319,
+     "<0x2A>": 100320,
+     "<0x2B>": 100321,
+     "<0x2C>": 100322,
+     "<0x2D>": 100323,
+     "<0x2E>": 100324,
+     "<0x2F>": 100325,
+     "<0x30>": 100326,
+     "<0x31>": 100327,
+     "<0x32>": 100328,
+     "<0x33>": 100329,
+     "<0x34>": 100330,
+     "<0x35>": 100331,
+     "<0x36>": 100332,
+     "<0x37>": 100333,
+     "<0x38>": 100334,
+     "<0x39>": 100335,
+     "<0x3A>": 100336,
+     "<0x3B>": 100337,
+     "<0x3C>": 100338,
+     "<0x3D>": 100339,
+     "<0x3E>": 100340,
+     "<0x3F>": 100341,
+     "<0x40>": 100342,
+     "<0x41>": 100343,
+     "<0x42>": 100344,
+     "<0x43>": 100345,
+     "<0x44>": 100346,
+     "<0x45>": 100347,
+     "<0x46>": 100348,
+     "<0x47>": 100349,
+     "<0x48>": 100350,
+     "<0x49>": 100351,
+     "<0x4A>": 100352,
+     "<0x4B>": 100353,
+     "<0x4C>": 100354,
+     "<0x4D>": 100355,
+     "<0x4E>": 100356,
+     "<0x4F>": 100357,
+     "<0x50>": 100358,
+     "<0x51>": 100359,
+     "<0x52>": 100360,
+     "<0x53>": 100361,
+     "<0x54>": 100362,
+     "<0x55>": 100363,
+     "<0x56>": 100364,
+     "<0x57>": 100365,
+     "<0x58>": 100366,
+     "<0x59>": 100367,
+     "<0x5A>": 100368,
+     "<0x5B>": 100369,
+     "<0x5C>": 100370,
+     "<0x5D>": 100371,
+     "<0x5E>": 100372,
+     "<0x5F>": 100373,
+     "<0x60>": 100374,
+     "<0x61>": 100375,
+     "<0x62>": 100376,
+     "<0x63>": 100377,
+     "<0x64>": 100378,
+     "<0x65>": 100379,
+     "<0x66>": 100380,
+     "<0x67>": 100381,
+     "<0x68>": 100382,
+     "<0x69>": 100383,
+     "<0x6A>": 100384,
+     "<0x6B>": 100385,
+     "<0x6C>": 100386,
+     "<0x6D>": 100387,
+     "<0x6E>": 100388,
+     "<0x6F>": 100389,
+     "<0x70>": 100390,
+     "<0x71>": 100391,
+     "<0x72>": 100392,
+     "<0x73>": 100393,
+     "<0x74>": 100394,
+     "<0x75>": 100395,
+     "<0x76>": 100396,
+     "<0x77>": 100397,
+     "<0x78>": 100398,
+     "<0x79>": 100399,
+     "<0x7A>": 100400,
+     "<0x7B>": 100401,
+     "<0x7C>": 100402,
+     "<0x7D>": 100403,
+     "<0x7E>": 100404,
+     "<0x7F>": 100405,
+     "<0x80>": 100406,
+     "<0x81>": 100407,
+     "<0x82>": 100408,
+     "<0x83>": 100409,
+     "<0x84>": 100410,
+     "<0x85>": 100411,
+     "<0x86>": 100412,
+     "<0x87>": 100413,
+     "<0x88>": 100414,
+     "<0x89>": 100415,
+     "<0x8A>": 100416,
+     "<0x8B>": 100417,
+     "<0x8C>": 100418,
+     "<0x8D>": 100419,
+     "<0x8E>": 100420,
+     "<0x8F>": 100421,
+     "<0x90>": 100422,
+     "<0x91>": 100423,
+     "<0x92>": 100424,
+     "<0x93>": 100425,
+     "<0x94>": 100426,
+     "<0x95>": 100427,
+     "<0x96>": 100428,
+     "<0x97>": 100429,
+     "<0x98>": 100430,
+     "<0x99>": 100431,
+     "<0x9A>": 100432,
+     "<0x9B>": 100433,
+     "<0x9C>": 100434,
+     "<0x9D>": 100435,
+     "<0x9E>": 100436,
+     "<0x9F>": 100437,
+     "<0xA0>": 100438,
+     "<0xA1>": 100439,
+     "<0xA2>": 100440,
+     "<0xA3>": 100441,
+     "<0xA4>": 100442,
+     "<0xA5>": 100443,
+     "<0xA6>": 100444,
+     "<0xA7>": 100445,
+     "<0xA8>": 100446,
+     "<0xA9>": 100447,
+     "<0xAA>": 100448,
+     "<0xAB>": 100449,
+     "<0xAC>": 100450,
+     "<0xAD>": 100451,
+     "<0xAE>": 100452,
+     "<0xAF>": 100453,
+     "<0xB0>": 100454,
+     "<0xB1>": 100455,
+     "<0xB2>": 100456,
+     "<0xB3>": 100457,
+     "<0xB4>": 100458,
+     "<0xB5>": 100459,
+     "<0xB6>": 100460,
+     "<0xB7>": 100461,
+     "<0xB8>": 100462,
+     "<0xB9>": 100463,
+     "<0xBA>": 100464,
+     "<0xBB>": 100465,
+     "<0xBC>": 100466,
+     "<0xBD>": 100467,
+     "<0xBE>": 100468,
+     "<0xBF>": 100469,
+     "<0xC0>": 100470,
+     "<0xC1>": 100471,
+     "<0xC2>": 100472,
+     "<0xC3>": 100473,
+     "<0xC4>": 100474,
+     "<0xC5>": 100475,
+     "<0xC6>": 100476,
+     "<0xC7>": 100477,
+     "<0xC8>": 100478,
+     "<0xC9>": 100479,
+     "<0xCA>": 100480,
+     "<0xCB>": 100481,
+     "<0xCC>": 100482,
+     "<0xCD>": 100483,
+     "<0xCE>": 100484,
+     "<0xCF>": 100485,
+     "<0xD0>": 100486,
+     "<0xD1>": 100487,
+     "<0xD2>": 100488,
+     "<0xD3>": 100489,
+     "<0xD4>": 100490,
+     "<0xD5>": 100491,
+     "<0xD6>": 100492,
+     "<0xD7>": 100493,
+     "<0xD8>": 100494,
+     "<0xD9>": 100495,
+     "<0xDA>": 100496,
+     "<0xDB>": 100497,
+     "<0xDC>": 100498,
+     "<0xDD>": 100499,
+     "<0xDE>": 100500,
+     "<0xDF>": 100501,
+     "<0xE0>": 100502,
+     "<0xE1>": 100503,
+     "<0xE2>": 100504,
+     "<0xE3>": 100505,
+     "<0xE4>": 100506,
+     "<0xE5>": 100507,
+     "<0xE6>": 100508,
+     "<0xE7>": 100509,
+     "<0xE8>": 100510,
+     "<0xE9>": 100511,
+     "<0xEA>": 100512,
+     "<0xEB>": 100513,
+     "<0xEC>": 100514,
+     "<0xED>": 100515,
+     "<0xEE>": 100516,
+     "<0xEF>": 100517,
+     "<0xF0>": 100518,
+     "<0xF1>": 100519,
+     "<0xF2>": 100520,
+     "<0xF3>": 100521,
+     "<0xF4>": 100522,
+     "<0xF5>": 100523,
+     "<0xF6>": 100524,
+     "<0xF7>": 100525,
+     "<0xF8>": 100526,
+     "<0xF9>": 100527,
+     "<0xFA>": 100528,
+     "<0xFB>": 100529,
+     "<0xFC>": 100530,
+     "<0xFD>": 100531,
+     "<0xFE>": 100532,
+     "<0xFF>": 100533
    },
    "merges": [
      "▁ t",
@@ -104090,4 +104346,4 @@
      "▁sol ution"
    ]
  }
- }
+ }
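The 256 tokens added above give the tokenizer a byte fallback: any input byte the BPE vocabulary cannot match is emitted as a <0xNN> token, and the new ByteFallback decoder turns those tokens back into raw bytes. As an illustrative sanity check (not code from this repository), the ids follow a fixed offset: byte value b maps to id 100278 + b.

# Illustrative check of the byte-fallback ids added in this commit:
# byte value b is encoded as the token "<0xNN>" with id 100278 + b.
for b in (0x00, 0x0A, 0xFF):
    token = "<0x%02X>" % b
    print(token, 100278 + b)  # <0x00> 100278, <0x0A> 100288, <0xFF> 100533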