zRzRzRzRzRzRzR commited on
Commit
1443e84
1 Parent(s): 027ad0e

Upload folder using huggingface_hub

Browse files
.mdl ADDED
Binary file (60 Bytes). View file
 
.msc ADDED
Binary file (1.53 kB). View file
 
.mv ADDED
@@ -0,0 +1 @@
 
 
1
+ Revision:master,CreatedAt:1719926951
LICENSE ADDED
@@ -0,0 +1,75 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ The CogVLM License
2
+
3
+ 1. Definitions
4
+
5
+ “Licensor” means the CogVLM Model Team that distributes its Software.
6
+
7
+ “Software” means the CogVLM model parameters made available under this license.
8
+
9
+ 2. License Grant
10
+
11
+ Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
12
+ This license permits you to use all open-source models in this repository for academic research free. Users who wish to use the models for commercial purposes must register [here](https://open.bigmodel.cn/mla/form).
13
+ Registered users may use the models for commercial activities free of charge, but must comply with all terms and conditions of this license.
14
+ The license notice shall be included in all copies or substantial portions of the Software.
15
+
16
+ 3. Restriction
17
+
18
+ You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any military, or illegal purposes.
19
+
20
+ You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
21
+
22
+ 4. Disclaimer
23
+
24
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
25
+
26
+ 5. Limitation of Liability
27
+
28
+ EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
29
+
30
+ 6. Dispute Resolution
31
+
32
+ This license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
33
+
34
+ Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at license@zhipuai.cn.
35
+
36
+ 7. Llama3 and EVA-CLIP2 License
37
+
38
+ For the CogVLM2 open source model based on the LLama3 series model as the base model, the Llama3 license conditions (https://llama.meta.com/llama3/license/, a copy of this repository license conditions) and the EVA-CLIP2 license conditions (MIT , https://github.com/baaivision/EVA/blob/master/LICENSE) for model weights.
39
+
40
+ 1. 定义
41
+
42
+ “许可方”是指分发其软件的 CogVLM 模型团队。
43
+
44
+ “软件”是指根据本许可提供的 CogVLM 模型参数。
45
+
46
+ 2. 许可授予
47
+
48
+ 根据本许可的条款和条件,许可方特此授予您非排他性、全球性、不可转让、不可再许可、可撤销、免版税的版权许可。
49
+ 本许可允许您免费使用本仓库中的所有开源模型进行学术研究,对于希望将模型用于商业目的的用户,需在[这里](https://open.bigmodel.cn/mla/form)完成登记。
50
+ 经过登记的用户可以免费使用本模型进行商业活动,但必须遵守本许可的所有条款和条件。
51
+ 上述版权声明和本许可声明应包含在本软件的所有副本或重要部分中。
52
+
53
+ 3.限制
54
+
55
+ 您不得出于任何军事或非法目的使用、复制、修改、合并、发布、分发、复制或创建本软件的全部或部分衍生作品。
56
+
57
+ 您不得利用本软件从事任何危害国家安全和国家统一、危害社会公共利益、侵犯人身权益的行为。
58
+
59
+ 4.免责声明
60
+
61
+ 本软件“按原样”提供,不提供任何明示或暗示的保证,包括但不限于对适销性、特定用途的适用性和非侵权性的保证。 在任何情况下,作者或版权持有人均不对任何索赔、损害或其他责任负责,无论是在合同诉讼、侵权行为还是其他方面,由软件或软件的使用或其他交易引起、由软件引起或与之相关 软件。
62
+
63
+ 5. 责任限制
64
+
65
+ 除适用法律禁止的范围外,在任何情况下且根据任何法律理论,无论是基于侵权行为、疏忽、合同、责任或其他原因,任何许可方均不对您承担任何直接、间接、特殊、偶然、示范性、 或间接损害,或任何其他商业损失,即使许可人已被告知此类损害的可能性。
66
+
67
+ 6.争议解决
68
+
69
+ 本许可受中华人民共和国法律管辖并按其解释。 因本许可引起的或与本许可有关的任何争议应提交北京市海淀区人民法院。
70
+
71
+ 请注意,许可证可能会更新到更全面的版本。 有关许可和版权的任何问题,请���过 license@zhipuai.cn 与我们联系。
72
+
73
+ 7. Llama3 和 EVA-CLIP2 许可
74
+
75
+ 针对基于以 LLama3 系列模型作为基座模型的 CogVLM2 开源模型, Llama3 许可条件 (https://llama.meta.com/llama3/license/ ,本仓库副本一份许可条件) 和 EVA-CLIP2 许可条件 (MIT, https://github.com/baaivision/EVA/blob/master/LICENSE) 适用于模型权重。
LLAMA3_LICENSE ADDED
@@ -0,0 +1,117 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ META LLAMA 3 COMMUNITY LICENSE AGREEMENT
2
+ Meta Llama 3 Version Release Date: April 18, 2024
3
+
4
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
5
+ Llama Materials set forth herein.
6
+
7
+ “Documentation” means the specifications, manuals and documentation accompanying Meta Llama 3
8
+ distributed by Meta at https://llama.meta.com/get-started/.
9
+
10
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
11
+ this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
12
+ regulations to provide legal consent and that has legal authority to bind your employer or such other
13
+ person or entity if you are entering in this Agreement on their behalf.
14
+
15
+ “Meta Llama 3” means the foundational large language models and software and algorithms, including
16
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
17
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
18
+ https://llama.meta.com/llama-downloads.
19
+
20
+ “Llama Materials” means, collectively, Meta’s proprietary Meta Llama 3 and Documentation (and any
21
+ portion thereof) made available under this Agreement.
22
+
23
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
24
+ principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
25
+ outside of the EEA or Switzerland).
26
+
27
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
28
+ you agree to be bound by this Agreement.
29
+
30
+ 1. License Rights and Redistribution.
31
+
32
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
33
+ limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
34
+ Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
35
+ Llama Materials.
36
+
37
+ b. Redistribution and Use.
38
+
39
+ i. If you distribute or make available the Llama Materials (or any derivative works
40
+ thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide
41
+ a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta
42
+ Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you
43
+ use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is
44
+ distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model
45
+ name.
46
+
47
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
48
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
49
+
50
+ iii. You must retain in all copies of the Llama Materials that you distribute the following
51
+ attribution notice within a “Notice” text file distributed as a part of such copies: “Meta Llama 3 is
52
+ licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights
53
+ Reserved.”
54
+
55
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
56
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
57
+ Materials (available at https://llama.meta.com/llama3/use-policy), which is hereby incorporated by
58
+ reference into this Agreement.
59
+
60
+ v. You will not use the Llama Materials or any output or results of the Llama Materials to
61
+ improve any other large language model (excluding Meta Llama 3 or derivative works thereof).
62
+
63
+ 2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users
64
+ of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
65
+ million monthly active users in the preceding calendar month, you must request a license from Meta,
66
+ which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
67
+ rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
68
+
69
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
70
+ OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
71
+ ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
72
+ INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
73
+ MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
74
+ DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
75
+ ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
76
+ RESULTS.
77
+
78
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
79
+ LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
80
+ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
81
+ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
82
+ OF THE POSSIBILITY OF ANY OF THE FOREGOING.
83
+
84
+ 5. Intellectual Property.
85
+
86
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama
87
+ Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
88
+ or any of its affiliates, except as required for reasonable and customary use in describing and
89
+ redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
90
+ use “Llama 3” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
91
+ comply with Meta’s brand guidelines (currently accessible at
92
+ https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
93
+ of the Mark will inure to the benefit of Meta.
94
+
95
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
96
+ respect to any derivative works and modifications of the Llama Materials that are made by you, as
97
+ between you and Meta, you are and will be the owner of such derivative works and modifications.
98
+
99
+ c. If you institute litigation or other proceedings against Meta or any entity (including a
100
+ cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Meta Llama 3 outputs or
101
+ results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
102
+ rights owned or licensable by you, then any licenses granted to you under this Agreement shall
103
+ terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
104
+ harmless Meta from and against any claim by any third party arising out of or related to your use or
105
+ distribution of the Llama Materials.
106
+
107
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
108
+ Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
109
+ accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
110
+ breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
111
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
112
+ Agreement.
113
+
114
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
115
+ the State of California without regard to choice of law principles, and the UN Convention on Contracts
116
+ for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
117
+ exclusive jurisdiction of any dispute arising out of this Agreement.
config.json ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "CogVLMVideoForCausalLM"
4
+ ],
5
+ "auto_map": {
6
+ "AutoConfig": "configuration_cogvlm.CogVLMConfig",
7
+ "AutoModelForCausalLM": "modeling_cogvlm.CogVLMVideoForCausalLM"
8
+ },
9
+ "bos_token_id": 128000,
10
+ "eos_token_id": 128001,
11
+ "pad_token_id": 128002,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 4096,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 14336,
16
+ "max_position_embeddings": 2048,
17
+ "num_attention_heads": 32,
18
+ "num_hidden_layers": 32,
19
+ "num_multi_query_heads": 8,
20
+ "rms_norm_eps": 1e-05,
21
+ "template_version": "base",
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "bfloat16",
24
+ "transformers_version": "4.41.0",
25
+ "use_cache": true,
26
+ "vision_config": {
27
+ "dropout_prob": 0.0,
28
+ "hidden_act": "gelu",
29
+ "hidden_size": 1792,
30
+ "image_size": 224,
31
+ "in_channels": 3,
32
+ "intermediate_size": 15360,
33
+ "layer_norm_eps": 1e-06,
34
+ "num_heads": 16,
35
+ "num_hidden_layers": 63,
36
+ "num_positions": 257,
37
+ "patch_size": 14
38
+ },
39
+ "vocab_size": 128256
40
+ }
configuration.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"framework":"Pytorch","task":"video-question-answering"}
configuration_cogvlm.py ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Literal
2
+ from transformers import PretrainedConfig
3
+
4
+
5
+ class CogVLMConfig(PretrainedConfig):
6
+ _auto_class = "AutoConfig"
7
+
8
+ def __init__(
9
+ self,
10
+ vocab_size=32000,
11
+ hidden_size=4096,
12
+ intermediate_size=11008,
13
+ num_hidden_layers=32,
14
+ num_attention_heads=32,
15
+ num_multi_query_heads=8,
16
+ hidden_act='silu',
17
+ max_position_embeddings=2048,
18
+ initializer_range=0.02,
19
+ rms_norm_eps=1e-06,
20
+ template_version: Literal["base", "chat"] = "chat",
21
+ pad_token_id=128002,
22
+ bos_token_id=128001,
23
+ eos_token_id=128002,
24
+ tie_word_embeddings=False,
25
+ use_cache=True,
26
+ **kwargs,
27
+ ):
28
+ self.hidden_size = hidden_size
29
+ self.intermediate_size = intermediate_size
30
+ self.num_attention_heads = num_attention_heads
31
+ self.num_multi_query_heads = num_multi_query_heads
32
+ self.max_position_embeddings = max_position_embeddings
33
+ self.rms_norm_eps = rms_norm_eps
34
+ self.initializer_range = initializer_range
35
+ self.vocab_size = vocab_size
36
+ self.num_hidden_layers = num_hidden_layers
37
+ self.hidden_act = hidden_act
38
+ self.template_version = template_version
39
+ self.use_cache = use_cache
40
+ super().__init__(
41
+ pad_token_id=pad_token_id,
42
+ bos_token_id=bos_token_id,
43
+ eos_token_id=eos_token_id,
44
+ tie_word_embeddings=tie_word_embeddings,
45
+ **kwargs,
46
+ )
generation_config.json ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token_id": 128000,
3
+ "eos_token_id": 128001,
4
+ "pad_token_id": 128002,
5
+ "do_sample": true,
6
+ "temperature": 0.1,
7
+ "max_length": 2048,
8
+ "top_p": 0.1,
9
+ "top_k": 1,
10
+ "transformers_version": "4.41.0"
11
+ }
model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f53e34de951ee7205398dc1e03a8a0b797ddc4b7a6714181adb51b2a906a94c5
3
+ size 4976699712
model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:555733b658ef1733b5afa979a6c999d2f3310ac8e2f851eb6c53b9b73a10c4a9
3
+ size 4999803504
model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d90e87d1db7f29deb7a59817476da9bba04339cd3c7029d3fc4ca97c6aa4b776
3
+ size 4915917160
model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf5581fa2fb9be9bf89ded6c82607fec36c809e200b00ce10893e80501036a25
3
+ size 4956242104
model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:40c0ca93426ca37f04b66d9e1e6cb9687e0bcbaa02b4827f282b564577b2b6da
3
+ size 4115863248
model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:73b20e2d92146829390d468bb37bb9bacf23df3e22f6000754cd94c1dfc83f23
3
+ size 1050673280
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
modeling_cogvlm.py ADDED
@@ -0,0 +1,898 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """largely copy from llama and adapt for cogvlm"""
2
+ import warnings
3
+ from typing import TYPE_CHECKING, Optional, Tuple, List, Union, Literal, Dict, Any
4
+
5
+ import math
6
+ import torch
7
+ from torch import nn
8
+ from torch.nn import CrossEntropyLoss
9
+ from torchvision import transforms
10
+ from einops import rearrange
11
+
12
+ from decord import VideoReader, cpu
13
+ import decord
14
+ import io
15
+ import numpy as np
16
+
17
+ from transformers import PreTrainedModel, PreTrainedTokenizer
18
+ from transformers.utils.logging import get_logger
19
+ from transformers.activations import ACT2FN
20
+ from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
21
+ from torchvision.transforms.functional import InterpolationMode
22
+ from torchvision.transforms import Lambda
23
+ from torchvision.transforms._transforms_video import NormalizeVideo, RandomCropVideo, RandomHorizontalFlipVideo, CenterCropVideo
24
+ from pytorchvideo.transforms import ApplyTransformToKey, ShortSideScale
25
+ from .configuration_cogvlm import CogVLMConfig
26
+ from .util import FastRotaryEmbedding
27
+ from .visual import EVA2CLIPModel
28
+
29
+
30
+
31
+ if TYPE_CHECKING:
32
+ from transformers.utils import ModelOutput
33
+
34
+ logger = get_logger(__name__)
35
+
36
+ LANGUAGE_TOKEN_TYPE = 0
37
+ VISION_TOKEN_TYPE = 1
38
+
39
+
40
+ # Copied from transformers.models.bart.modeling_bart._make_causal_mask
41
+ def _make_causal_mask(
42
+ input_ids_shape: torch.Size, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0
43
+ ):
44
+ """
45
+ Make causal mask used for bi-directional self-attention.
46
+ """
47
+ bsz, tgt_len = input_ids_shape
48
+ mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
49
+ mask_cond = torch.arange(mask.size(-1), device=device)
50
+ mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
51
+ mask = mask.to(dtype)
52
+
53
+ if past_key_values_length > 0:
54
+ mask = torch.cat([torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1)
55
+ return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
56
+
57
+
58
+ # Copied from transformers.models.bart.modeling_bart._expand_mask
59
+ def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
60
+ """
61
+ Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
62
+ """
63
+ bsz, src_len = mask.size()
64
+ tgt_len = tgt_len if tgt_len is not None else src_len
65
+
66
+ expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
67
+
68
+ inverted_mask = 1.0 - expanded_mask
69
+
70
+ return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)
71
+
72
+
73
+ class RMSNorm(nn.Module):
74
+ def __init__(self, hidden_size, eps=1e-5):
75
+ super().__init__()
76
+ self.weight = nn.Parameter(torch.ones(hidden_size))
77
+ self.variance_epsilon = eps
78
+
79
+ def forward(self, hidden_states):
80
+ input_dtype = hidden_states.dtype
81
+ hidden_states = hidden_states.to(torch.float32)
82
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
83
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
84
+ return (self.weight * hidden_states).to(input_dtype)
85
+
86
+
87
+ class MLP(nn.Module):
88
+ def __init__(self, config):
89
+ super().__init__()
90
+ self.hidden_size = config.hidden_size
91
+ self.intermediate_size = config.intermediate_size
92
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
93
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
94
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
95
+ self.act_fn = ACT2FN[config.hidden_act]
96
+
97
+ def forward(self, x):
98
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
99
+ return down_proj
100
+
101
+
102
+ def get_expert_mask(token_type_ids: "torch.LongTensor(B, L)") -> "[torch.BoolTensor(B, L), torch.BoolTensor(B, L)]":
103
+ vision_token_mask = torch.zeros_like(token_type_ids, dtype=torch.bool)
104
+ vision_token_mask[:, :-1] = (token_type_ids[:, :-1] == VISION_TOKEN_TYPE) & (token_type_ids[:, 1:] == VISION_TOKEN_TYPE)
105
+ language_token_mask = ~vision_token_mask
106
+ return vision_token_mask, language_token_mask
107
+
108
+
109
+ class VisionExpertMLP(nn.Module):
110
+ def __init__(self, config):
111
+ super().__init__()
112
+ self.language_mlp = MLP(config)
113
+ # self.vision_mlp = MLP(config)
114
+
115
+ def forward(self, hidden_states: "torch.Tensor(B, L, D)", token_type_ids: "torch.LongTensor(B, L)"):
116
+ # output = torch.empty(hidden_states.shape, dtype=hidden_states.dtype, device=hidden_states.device)
117
+ # vision_token_mask, language_token_mask = get_expert_mask(token_type_ids)
118
+ # output[vision_token_mask] = self.vision_mlp(hidden_states[vision_token_mask])
119
+ # output[language_token_mask] = self.language_mlp(hidden_states[language_token_mask])
120
+
121
+ output = self.language_mlp(hidden_states)
122
+ return output
123
+
124
+
125
+ def attention_fn(
126
+ query_layer: "torch.tensor(B, H, L, HD)",
127
+ key_layer: "torch.tensor(B, H, L, HD)",
128
+ value_layer: "torch.tensor(B, H, L, HD)",
129
+ attention_mask: "torch.tensor(B, H, L, HD)",
130
+ *,
131
+ scaling_attention_score: bool = True,
132
+ attention_dropout: nn.Module = None
133
+ ):
134
+ attention_mask_bool = (attention_mask == 0)
135
+ is_low_triangle = (attention_mask_bool == torch.ones_like(attention_mask_bool, dtype=torch.float).tril()).all()
136
+ is_full = (attention_mask_bool > 0).all()
137
+ if not (int(torch.__version__.split('.')[0]) >= 2):
138
+ warnings.warn("It's recommended to use torch2.0 or higher.")
139
+ if int(torch.__version__.split('.')[0]) >= 2 and scaling_attention_score and (is_full or is_low_triangle):
140
+ dropout_p = 0. if attention_dropout is None or not attention_dropout.training else attention_dropout.p
141
+ return torch.nn.functional.scaled_dot_product_attention(
142
+ query_layer, key_layer, value_layer,
143
+ attn_mask=None,
144
+ dropout_p=dropout_p,
145
+ is_causal=not is_full
146
+ )
147
+ else:
148
+ if scaling_attention_score:
149
+ query_layer = query_layer / math.sqrt(query_layer.shape[-1])
150
+ attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
151
+ attention_scores = attention_scores + attention_mask
152
+ attention_scores = nn.functional.softmax(attention_scores, dim=-1, dtype=torch.float32).to(query_layer.dtype)
153
+ if attention_dropout is not None:
154
+ attention_scores = attention_dropout(attention_scores)
155
+ context_layer = torch.matmul(attention_scores, value_layer)
156
+ return context_layer
157
+
158
+
159
+ class VisionExpertAttention(nn.Module):
160
+ def __init__(self, config):
161
+ super().__init__()
162
+ self.config = config
163
+ self.hidden_size = config.hidden_size
164
+ self.num_attention_heads = config.num_attention_heads
165
+ self.num_multi_query_heads = config.num_multi_query_heads
166
+ self.hidden_size_per_attention_head = self.hidden_size // self.num_attention_heads
167
+ self.stride = [self.num_attention_heads, self.num_multi_query_heads, self.num_multi_query_heads]
168
+ self.qkv_size = self.hidden_size + self.hidden_size_per_attention_head * self.num_multi_query_heads * 2
169
+ self.head_dim = self.hidden_size // self.num_attention_heads
170
+ self.max_position_embeddings = config.max_position_embeddings
171
+ self.rotary_emb = FastRotaryEmbedding(dim=self.head_dim, pos_idx_in_fp32=False, base=500000)
172
+ # self.vision_expert_query_key_value = nn.Linear(self.hidden_size, self.qkv_size, bias=True)
173
+ # self.vision_expert_dense = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
174
+ self.language_expert_query_key_value = nn.Linear(self.hidden_size, self.qkv_size, bias=False)
175
+ self.language_expert_dense = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
176
+
177
+ def _transpose_for_scores(self, tensor):
178
+ """Transpose a 3D tensor [B, L, H*HD] into a 4D tensor with size [B H L HD]."""
179
+ new_tensor_shape = tensor.size()[:-1] + \
180
+ (-1, # flexible for multi-query
181
+ self.hidden_size_per_attention_head)
182
+ tensor = tensor.view(*new_tensor_shape)
183
+ return tensor.permute(0, 2, 1, 3)
184
+
185
+ def forward(
186
+ self,
187
+ hidden_states: torch.Tensor,
188
+ token_type_ids: torch.LongTensor,
189
+ position_ids: torch.LongTensor,
190
+ attention_mask: Optional[torch.Tensor] = None,
191
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
192
+ output_attentions: bool = False,
193
+ use_cache: bool = False,
194
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
195
+ bsz, q_len, _ = hidden_states.size()
196
+ # vision_token_mask, language_token_mask = get_expert_mask(token_type_ids)
197
+
198
+ shape = list(hidden_states.shape)
199
+ shape[-1] = self.qkv_size
200
+ # mixed_raw_layer = torch.empty(shape, dtype=hidden_states.dtype, device=hidden_states.device)
201
+ # mixed_raw_layer[vision_token_mask] = self.vision_expert_query_key_value(hidden_states[vision_token_mask])
202
+ # mixed_raw_layer[language_token_mask] = self.language_expert_query_key_value(hidden_states[language_token_mask])
203
+ mixed_raw_layer = self.language_expert_query_key_value(hidden_states)
204
+
205
+ # query_states, key_states, value_states = torch.split(mixed_raw_layer, self.hidden_size, dim=-1)
206
+ factor = mixed_raw_layer.size()[-1] // sum(self.stride)
207
+ query_states, key_states, value_states = torch.split(mixed_raw_layer, [factor * x for x in self.stride], dim=-1)
208
+
209
+ query_states = self._transpose_for_scores(query_states) # B, H, L, HD
210
+ key_states = self._transpose_for_scores(key_states) # B, H, L, HD
211
+ value_states = self._transpose_for_scores(value_states) # B, H, L, HD
212
+
213
+ kv_seq_len = key_states.shape[-2]
214
+ if past_key_value is not None:
215
+ kv_seq_len += past_key_value[0].shape[-2]
216
+
217
+ query_states, key_states = self.rotary_emb(query_states, key_states, position_ids=position_ids, max_seqlen=position_ids.max() + 1)
218
+
219
+ if past_key_value is not None:
220
+ key_states = torch.cat([past_key_value[0], key_states], dim=2)
221
+ value_states = torch.cat([past_key_value[1], value_states], dim=2)
222
+
223
+ past_key_value = (key_states, value_states) if use_cache else None
224
+
225
+ key_states = key_states.unsqueeze(2).expand(-1, -1, self.num_attention_heads // self.num_multi_query_heads, -1, -1).contiguous().view(
226
+ bsz, self.num_attention_heads, *key_states.shape[2:])
227
+ value_states = value_states.unsqueeze(2).expand(-1, -1, self.num_attention_heads // self.num_multi_query_heads, -1,
228
+ -1).contiguous().view(bsz, self.num_attention_heads, *value_states.shape[2:])
229
+
230
+ context_layer = attention_fn(
231
+ query_layer=query_states, key_layer=key_states, value_layer=value_states, attention_mask=attention_mask,
232
+ scaling_attention_score=True, attention_dropout=None)
233
+ if context_layer.size() != (bsz, self.num_attention_heads, q_len, self.head_dim):
234
+ raise ValueError(
235
+ f"`attn_output` should be of size {(bsz, self.num_attention_heads, q_len, self.head_dim)}, but is"
236
+ f" {context_layer.size()}"
237
+ )
238
+ context_layer = context_layer.transpose(1, 2).contiguous().reshape(bsz, q_len, self.hidden_size)
239
+
240
+ # attn_output = torch.empty(context_layer.shape, dtype=hidden_states.dtype, device=hidden_states.device)
241
+ # attn_output[vision_token_mask] = self.vision_expert_dense(context_layer[vision_token_mask])
242
+ # attn_output[language_token_mask] = self.language_expert_dense(context_layer[language_token_mask])
243
+
244
+ attn_output = self.language_expert_dense(context_layer)
245
+
246
+ if output_attentions:
247
+ warnings.warn("output_attentions is not implemented.")
248
+
249
+ return attn_output, None, past_key_value
250
+
251
+
252
+ class CogVLMDecoderLayer(nn.Module):
253
+ def __init__(self, config):
254
+ super().__init__()
255
+ self.hidden_size = config.hidden_size
256
+ self.self_attn = VisionExpertAttention(config=config)
257
+ self.mlp = VisionExpertMLP(config)
258
+ self.input_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
259
+ self.post_attention_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
260
+
261
+ def forward(
262
+ self,
263
+ hidden_states: torch.Tensor,
264
+ token_type_ids: torch.LongTensor,
265
+ position_ids: torch.LongTensor,
266
+ attention_mask: Optional[torch.Tensor] = None,
267
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
268
+ output_attentions: Optional[bool] = False,
269
+ use_cache: Optional[bool] = False,
270
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
271
+ residual = hidden_states
272
+
273
+ hidden_states = self.input_layernorm(hidden_states)
274
+
275
+ # Self Attention
276
+ hidden_states, self_attn_weights, present_key_value = self.self_attn(
277
+ hidden_states=hidden_states,
278
+ token_type_ids=token_type_ids,
279
+ position_ids=position_ids,
280
+ attention_mask=attention_mask,
281
+ past_key_value=past_key_value,
282
+ output_attentions=output_attentions,
283
+ use_cache=use_cache,
284
+ )
285
+ hidden_states = residual + hidden_states
286
+
287
+ # Fully Connected
288
+ residual = hidden_states
289
+ hidden_states = self.post_attention_layernorm(hidden_states)
290
+ hidden_states = self.mlp(hidden_states, token_type_ids=token_type_ids)
291
+ hidden_states = residual + hidden_states
292
+
293
+ outputs = (hidden_states,)
294
+
295
+ if output_attentions:
296
+ outputs += (self_attn_weights,)
297
+
298
+ if use_cache:
299
+ outputs += (present_key_value,)
300
+
301
+ return outputs # type: ignore
302
+
303
+
304
+ class CogVLMPreTrainedModel(PreTrainedModel):
305
+ config_class = CogVLMConfig
306
+ base_model_prefix = "model"
307
+ supports_gradient_checkpointing = False
308
+ _no_split_modules = ["CogVLMDecoderLayer"]
309
+ _skip_keys_device_placement = "past_key_values"
310
+
311
+ def _init_weights(self, module):
312
+ std = self.config.initializer_range
313
+ if isinstance(module, nn.Linear):
314
+ module.weight.data.normal_(mean=0.0, std=std)
315
+ if module.bias is not None:
316
+ module.bias.data.zero_()
317
+ elif isinstance(module, nn.Embedding):
318
+ module.weight.data.normal_(mean=0.0, std=std)
319
+ if module.padding_idx is not None:
320
+ module.weight.data[module.padding_idx].zero_()
321
+
322
+
323
+ def is_empty(images_list: Optional[List[List[torch.Tensor]]]):
324
+ if images_list is None or len(images_list) == 0:
325
+ return True
326
+ for image_list in images_list:
327
+ if len(image_list):
328
+ return False
329
+ return True
330
+
331
+
332
+ def build_position_ids(x: "torch.BoolTensor(B, L)", attention_mask: Optional["torch.BoolTensor(B, L)"] = None) -> "torch.LongTensor(B, L)":
333
+ if attention_mask is not None:
334
+ tmp = x.clone()
335
+ tmp[~(attention_mask.bool())] = -1
336
+ else:
337
+ tmp = x.clone()
338
+ # image boi eoi token as LANGUAGE_TOKEN_TYPE
339
+ is_boi_eoi = torch.zeros_like(x, dtype=torch.bool)
340
+ is_boi_eoi[:, 1:] |= (tmp[:, 1:] == VISION_TOKEN_TYPE) & (tmp[:, :-1] == LANGUAGE_TOKEN_TYPE)
341
+ is_boi_eoi[:, 0] |= (tmp[:, 0] == VISION_TOKEN_TYPE)
342
+ is_boi_eoi[:, :-1] |= (tmp[:, :-1] == VISION_TOKEN_TYPE) & (tmp[:, 1:] == LANGUAGE_TOKEN_TYPE)
343
+ is_boi_eoi[:, -1] |= (tmp[:, -1] == VISION_TOKEN_TYPE)
344
+ tmp[is_boi_eoi] = LANGUAGE_TOKEN_TYPE
345
+ # final position ids
346
+ y = torch.zeros_like(x, dtype=torch.long)
347
+ y[:, 1:] = (tmp[:, 1:] == LANGUAGE_TOKEN_TYPE) | ((tmp[:, 1:] == VISION_TOKEN_TYPE) & (tmp[:, :-1] == LANGUAGE_TOKEN_TYPE))
348
+ y = y.cumsum(dim=-1)
349
+ return y
350
+
351
+
352
+ class CogVLMVideoModel(CogVLMPreTrainedModel):
353
+ def __init__(self, config):
354
+ super().__init__(config)
355
+ self.padding_idx = 128002
356
+ self.vocab_size = config.vocab_size
357
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
358
+ self.layers = nn.ModuleList([CogVLMDecoderLayer(config) for _ in range(config.num_hidden_layers)])
359
+ self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
360
+
361
+ self.vision = EVA2CLIPModel(config)
362
+
363
+ self.gradient_checkpointing = False
364
+ # Initialize weights and apply final processing
365
+ self.post_init()
366
+
367
+ def encode_images(self, images: List[List[torch.Tensor]], ) -> torch.Tensor:
368
+ images_list, images = images, []
369
+
370
+ images = []
371
+ for image_list in images_list:
372
+ for image in image_list:
373
+ images.append(image)
374
+
375
+ # images = torch.stack(images) # video images is already stacked
376
+ images_features = self.vision(images[0])
377
+ return images_features
378
+
379
+ def forward(
380
+ self,
381
+ input_ids: torch.LongTensor = None,
382
+ images: List[List[torch.Tensor]] = None,
383
+ token_type_ids: Optional[torch.LongTensor] = None,
384
+ attention_mask: Optional[torch.Tensor] = None,
385
+ position_ids: Optional[torch.LongTensor] = None,
386
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
387
+ inputs_embeds: Optional[torch.FloatTensor] = None,
388
+ use_cache: Optional[bool] = None,
389
+ output_attentions: Optional[bool] = None,
390
+ output_hidden_states: Optional[bool] = None,
391
+ return_dict: Optional[bool] = None,
392
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
393
+ """take care of image_encode, token_type_ids, position_ids and (attention_mask = None is fine)"""
394
+
395
+ if past_key_values is not None:
396
+ pass # generate mode with past_key_values. the image features are already mapped
397
+ else:
398
+ # not allow for inputs_embeds, because we want to process image feature
399
+ assert input_ids is not None and inputs_embeds is None, f"{input_ids} {inputs_embeds}"
400
+ if not is_empty(images): # multi-modality
401
+ assert token_type_ids is not None, f"multi-modality requires `token_type_ids`!"
402
+ assert len(input_ids) == len(images), f"{len(input_ids)} {len(images)}"
403
+ inputs_embeds = self.embed_tokens(input_ids)
404
+ images_features = self.encode_images(images)
405
+ images_features = rearrange(images_features, 'b n d -> (b n) d')
406
+ images_features = images_features.to(dtype=inputs_embeds.dtype, device=inputs_embeds.device)
407
+
408
+ inputs_embeds = inputs_embeds.index_put([token_type_ids == VISION_TOKEN_TYPE], images_features)
409
+ else: # single-modality
410
+ if token_type_ids is None:
411
+ token_type_ids = torch.ones_like(input_ids, dtype=torch.long, device=input_ids.device) * LANGUAGE_TOKEN_TYPE
412
+ assert not (token_type_ids == VISION_TOKEN_TYPE).any(), f"{(token_type_ids == VISION_TOKEN_TYPE).sum()}"
413
+ inputs_embeds = self.embed_tokens(input_ids)
414
+
415
+ if position_ids is None:
416
+ position_ids = build_position_ids(token_type_ids, attention_mask)
417
+ input_ids = None
418
+ return self.llm_forward(
419
+ input_ids=input_ids,
420
+ token_type_ids=token_type_ids,
421
+ attention_mask=attention_mask,
422
+ position_ids=position_ids,
423
+ past_key_values=past_key_values,
424
+ inputs_embeds=inputs_embeds,
425
+ use_cache=use_cache,
426
+ output_attentions=output_attentions,
427
+ output_hidden_states=output_hidden_states,
428
+ return_dict=return_dict,
429
+ )
430
+
431
+ def llm_forward(
432
+ self,
433
+ input_ids: torch.LongTensor = None,
434
+ token_type_ids: torch.LongTensor = None,
435
+ attention_mask: Optional[torch.Tensor] = None,
436
+ position_ids: Optional[torch.LongTensor] = None,
437
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
438
+ inputs_embeds: Optional[torch.FloatTensor] = None,
439
+ use_cache: Optional[bool] = None,
440
+ output_attentions: Optional[bool] = None,
441
+ output_hidden_states: Optional[bool] = None,
442
+ return_dict: Optional[bool] = None,
443
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
444
+ """largely copy from llama forward and adapt for cogvlm with `token_type_ids`"""
445
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
446
+ output_hidden_states = (
447
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
448
+ )
449
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
450
+
451
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
452
+
453
+ # retrieve input_ids and inputs_embeds
454
+ if input_ids is not None and inputs_embeds is not None:
455
+ raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
456
+ elif input_ids is not None:
457
+ batch_size, seq_length = input_ids.shape
458
+ elif inputs_embeds is not None:
459
+ batch_size, seq_length, _ = inputs_embeds.shape
460
+ else:
461
+ raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
462
+
463
+ seq_length_with_past = seq_length
464
+ past_key_values_length = 0
465
+
466
+ if past_key_values is not None:
467
+ past_key_values_length = past_key_values[0][0].shape[2]
468
+ seq_length_with_past = seq_length_with_past + past_key_values_length
469
+
470
+ if position_ids is None:
471
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
472
+ position_ids = torch.arange(
473
+ past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
474
+ )
475
+ position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
476
+ else:
477
+ position_ids = position_ids.view(-1, seq_length).long()
478
+
479
+ if inputs_embeds is None:
480
+ inputs_embeds = self.embed_tokens(input_ids)
481
+ # embed positions
482
+ if attention_mask is None:
483
+ attention_mask = torch.ones(
484
+ (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embeds.device
485
+ )
486
+ attention_mask = self._prepare_decoder_attention_mask(
487
+ attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_length
488
+ )
489
+
490
+ hidden_states = inputs_embeds
491
+
492
+ # decoder layers
493
+ all_hidden_states = () if output_hidden_states else None
494
+ all_self_attns = () if output_attentions else None
495
+ next_decoder_cache = () if use_cache else None
496
+
497
+ for idx, decoder_layer in enumerate(self.layers):
498
+ if output_hidden_states:
499
+ all_hidden_states += (hidden_states,)
500
+
501
+ past_key_value = past_key_values[idx] if past_key_values is not None else None
502
+ layer_outputs = decoder_layer(
503
+ hidden_states,
504
+ token_type_ids=token_type_ids,
505
+ attention_mask=attention_mask,
506
+ position_ids=position_ids,
507
+ past_key_value=past_key_value,
508
+ output_attentions=output_attentions,
509
+ use_cache=use_cache,
510
+ )
511
+ hidden_states = layer_outputs[0]
512
+
513
+ if use_cache:
514
+ next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)
515
+
516
+ if output_attentions:
517
+ all_self_attns += (layer_outputs[1],)
518
+
519
+ hidden_states = self.norm(hidden_states)
520
+
521
+ # add hidden states from the last decoder layer
522
+ if output_hidden_states:
523
+ all_hidden_states += (hidden_states,)
524
+
525
+ next_cache = next_decoder_cache if use_cache else None
526
+ if not return_dict:
527
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
528
+ return BaseModelOutputWithPast(
529
+ last_hidden_state=hidden_states,
530
+ past_key_values=next_cache,
531
+ hidden_states=all_hidden_states,
532
+ attentions=all_self_attns,
533
+ )
534
+
535
+ def get_input_embeddings(self):
536
+ return self.embed_tokens
537
+
538
+ def set_input_embeddings(self, value):
539
+ self.embed_tokens = value
540
+
541
+ # noinspection PyMethodMayBeStatic
542
+ # Copied from transformers.models.bart.modeling_bart.BartDecoder._prepare_decoder_attention_mask
543
+ def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
544
+ # create causal mask
545
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
546
+ combined_attention_mask = None
547
+ if input_shape[-1] > 1:
548
+ combined_attention_mask = _make_causal_mask(
549
+ input_shape,
550
+ inputs_embeds.dtype,
551
+ device=inputs_embeds.device,
552
+ past_key_values_length=past_key_values_length,
553
+ )
554
+
555
+ if attention_mask is not None:
556
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
557
+ expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
558
+ inputs_embeds.device
559
+ )
560
+ combined_attention_mask = (
561
+ expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
562
+ )
563
+
564
+ return combined_attention_mask
565
+
566
+
567
+ def _history_to_prompt(signal_type, history, query):
568
+ if signal_type == 'base':
569
+ return query
570
+ elif signal_type == 'vqa':
571
+ answer_format = 'Short answer:'
572
+ elif signal_type == 'chat':
573
+ answer_format = 'Answer:'
574
+ else:
575
+ assert False, f"Unknown signal type {signal_type}"
576
+
577
+ prompt = ''
578
+ for i, (old_query, response) in enumerate(history):
579
+ prompt += 'Question: ' + old_query + " {} ".format(answer_format) + response + "\n"
580
+ prompt += 'Question: {} {}'.format(query, answer_format)
581
+ return prompt
582
+
583
+ def load_video(video_path):
584
+ mp4_stream = None
585
+ decord.bridge.set_bridge('torch')
586
+ with open(video_path, 'rb') as f:
587
+ mp4_stream = f.read()
588
+ clip_end_sec = 60 # clip video to <= 1 minute
589
+ clip_start_sec = 0
590
+ num_frames = 24
591
+ # decord.bridge.set_bridge('torch')
592
+ if mp4_stream is not None:
593
+ decord_vr = VideoReader(io.BytesIO(mp4_stream), ctx=cpu(0))
594
+ else:
595
+ decord_vr = VideoReader(video_path, ctx=cpu(0))
596
+ duration = len(decord_vr) # duration in terms of frames
597
+ start_frame = int(clip_start_sec * decord_vr.get_avg_fps())
598
+ end_frame = min(duration, int(clip_end_sec*decord_vr.get_avg_fps())) if \
599
+ clip_end_sec is not None else duration
600
+ frame_id_list = np.linspace(start_frame, end_frame-1, num_frames, dtype=int)
601
+ # frame_id_list = np.linspace(0, duration-1, num_frames, dtype=int)
602
+ video_data = decord_vr.get_batch(frame_id_list)
603
+ video_data = video_data.permute(3, 0, 1, 2) # (T, H, W, C) -> (C, T, H, W)
604
+ # video_outputs = transform(video_data)
605
+ return video_data
606
+
607
+ def load_video_1fps(video_path):
608
+ mp4_stream = None
609
+ decord.bridge.set_bridge('torch')
610
+ with open(video_path, 'rb') as f:
611
+ mp4_stream = f.read()
612
+
613
+ num_frames = 24
614
+ # decord.bridge.set_bridge('torch')
615
+ if mp4_stream is not None:
616
+ decord_vr = VideoReader(io.BytesIO(mp4_stream), ctx=cpu(0))
617
+ else:
618
+ decord_vr = VideoReader(video_path, ctx=cpu(0))
619
+
620
+ total_frames = len(decord_vr)
621
+ timestamps = decord_vr.get_frame_timestamp(np.arange(total_frames))
622
+ timestamps = [i[0] for i in timestamps]
623
+
624
+ max_second = round(max(timestamps)) + 1
625
+ frame_id_list = []
626
+ for second in range(max_second):
627
+ closest_num = min(timestamps, key=lambda x: abs(x - second))
628
+ index = timestamps.index(closest_num)
629
+ frame_id_list.append(index)
630
+ if len(frame_id_list) > num_frames:
631
+ break
632
+
633
+ video_data = decord_vr.get_batch(frame_id_list)
634
+ video_data = video_data.permute(3, 0, 1, 2) # (T, H, W, C) -> (C, T, H, W)
635
+ # video_outputs = transform(video_data)
636
+ return video_data
637
+
638
+
639
+
640
+ class CogVLMVideoForCausalLM(CogVLMPreTrainedModel):
641
+ _auto_class = "AutoModelForCausalLM"
642
+
643
+ def __init__(self, config):
644
+ super().__init__(config)
645
+ self.model = CogVLMVideoModel(config)
646
+ self.vocab_size = config.vocab_size
647
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
648
+ self.video_downsample = 1 # TODO: change this to config
649
+
650
+ # Initialize weights and apply final processing
651
+ self.post_init()
652
+
653
+ def get_input_embeddings(self):
654
+ return self.model.embed_tokens
655
+
656
+ def set_input_embeddings(self, value):
657
+ self.model.embed_tokens = value
658
+
659
+ def get_output_embeddings(self):
660
+ return self.lm_head
661
+
662
+ def set_output_embeddings(self, new_embeddings):
663
+ self.lm_head = new_embeddings
664
+
665
+ def set_decoder(self, decoder):
666
+ self.model = decoder
667
+
668
+ def get_decoder(self):
669
+ return self.model
670
+
671
+ def forward(
672
+ self,
673
+ input_ids: torch.LongTensor = None,
674
+ images: List[List[torch.Tensor]] = None,
675
+ token_type_ids: Optional[torch.LongTensor] = None,
676
+ attention_mask: Optional[torch.Tensor] = None,
677
+ position_ids: Optional[torch.LongTensor] = None,
678
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
679
+ inputs_embeds: Optional[torch.FloatTensor] = None,
680
+ use_cache: Optional[bool] = None,
681
+ output_attentions: Optional[bool] = None,
682
+ output_hidden_states: Optional[bool] = None,
683
+ return_dict: Optional[bool] = None,
684
+ labels: Optional[torch.LongTensor] = None,
685
+ ) -> Union[Tuple, CausalLMOutputWithPast]:
686
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
687
+ output_hidden_states = (
688
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
689
+ )
690
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
691
+
692
+ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
693
+ outputs = self.model(
694
+ input_ids=input_ids,
695
+ images=images,
696
+ token_type_ids=token_type_ids,
697
+ attention_mask=attention_mask,
698
+ position_ids=position_ids,
699
+ past_key_values=past_key_values,
700
+ inputs_embeds=inputs_embeds,
701
+ use_cache=use_cache,
702
+ output_attentions=output_attentions,
703
+ output_hidden_states=output_hidden_states,
704
+ return_dict=return_dict,
705
+ )
706
+
707
+ hidden_states = outputs[0]
708
+ logits = self.lm_head(hidden_states)
709
+ logits = logits.float()
710
+
711
+ loss = None
712
+ if labels is not None:
713
+ # Shift so that tokens < n predict n
714
+ shift_logits = logits[..., :-1, :].contiguous()
715
+ shift_labels = labels[..., 1:].contiguous()
716
+ # Flatten the tokens
717
+ loss_fct = CrossEntropyLoss()
718
+ shift_logits = shift_logits.view(-1, self.config.vocab_size)
719
+ shift_labels = shift_labels.view(-1)
720
+ # Enable model parallelism
721
+ shift_labels = shift_labels.to(shift_logits.device)
722
+ loss = loss_fct(shift_logits, shift_labels)
723
+
724
+ if not return_dict:
725
+ output = (logits,) + outputs[1:]
726
+ return (loss,) + output if loss is not None else output
727
+
728
+ return CausalLMOutputWithPast(
729
+ loss=loss,
730
+ logits=logits,
731
+ past_key_values=outputs.past_key_values,
732
+ hidden_states=outputs.hidden_states,
733
+ attentions=outputs.attentions,
734
+ )
735
+
736
+ def _prepare_attention_mask_for_generation(
737
+ self,
738
+ inputs: torch.Tensor,
739
+ pad_token_id: Optional[int],
740
+ eos_token_id: Optional[Union[int, List[int]]],
741
+ ) -> torch.LongTensor:
742
+ return torch.ones(inputs.shape[:2], dtype=torch.long, device=inputs.device) # type: ignore
743
+
744
+ def prepare_inputs_for_generation(
745
+ self, input_ids, token_type_ids, images=None, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs
746
+ ):
747
+ # build position_ids if needed
748
+ position_ids = kwargs.get("position_ids", None)
749
+ if position_ids is None:
750
+ position_ids = build_position_ids(token_type_ids, attention_mask)
751
+
752
+ if past_key_values:
753
+ input_ids = input_ids[:, -1:]
754
+ token_type_ids = token_type_ids[:, -1:]
755
+ position_ids = position_ids[:, -1:]
756
+
757
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
758
+ if inputs_embeds is not None and past_key_values is None:
759
+ model_inputs = {"inputs_embeds": inputs_embeds}
760
+ else:
761
+ model_inputs = {"input_ids": input_ids}
762
+
763
+ model_inputs.update(
764
+ {
765
+ "token_type_ids": token_type_ids,
766
+ "images": images,
767
+ "position_ids": position_ids,
768
+ "past_key_values": past_key_values,
769
+ "use_cache": kwargs.get("use_cache"),
770
+ "attention_mask": attention_mask,
771
+ }
772
+ )
773
+ return model_inputs
774
+
775
+ def _update_model_kwargs_for_generation(
776
+ self,
777
+ outputs: "ModelOutput",
778
+ model_kwargs: Dict[str, Any],
779
+ is_encoder_decoder: bool = False,
780
+ standardize_cache_format: bool = False,
781
+ ) -> Dict[str, Any]:
782
+ # update past_key_values
783
+ model_kwargs["past_key_values"] = self._extract_past_from_model_output(
784
+ outputs, standardize_cache_format=standardize_cache_format
785
+ )
786
+ if getattr(outputs, "state", None) is not None:
787
+ model_kwargs["state"] = outputs.state
788
+
789
+ # update token_type_ids with last value
790
+ if "token_type_ids" in model_kwargs:
791
+ token_type_ids = model_kwargs["token_type_ids"]
792
+ new_token_type_ids = torch.ones(size=(token_type_ids.shape[0], 1), dtype=token_type_ids.dtype, device=token_type_ids.device) * LANGUAGE_TOKEN_TYPE
793
+ model_kwargs["token_type_ids"] = torch.cat([token_type_ids, new_token_type_ids], dim=-1)
794
+
795
+ if not is_encoder_decoder:
796
+ # update attention mask
797
+ if "attention_mask" in model_kwargs:
798
+ attention_mask = model_kwargs["attention_mask"]
799
+ model_kwargs["attention_mask"] = torch.cat(
800
+ [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
801
+ )
802
+ else:
803
+ # update decoder attention mask
804
+ if "decoder_attention_mask" in model_kwargs:
805
+ decoder_attention_mask = model_kwargs["decoder_attention_mask"]
806
+ model_kwargs["decoder_attention_mask"] = torch.cat(
807
+ [decoder_attention_mask, decoder_attention_mask.new_ones((decoder_attention_mask.shape[0], 1))],
808
+ dim=-1,
809
+ )
810
+
811
+ return model_kwargs
812
+
813
+ def _reorder_cache(self, past_key_values, beam_idx):
814
+ reordered_past = ()
815
+ for layer_past in past_key_values:
816
+ reordered_past += (
817
+ tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
818
+ )
819
+ return reordered_past
820
+
821
+
822
+
823
+ def build_conversation_input_ids(
824
+ self,
825
+ tokenizer: "PreTrainedTokenizer",
826
+ *,
827
+ query: str,
828
+ history: Optional[List[Tuple[str, str]]] = None,
829
+ images: Optional[List["PIL.Image"]] = None,
830
+ template_version: Optional[Literal["base", "chat", "vqa"]] = None,
831
+ answer: str = None,
832
+ ):
833
+ image_size: int = self.config.vision_config['image_size']
834
+ template_version = template_version or self.config.template_version
835
+ assert images is None or len(images) <= 1, f"not support multi images by now."
836
+ history = history or []
837
+ text = _history_to_prompt(template_version, history, query)
838
+ input_ids = [tokenizer.bos_token_id]
839
+ token_type_ids = [LANGUAGE_TOKEN_TYPE]
840
+ add_time_indices = False
841
+ if images is not None and len(images) == 1:
842
+ # vision
843
+ transform = transforms.Compose(
844
+ [
845
+ # UniformTemporalSubsample(num_frames),
846
+ Lambda(lambda x: x / 255.0),
847
+ NormalizeVideo(mean=(0.48145466, 0.4578275, 0.40821073), std=(0.26862954, 0.26130258, 0.27577711)),
848
+ ShortSideScale(size=image_size),
849
+ CenterCropVideo(image_size),
850
+ # RandomHorizontalFlipVideo(p=0.5),
851
+ ]
852
+ )
853
+ images = [transform(images[0]).transpose(0, 1)] # (T, C, H, W)
854
+ num_eois = len(images[0])
855
+ tokenizer.pad_token_id = 128002
856
+ vision_token_num = (64 + 2) * num_eois
857
+ if not add_time_indices:
858
+ input_ids += [tokenizer.pad_token_id] * vision_token_num # add spetial token
859
+ token_type_ids += [VISION_TOKEN_TYPE] * vision_token_num
860
+ else:
861
+ video_ids, video_type_ids = [], []
862
+ for _time_idx in range(num_eois):
863
+ video_ids += [tokenizer.pad_token_id] * vision_token_num
864
+ video_type_ids += [VISION_TOKEN_TYPE] * vision_token_num
865
+ # add time indices
866
+ time_indices = tokenizer.encode(str(_time_idx), add_special_tokens=False)
867
+ video_ids += time_indices
868
+ video_type_ids += [LANGUAGE_TOKEN_TYPE] * len(time_indices)
869
+ # llama3 adapt for cogvlm
870
+ input_ids += video_ids
871
+ token_type_ids += video_type_ids
872
+
873
+ text_ids = tokenizer.encode(text, add_special_tokens=False)
874
+
875
+ if answer is not None:
876
+ answer_ids = tokenizer.encode(answer, add_special_tokens=False)
877
+ answer_ids += [tokenizer.eos_token_id]
878
+ text_ids += answer_ids
879
+
880
+
881
+ input_ids += text_ids
882
+ token_type_ids += [LANGUAGE_TOKEN_TYPE] * len(text_ids)
883
+ attention_mask = [1] * len(input_ids)
884
+ if answer is not None:
885
+ labels = [-100 for _ in range(len(input_ids) - len(answer_ids))] + answer_ids
886
+ labels = torch.tensor(labels, dtype=torch.long)
887
+ else:
888
+ labels = None
889
+
890
+ return {
891
+ 'input_ids': torch.tensor(input_ids, dtype=torch.long),
892
+ 'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
893
+ 'attention_mask': torch.tensor(attention_mask, dtype=torch.long),
894
+ 'images': images,
895
+ 'labels': labels,
896
+ }
897
+
898
+
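A quick usage sketch for build_conversation_input_ids above: it returns unbatched tensors plus the preprocessed clip, so the caller adds the batch dimension before calling generate. The checkpoint path, the dummy frame tensor, and the nested-list batching of images below are illustrative assumptions, not taken from this commit.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "path/to/this/repo"  # placeholder: wherever this checkpoint is stored

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().to("cuda")

# Dummy clip shaped (C, T, H, W) with raw 0-255 pixel values; the transform in
# build_conversation_input_ids divides by 255 and normalizes per channel.
# In practice the frames would come from a real video decoder.
video = torch.randint(0, 256, (3, 24, 336, 448), dtype=torch.uint8)

feats = model.build_conversation_input_ids(
    tokenizer,
    query="Describe what happens in this video.",
    images=[video],
    template_version="chat",
)
inputs = {
    "input_ids": feats["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": feats["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": feats["attention_mask"].unsqueeze(0).to("cuda"),
    # Nested-list batching mirrors the CogVLM demo scripts; treat it as an assumption here.
    "images": [[feats["images"][0].to("cuda", dtype=torch.bfloat16)]],
}
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))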
special_tokens_map.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "bos_token": "<|begin_of_text|>",
3
+ "eos_token": "<|end_of_text|>"
4
+ }
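The map above only declares the BOS/EOS strings; no pad token is set here. The modeling code instead assigns tokenizer.pad_token_id = 128002 at runtime inside build_conversation_input_ids and reuses that id as the vision placeholder; per tokenizer_config.json below, 128002 is <|reserved_special_token_0|>. A quick check (the checkpoint path is a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/this/repo", trust_remote_code=True)
print(tok.bos_token, tok.eos_token)       # <|begin_of_text|> <|end_of_text|>
print(tok.convert_ids_to_tokens(128002))  # <|reserved_special_token_0|>
print(tok.pad_token_id)                   # None until the modeling code sets it to 128002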
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,2064 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<|reserved_special_token_0|>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|reserved_special_token_1|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|reserved_special_token_2|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_3|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|reserved_special_token_4|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<|reserved_special_token_5|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "<|reserved_special_token_6|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<|reserved_special_token_7|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "<|reserved_special_token_8|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<|reserved_special_token_9|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "<|reserved_special_token_10|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<|reserved_special_token_11|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "<|reserved_special_token_12|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<|reserved_special_token_13|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "<|reserved_special_token_14|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<|reserved_special_token_15|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "<|reserved_special_token_16|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_17|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_18|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_19|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_20|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_21|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_22|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_23|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_24|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_25|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_26|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_27|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_28|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_29|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_30|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_31|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_32|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_33|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_34|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_35|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_36|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_37|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_38|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_39|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_40|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_41|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_42|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_43|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_44|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_45|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_46|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_47|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_48|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_49|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_50|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_51|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_52|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_53|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_54|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_55|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_56|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_57|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_58|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_59|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_60|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_61|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_62|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_63|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_64|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_65|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_66|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_67|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_68|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_69|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_70|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_71|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_72|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_73|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_74|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_75|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_76|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_77|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_78|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_79|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_80|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_81|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_82|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_83|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_84|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_85|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_86|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_87|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_88|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_89|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_90|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_91|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_92|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_93|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_94|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_95|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_96|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_97|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_98|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_99|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_100|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_101|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_102|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_103|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_104|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_105|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_106|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_107|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_108|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_109|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_110|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_111|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_112|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_113|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_114|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_115|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_116|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_117|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_118|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_119|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_120|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_121|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_122|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_123|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_124|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_125|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_126|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_127|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_128|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_129|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_130|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_131|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_132|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_133|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_134|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_135|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_136|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_137|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_138|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_139|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_140|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_141|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_142|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_143|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_144|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_145|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_146|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_147|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_148|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_149|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_150|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_151|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_152|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_153|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_154|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_155|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_156|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_157|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_158|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_159|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_160|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_161|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_162|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_163|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_164|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_165|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_166|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_167|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_168|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_169|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_170|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_171|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_172|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128178": {
1428
+ "content": "<|reserved_special_token_173|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128179": {
1436
+ "content": "<|reserved_special_token_174|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128180": {
1444
+ "content": "<|reserved_special_token_175|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128181": {
1452
+ "content": "<|reserved_special_token_176|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128182": {
1460
+ "content": "<|reserved_special_token_177|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128183": {
1468
+ "content": "<|reserved_special_token_178|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128184": {
1476
+ "content": "<|reserved_special_token_179|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128185": {
1484
+ "content": "<|reserved_special_token_180|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128186": {
1492
+ "content": "<|reserved_special_token_181|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128187": {
1500
+ "content": "<|reserved_special_token_182|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128188": {
1508
+ "content": "<|reserved_special_token_183|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128189": {
1516
+ "content": "<|reserved_special_token_184|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128190": {
1524
+ "content": "<|reserved_special_token_185|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128191": {
1532
+ "content": "<|reserved_special_token_186|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128192": {
1540
+ "content": "<|reserved_special_token_187|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128193": {
1548
+ "content": "<|reserved_special_token_188|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128194": {
1556
+ "content": "<|reserved_special_token_189|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128195": {
1564
+ "content": "<|reserved_special_token_190|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128196": {
1572
+ "content": "<|reserved_special_token_191|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128197": {
1580
+ "content": "<|reserved_special_token_192|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128198": {
1588
+ "content": "<|reserved_special_token_193|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128199": {
1596
+ "content": "<|reserved_special_token_194|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128200": {
1604
+ "content": "<|reserved_special_token_195|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128201": {
1612
+ "content": "<|reserved_special_token_196|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128202": {
1620
+ "content": "<|reserved_special_token_197|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128203": {
1628
+ "content": "<|reserved_special_token_198|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128204": {
1636
+ "content": "<|reserved_special_token_199|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128205": {
1644
+ "content": "<|reserved_special_token_200|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128206": {
1652
+ "content": "<|reserved_special_token_201|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128207": {
1660
+ "content": "<|reserved_special_token_202|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128208": {
1668
+ "content": "<|reserved_special_token_203|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128209": {
1676
+ "content": "<|reserved_special_token_204|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128210": {
1684
+ "content": "<|reserved_special_token_205|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128211": {
1692
+ "content": "<|reserved_special_token_206|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128212": {
1700
+ "content": "<|reserved_special_token_207|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128213": {
1708
+ "content": "<|reserved_special_token_208|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128214": {
1716
+ "content": "<|reserved_special_token_209|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128215": {
1724
+ "content": "<|reserved_special_token_210|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128216": {
1732
+ "content": "<|reserved_special_token_211|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128217": {
1740
+ "content": "<|reserved_special_token_212|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128218": {
1748
+ "content": "<|reserved_special_token_213|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128219": {
1756
+ "content": "<|reserved_special_token_214|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128220": {
1764
+ "content": "<|reserved_special_token_215|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128221": {
1772
+ "content": "<|reserved_special_token_216|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128222": {
1780
+ "content": "<|reserved_special_token_217|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128223": {
1788
+ "content": "<|reserved_special_token_218|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128224": {
1796
+ "content": "<|reserved_special_token_219|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128225": {
1804
+ "content": "<|reserved_special_token_220|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128226": {
1812
+ "content": "<|reserved_special_token_221|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128227": {
1820
+ "content": "<|reserved_special_token_222|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128228": {
1828
+ "content": "<|reserved_special_token_223|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128229": {
1836
+ "content": "<|reserved_special_token_224|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128230": {
1844
+ "content": "<|reserved_special_token_225|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128231": {
1852
+ "content": "<|reserved_special_token_226|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128232": {
1860
+ "content": "<|reserved_special_token_227|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128233": {
1868
+ "content": "<|reserved_special_token_228|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128234": {
1876
+ "content": "<|reserved_special_token_229|>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": true
1882
+ },
1883
+ "128235": {
1884
+ "content": "<|reserved_special_token_230|>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": true
1890
+ },
1891
+ "128236": {
1892
+ "content": "<|reserved_special_token_231|>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": true
1898
+ },
1899
+ "128237": {
1900
+ "content": "<|reserved_special_token_232|>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": true
1906
+ },
1907
+ "128238": {
1908
+ "content": "<|reserved_special_token_233|>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": true
1914
+ },
1915
+ "128239": {
1916
+ "content": "<|reserved_special_token_234|>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": true
1922
+ },
1923
+ "128240": {
1924
+ "content": "<|reserved_special_token_235|>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": true
1930
+ },
1931
+ "128241": {
1932
+ "content": "<|reserved_special_token_236|>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": true
1938
+ },
1939
+ "128242": {
1940
+ "content": "<|reserved_special_token_237|>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": true
1946
+ },
1947
+ "128243": {
1948
+ "content": "<|reserved_special_token_238|>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": true
1954
+ },
1955
+ "128244": {
1956
+ "content": "<|reserved_special_token_239|>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": true
1962
+ },
1963
+ "128245": {
1964
+ "content": "<|reserved_special_token_240|>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": true
1970
+ },
1971
+ "128246": {
1972
+ "content": "<|reserved_special_token_241|>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": true
1978
+ },
1979
+ "128247": {
1980
+ "content": "<|reserved_special_token_242|>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": true
1986
+ },
1987
+ "128248": {
1988
+ "content": "<|reserved_special_token_243|>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": true
1994
+ },
1995
+ "128249": {
1996
+ "content": "<|reserved_special_token_244|>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": true
2002
+ },
2003
+ "128250": {
2004
+ "content": "<|reserved_special_token_245|>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": true
2010
+ },
2011
+ "128251": {
2012
+ "content": "<|reserved_special_token_246|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128252": {
2020
+ "content": "<|reserved_special_token_247|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128253": {
2028
+ "content": "<|reserved_special_token_248|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128254": {
2036
+ "content": "<|reserved_special_token_249|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128255": {
2044
+ "content": "<|reserved_special_token_250|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ }
2051
+ },
2052
+ "bos_token": "<|begin_of_text|>",
2053
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% else %}{{ eos_token }}{% endif %}",
2054
+ "clean_up_tokenization_spaces": true,
2055
+ "eos_token": "<|end_of_text|>",
2056
+ "model_input_names": [
2057
+ "input_ids",
2058
+ "token_type_ids",
2059
+ "attention_mask",
2060
+ "images"
2061
+ ],
2062
+ "model_max_length": 2048,
2063
+ "tokenizer_class": "PreTrainedTokenizerFast"
2064
+ }
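
Note: the tokenizer_config.json above wires up a Llama-3 style chat template. A minimal sketch of how it renders a conversation (the repo id below is an assumption; point it at wherever this folder actually lives):

from transformers import AutoTokenizer

# Hypothetical repo id for illustration; replace with the actual location of this upload.
tokenizer = AutoTokenizer.from_pretrained("THUDM/cogvlm2-llama3-chat-19B")
messages = [{"role": "user", "content": "Describe this image."}]
# add_generation_prompt=True appends the assistant header defined by the chat_template above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# Describe this image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
#
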
util.py ADDED
@@ -0,0 +1,472 @@
1
+ from typing import Optional, Tuple, Union
2
+
3
+ import torch
4
+ from einops import rearrange
5
+ import torch.nn.functional as F
6
+
7
+ import triton
8
+ import triton.language as tl
9
+
10
+
11
+ @triton.jit
12
+ def rotary_kernel(
13
+ OUT,
14
+ X,
15
+ COS,
16
+ SIN,
17
+ CU_SEQLENS,
18
+ SEQLEN_OFFSETS,
19
+ seqlen,
20
+ nheads,
21
+ rotary_dim,
22
+ seqlen_ro,
23
+ CACHE_KEY_SEQLEN,
24
+ # strides
25
+ stride_out_batch,
26
+ stride_out_nheads,
27
+ stride_out_seqlen,
28
+ stride_out_headdim,
29
+ stride_x_batch,
30
+ stride_x_nheads,
31
+ stride_x_seqlen,
32
+ stride_x_headdim,
33
+ BLOCK_K: tl.constexpr,
34
+ IS_SEQLEN_OFFSETS_TENSOR: tl.constexpr,
35
+ IS_VARLEN: tl.constexpr,
36
+ INTERLEAVED: tl.constexpr,
37
+ CONJUGATE: tl.constexpr,
38
+ BLOCK_M: tl.constexpr,
39
+ ):
40
+ pid_m = tl.program_id(axis=0)
41
+ pid_batch = tl.program_id(axis=1)
42
+ pid_head = tl.program_id(axis=2)
43
+ rotary_dim_half = rotary_dim // 2
44
+
45
+ if not IS_VARLEN:
46
+ X = X + pid_batch * stride_x_batch + pid_head * stride_x_nheads
47
+ OUT = OUT + pid_batch * stride_out_batch + pid_head * stride_out_nheads
48
+ COS = COS + pid_batch * seqlen_ro * rotary_dim_half
49
+ SIN = SIN + pid_batch * seqlen_ro * rotary_dim_half
50
+ else:
51
+ start_idx = tl.load(CU_SEQLENS + pid_batch)
52
+ seqlen = tl.load(CU_SEQLENS + pid_batch + 1) - start_idx
53
+ X = X + start_idx * stride_x_seqlen + pid_head * stride_x_nheads
54
+ OUT = OUT + start_idx * stride_out_seqlen + pid_head * stride_out_nheads
55
+
56
+ if pid_m * BLOCK_M >= seqlen:
57
+ return
58
+ rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
59
+ if not IS_SEQLEN_OFFSETS_TENSOR:
60
+ rm_cs = rm + SEQLEN_OFFSETS
61
+ else:
62
+ rm_cs = rm + tl.load(SEQLEN_OFFSETS + pid_batch)
63
+ rk = tl.arange(0, BLOCK_K)
64
+ rk_half = tl.arange(0, BLOCK_K // 2)
65
+
66
+ if not INTERLEAVED:
67
+ # Load the 1st and 2nd halves of X, do calculation, then store to 1st and 2nd halves of OUT
68
+ X = X + (rm[:, None] * stride_x_seqlen + rk_half[None, :] * stride_x_headdim)
69
+ COS = COS + (rm_cs[:, None] * rotary_dim_half + rk_half[None, :])
70
+ SIN = SIN + (rm_cs[:, None] * rotary_dim_half + rk_half[None, :])
71
+ cos = tl.load(
72
+ COS, mask=(rm_cs[:, None] < seqlen_ro) & (rk_half[None, :] < rotary_dim_half), other=1.0
73
+ )
74
+ sin = tl.load(
75
+ SIN, mask=(rm_cs[:, None] < seqlen_ro) & (rk_half[None, :] < rotary_dim_half), other=0.0
76
+ )
77
+ x0 = tl.load(
78
+ X, mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half), other=0.0
79
+ )
80
+ x1 = tl.load(
81
+ X + rotary_dim_half * stride_x_headdim,
82
+ mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half),
83
+ other=0.0,
84
+ )
85
+ if CONJUGATE:
86
+ sin = -sin
87
+ o0 = x0 * cos - x1 * sin
88
+ o1 = x0 * sin + x1 * cos
89
+ # write back result
90
+ OUT = OUT + (rm[:, None] * stride_out_seqlen + rk_half[None, :] * stride_out_headdim)
91
+ tl.store(OUT, o0, mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half))
92
+ tl.store(
93
+ OUT + rotary_dim_half * stride_out_headdim,
94
+ o1,
95
+ mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half),
96
+ )
97
+ else:
98
+ # We don't want to load X[0, 2, 4, ...] and X[1, 3, 5, ...] separately since both are slow.
99
+ # Instead, we load x0 = X[0, 1, 2, 3, ...] and x1 = X[1, 0, 3, 2, ...].
100
+ # Loading x0 will be fast but x1 will be slow.
101
+ # Then we load cos = COS[0, 0, 1, 1, ...] and sin = SIN[0, 0, 1, 1, ...].
102
+ # Then we do the calculation and use tl.where to pick out the right outputs for the even
103
+ # and for the odd indices.
104
+ rk_swap = rk + ((rk + 1) % 2) * 2 - 1 # 1, 0, 3, 2, 5, 4, ...
105
+ rk_repeat = tl.arange(0, BLOCK_K) // 2
106
+ X0 = X + (rm[:, None] * stride_x_seqlen + rk[None, :] * stride_x_headdim)
107
+ X1 = X + (rm[:, None] * stride_x_seqlen + rk_swap[None, :] * stride_x_headdim)
108
+ COS = COS + (rm_cs[:, None] * rotary_dim_half + rk_repeat[None, :])
109
+ SIN = SIN + (rm_cs[:, None] * rotary_dim_half + rk_repeat[None, :])
110
+ cos = tl.load(
111
+ COS,
112
+ mask=(rm_cs[:, None] < seqlen_ro) & (rk_repeat[None, :] < rotary_dim_half),
113
+ other=1.0,
114
+ ).to(tl.float32)
115
+ sin = tl.load(
116
+ SIN,
117
+ mask=(rm_cs[:, None] < seqlen_ro) & (rk_repeat[None, :] < rotary_dim_half),
118
+ other=0.0,
119
+ ).to(tl.float32)
120
+ x0 = tl.load(X0, mask=(rm[:, None] < seqlen) & (rk[None, :] < rotary_dim), other=0.0).to(
121
+ tl.float32
122
+ )
123
+ x1 = tl.load(
124
+ X1, mask=(rm[:, None] < seqlen) & (rk_swap[None, :] < rotary_dim), other=0.0
125
+ ).to(tl.float32)
126
+ if CONJUGATE:
127
+ sin = -sin
128
+ x0_cos = x0 * cos
129
+ x1_sin = x1 * sin
130
+ out = tl.where(rk[None, :] % 2 == 0, x0_cos - x1_sin, x0_cos + x1_sin)
131
+ OUT = OUT + (rm[:, None] * stride_out_seqlen + rk[None, :] * stride_out_headdim)
132
+ tl.store(OUT, out, mask=(rm[:, None] < seqlen) & (rk[None, :] < rotary_dim))
133
+
134
+
135
+ def apply_rotary(
136
+ x: torch.Tensor,
137
+ cos: torch.Tensor,
138
+ sin: torch.Tensor,
139
+ seqlen_offsets: Union[int, torch.Tensor] = 0,
140
+ cu_seqlens: Optional[torch.Tensor] = None,
141
+ max_seqlen: Optional[int] = None,
142
+ interleaved=False,
143
+ inplace=False,
144
+ conjugate=False,
145
+ ) -> torch.Tensor:
146
+ """
147
+ Arguments:
148
+ x: (batch, nheads, seqlen, headdim) if cu_seqlens is None
149
+ else (total_seqlen, nheads, headdim).
150
+ cos: (batch, seqlen_ro, rotary_dim / 2)
151
+ sin: (batch, seqlen_ro, rotary_dim / 2)
152
+ seqlen_offsets: integer or integer tensor of size (batch,)
153
+ cu_seqlens: (batch + 1,) or None
154
+ max_seqlen: int
155
+ Returns:
156
+ y: (batch, nheads, seqlen, headdim)
157
+ """
158
+
159
+ batch, nheads, seqlen, headdim = x.shape
160
+
161
+ batch_ro, seqlen_ro, rotary_dim = cos.shape
162
+
163
+ assert batch == batch_ro
164
+ assert sin.shape == cos.shape
165
+ rotary_dim *= 2
166
+ assert rotary_dim <= headdim, "rotary_dim must be <= headdim"
167
+ assert headdim <= 256, "Only support headdim <= 256"
168
+
169
+ assert seqlen_ro >= seqlen, "seqlen_ro must be >= seqlen"
170
+
171
+ assert (
172
+ cos.dtype == sin.dtype
173
+ ), f"cos and sin must have the same dtype, got {cos.dtype} and {sin.dtype}"
174
+ assert (
175
+ x.dtype == cos.dtype
176
+ ), f"Input and cos/sin must have the same dtype, got {x.dtype} and {cos.dtype}"
177
+
178
+ cos, sin = cos.contiguous(), sin.contiguous()
179
+ if isinstance(seqlen_offsets, torch.Tensor):
180
+ assert seqlen_offsets.shape == (batch,)
181
+ assert seqlen_offsets.dtype in [torch.int32, torch.int64]
182
+ seqlen_offsets = seqlen_offsets.contiguous()
183
+ else:
184
+ assert seqlen_offsets + seqlen <= seqlen_ro
185
+
186
+ output = torch.empty_like(x) if not inplace else x
187
+ if rotary_dim < headdim and not inplace:
188
+ output[..., rotary_dim:].copy_(x[..., rotary_dim:])
189
+
190
+ BLOCK_K = (
191
+ 32
192
+ if rotary_dim <= 32
193
+ else (64 if rotary_dim <= 64 else (128 if rotary_dim <= 128 else 256))
194
+ )
195
+ grid = lambda META: (triton.cdiv(seqlen, META["BLOCK_M"]), batch, nheads) # noqa
196
+ BLOCK_M = 4 if interleaved else (8 if rotary_dim <= 64 else 4)
197
+
198
+ # Need this, otherwise Triton tries to launch from cuda:0 and we get
199
+ # ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
200
+ with torch.cuda.device(x.device.index):
201
+ rotary_kernel[grid](
202
+ output, # data ptrs
203
+ x,
204
+ cos,
205
+ sin,
206
+ cu_seqlens,
207
+ seqlen_offsets,
208
+ seqlen, # shapes
209
+ nheads,
210
+ rotary_dim,
211
+ seqlen_ro,
212
+ seqlen // 128, # key for triton cache (limit number of compilations)
213
+ output.stride(0), # batch_strides
214
+ output.stride(-3), # nheads_stride
215
+ output.stride(-2), # seqlen_stride
216
+ output.stride(-1), # headdim_stride
217
+ x.stride(0), # batch_strides
218
+ x.stride(-3), # nheads stride
219
+ x.stride(-2), # seqlen stride
220
+ x.stride(-1), # headdim stride
221
+ BLOCK_K,
222
+ isinstance(seqlen_offsets, torch.Tensor),
223
+ False, # IS_VARLEN: the varlen (cu_seqlens) path is not used in this version
224
+ interleaved,
225
+ conjugate,
226
+ BLOCK_M,
227
+ )
228
+ return output
229
+
230
+
231
+ class ApplyRotaryEmb(torch.autograd.Function):
232
+ @staticmethod
233
+ def forward(
234
+ ctx,
235
+ x,
236
+ cos,
237
+ sin,
238
+ interleaved=False,
239
+ inplace=False,
240
+ seqlen_offsets: Union[int, torch.Tensor] = 0,
241
+ cu_seqlens: Optional[torch.Tensor] = None,
242
+ max_seqlen: Optional[int] = None,
243
+ ):
244
+ out = apply_rotary(
245
+ x,
246
+ cos,
247
+ sin,
248
+ seqlen_offsets=seqlen_offsets,
249
+ cu_seqlens=cu_seqlens,
250
+ max_seqlen=max_seqlen,
251
+ interleaved=interleaved,
252
+ inplace=inplace,
253
+ )
254
+ if isinstance(seqlen_offsets, int):
255
+ ctx.save_for_backward(cos, sin, cu_seqlens) # Can't save int with save_for_backward
256
+ ctx.seqlen_offsets = seqlen_offsets
257
+ else:
258
+ ctx.save_for_backward(cos, sin, cu_seqlens, seqlen_offsets)
259
+ ctx.seqlen_offsets = None
260
+ ctx.interleaved = interleaved
261
+ ctx.inplace = inplace
262
+ ctx.max_seqlen = max_seqlen
263
+ return out if not inplace else x
264
+
265
+ @staticmethod
266
+ def backward(ctx, do):
267
+ seqlen_offsets = ctx.seqlen_offsets
268
+ if seqlen_offsets is None:
269
+ cos, sin, cu_seqlens, seqlen_offsets = ctx.saved_tensors
270
+ else:
271
+ cos, sin, cu_seqlens = ctx.saved_tensors
272
+ # TD [2023-09-02]: For some reason Triton (2.0.0.post1) errors with
273
+ # "[CUDA]: invalid device context", and cloning makes it work. Idk why. Triton 2.1.0 works.
274
+ if not ctx.interleaved and not ctx.inplace:
275
+ do = do.clone()
276
+ dx = apply_rotary(
277
+ do,
278
+ cos,
279
+ sin,
280
+ seqlen_offsets=seqlen_offsets,
281
+ cu_seqlens=cu_seqlens,
282
+ max_seqlen=ctx.max_seqlen,
283
+ interleaved=ctx.interleaved,
284
+ inplace=ctx.inplace,
285
+ conjugate=True,
286
+ )
287
+ return dx, None, None, None, None, None, None, None
288
+
289
+
290
+ def apply_rotary_emb(
291
+ x,
292
+ cos,
293
+ sin,
294
+ interleaved=False,
295
+ inplace=False,
296
+ seqlen_offsets: Union[int, torch.Tensor] = 0,
297
+ cu_seqlens: Optional[torch.Tensor] = None,
298
+ max_seqlen: Optional[int] = None,
299
+ ):
300
+ """
301
+ Arguments:
302
+ x: (batch_size, seqlen, nheads, headdim) if cu_seqlens is None
303
+ else (total_seqlen, nheads, headdim)
304
+ cos, sin: (seqlen_rotary, rotary_dim / 2)
305
+ interleaved: if True, rotate pairs of even and odd dimensions (GPT-J style) instead
306
+ of 1st half and 2nd half (GPT-NeoX style).
307
+ inplace: if True, apply rotary embedding in-place.
308
+ seqlen_offsets: (batch_size,) or int. Each sequence in x is shifted by this amount.
309
+ Most commonly used in inference when we have KV cache.
310
+ cu_seqlens: (batch + 1,) or None
311
+ max_seqlen: int
312
+ Return:
313
+ out: (batch_size, seqlen, nheads, headdim) if cu_seqlens is None
314
+ else (total_seqlen, nheads, headdim)
315
+ rotary_dim must be <= headdim
316
+ Apply rotary embedding to the first rotary_dim of x.
317
+ """
318
+ return ApplyRotaryEmb.apply(
319
+ x, cos, sin, interleaved, inplace, seqlen_offsets, cu_seqlens, max_seqlen
320
+ )
321
+
322
+
323
+ # For backward compatibility
324
+ apply_rotary_emb_func = apply_rotary_emb
325
+
326
+
327
+ class FastRotaryEmbedding(torch.nn.Module):
328
+ """
329
+ The rotary position embeddings from RoFormer_ (Su et. al).
330
+ A crucial insight from the method is that the query and keys are
331
+ transformed by rotation matrices which depend on the relative positions.
332
+
333
+ Other implementations are available in the Rotary Transformer repo_ and in
334
+ GPT-NeoX_; the GPT-NeoX implementation served as an inspiration.
335
+
336
+ .. _RoFormer: https://arxiv.org/abs/2104.09864
337
+ .. _repo: https://github.com/ZhuiyiTechnology/roformer
338
+ .. _GPT-NeoX: https://github.com/EleutherAI/gpt-neox
339
+
340
+ If scale_base is not None, this implements XPos (Sun et al., https://arxiv.org/abs/2212.10554).
341
+ A recommended value for scale_base is 512: https://github.com/HazyResearch/flash-attention/issues/96
342
+ Reference: https://github.com/sunyt32/torchscale/blob/main/torchscale/component/xpos_relative_position.py
343
+ """
344
+
345
+ def __init__(
346
+ self,
347
+ dim: int,
348
+ base=10000,
349
+ interleaved=False,
350
+ scale_base=None,
351
+ pos_idx_in_fp32=True,
352
+ device=None,
353
+ ):
354
+ """
355
+ interleaved: if True, rotate pairs of even and odd dimensions (GPT-J style) instead
356
+ of 1st half and 2nd half (GPT-NeoX style).
357
+ pos_idx_in_fp32: if True, the position indices [0.0, ..., seqlen - 1] are in fp32,
358
+ otherwise they might be in lower precision.
359
+ This option was added because previously (before 2023-07-02), when we construct
360
+ the position indices, we use the dtype of self.inv_freq. In most cases this would
361
+ be fp32, but if the model is trained in pure bf16 (not mixed precision), then
362
+ self.inv_freq would be bf16, and the position indices are also in bf16.
363
+ Because of the limited precision of bf16 (e.g. 1995.0 is rounded to 2000.0), the
364
+ embeddings for some positions will coincide.
365
+ To maintain compatibility with models previously trained in pure bf16,
366
+ we add this option.
367
+ """
368
+ super().__init__()
369
+ self.dim = dim
370
+ self.base = base
371
+ self.pos_idx_in_fp32 = pos_idx_in_fp32
372
+ # Generate and save the inverse frequency buffer (non trainable)
373
+ inv_freq = self._compute_inv_freq(device)
374
+ self.register_buffer("inv_freq", inv_freq)
375
+ self.interleaved = interleaved
376
+ self.scale_base = scale_base
377
+ scale = (
378
+ (torch.arange(0, dim, 2, device=device, dtype=torch.float32) + 0.4 * dim) / (1.4 * dim)
379
+ if scale_base is not None
380
+ else None
381
+ )
382
+ self.register_buffer("scale", scale, persistent=False)
383
+
384
+ self._seq_len_cached = 0
385
+ self._cos_cached = None
386
+ self._sin_cached = None
387
+ self._cos_k_cached = None
388
+ self._sin_k_cached = None
389
+ self.cos = None
390
+ self.sin = None
391
+
392
+ def _compute_inv_freq(self, device=None):
393
+ return 1.0 / (
394
+ self.base
395
+ ** (torch.arange(0, self.dim, 2, device=device) / self.dim)
396
+ # ** (torch.arange(0, self.dim, 2, device=device).float() / self.dim)
397
+ )
398
+
399
+ def _update_cos_sin_cache(self, seqlen, position_id, device=None, dtype=None):
400
+
401
+ if (
402
+ seqlen > self._seq_len_cached
403
+ ):
404
+ self._seq_len_cached = seqlen
405
+ # We want fp32 here, not self.inv_freq.dtype, since the model could be loaded in bf16
406
+ # And the output of arange can be quite large, so bf16 would lose a lot of precision.
407
+ # However, for compatibility reason, we add an option to use the dtype of self.inv_freq.
408
+ if self.pos_idx_in_fp32:
409
+ t = torch.arange(seqlen, device=device, dtype=torch.float32)
410
+ # We want fp32 here as well since inv_freq will be multiplied with t, and the output
411
+ # will be large. Having it in bf16 will lose a lot of precision and cause the
412
+ # cos & sin output to change significantly.
413
+ # We want to recompute self.inv_freq if it was not loaded in fp32
414
+ if self.inv_freq.dtype != torch.float32:
415
+ inv_freq = self._compute_inv_freq(device=device)
416
+ else:
417
+ inv_freq = self.inv_freq
418
+ else:
419
+ t = torch.arange(seqlen, device=device, dtype=self.inv_freq.dtype)
420
+ inv_freq = self.inv_freq
421
+ freqs = torch.einsum("i,j->ij", t, inv_freq)
422
+ if self.scale is None:
423
+ self._cos_cached = torch.cos(freqs).to(dtype)
424
+ self._sin_cached = torch.sin(freqs).to(dtype)
425
+
426
+ else:
427
+ power = (
428
+ torch.arange(seqlen, dtype=self.scale.dtype, device=self.scale.device)
429
+ - seqlen // 2
430
+ ) / self.scale_base
431
+ scale = self.scale.to(device=power.device) ** rearrange(power, "s -> s 1")
432
+ # We want the multiplication by scale to happen in fp32
433
+ self._cos_cached = (torch.cos(freqs) * scale).to(dtype)
434
+ self._sin_cached = (torch.sin(freqs) * scale).to(dtype)
435
+ self._cos_k_cached = (torch.cos(freqs) / scale).to(dtype)
436
+ self._sin_k_cached = (torch.sin(freqs) / scale).to(dtype)
437
+
438
+ def forward(
439
+ self,
440
+ q: torch.Tensor,
441
+ k: torch.Tensor,
442
+ position_ids: torch.Tensor,
443
+ max_seqlen,
444
+ ) -> Tuple[torch.Tensor, torch.Tensor]:
445
+ """
446
+ q: (batch, nheads, seqlen, headdim)
447
+ k: (batch, nheads, seqlen, headdim)
448
+ position_ids: (batch, seqlen)
449
+ max_seqlen: int
450
+ Apply rotary embedding *inplace* to q and k.
453
+ """
454
+
455
+ self._update_cos_sin_cache(max_seqlen, position_ids, device=q.device, dtype=q.dtype)
456
+ cos, sin = F.embedding(position_ids, self._cos_cached), F.embedding(position_ids, self._sin_cached)
457
+
458
+ q = apply_rotary_emb_func(
459
+ q,
460
+ cos,
461
+ sin,
462
+ interleaved=self.interleaved,
463
+ inplace=True
464
+ )
465
+ k = apply_rotary_emb_func(
466
+ k,
467
+ cos,
468
+ sin,
469
+ interleaved=self.interleaved,
470
+ inplace=True
471
+ )
472
+ return q, k
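
A minimal shape-check sketch for the rotary helpers above (sizes are illustrative only; a CUDA device is required because apply_rotary launches a Triton kernel):

import torch

batch, n_heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, n_heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
position_ids = torch.arange(seq_len, device="cuda").unsqueeze(0).expand(batch, -1)

# dim=head_dim rotates the full head dimension here; the surrounding model code may pass a smaller rotary dim.
rotary = FastRotaryEmbedding(dim=head_dim, base=10000, device="cuda")
q_rot, k_rot = rotary(q, k, position_ids, max_seqlen=seq_len)
assert q_rot.shape == (batch, n_heads, seq_len, head_dim)
assert k_rot.shape == (batch, n_heads, seq_len, head_dim)
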
visual.py ADDED
@@ -0,0 +1,177 @@
1
+ import torch
2
+ from torch import nn
3
+ from argparse import Namespace
4
+ import torch.nn.functional as F
5
+ from transformers.activations import ACT2FN
6
+ import math
7
+
8
+ def standard_attention(query_layer, key_layer, value_layer, scaling_attention_score=True):
9
+ if scaling_attention_score:
10
+ query_layer = query_layer / math.sqrt(query_layer.shape[-1])
11
+ attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
12
+
13
+ attention_probs = F.softmax(attention_scores, dim=-1)
14
+
15
+ context_layer = torch.matmul(attention_probs, value_layer)
16
+ return context_layer
17
+
18
+ def attention_fn_default(query_layer, key_layer, value_layer, scaling_attention_score=True):
19
+ # expand the key/value heads to match the number of query heads, if necessary
20
+ # only useful for multi-query attention
21
+ batch_size, num_query_heads = query_layer.shape[:2] # [b, np, s, hn]
22
+ num_kv_heads = key_layer.shape[1] # [b, np, s, hn]
23
+ key_layer = key_layer.unsqueeze(2).expand(-1, -1, num_query_heads//num_kv_heads, -1, -1).contiguous().view(batch_size, num_query_heads, *key_layer.shape[2:])
24
+ value_layer = value_layer.unsqueeze(2).expand(-1, -1, num_query_heads//num_kv_heads, -1, -1).contiguous().view(batch_size, num_query_heads, *value_layer.shape[2:])
25
+
26
+ if int(torch.__version__.split('.')[0]) >= 2 and scaling_attention_score:
27
+ # PyTorch 2.0 attention uses a lot of memory when attention_mask is a float tensor, and has a NaN bug when attention_mask is None.
28
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
29
+ query_layer, key_layer, value_layer,
30
+ attn_mask=None,
31
+ dropout_p=0.,
32
+ is_causal=False
33
+ )
34
+ return attn_output
35
+ else:
36
+ return standard_attention(
37
+ query_layer, key_layer, value_layer, scaling_attention_score=scaling_attention_score
38
+ )
39
+
40
+ class PatchEmbedding(nn.Module):
41
+ def __init__(self, config):
42
+ super().__init__()
43
+ self.proj = nn.Conv2d(config.in_channels, config.hidden_size, kernel_size=config.patch_size, stride=config.patch_size)
44
+ self.cls_embedding = nn.Parameter(torch.zeros(1, config.hidden_size))
45
+ self.position_embedding = nn.Embedding(config.num_positions, config.hidden_size)
46
+
47
+ def forward(self, images: "tensor(B, C, H, W)") -> "tensor(B, L, D)":
48
+ x = self.proj(images)
49
+ x = x.flatten(2).transpose(1, 2)
50
+ cls_token = self.cls_embedding.expand(x.shape[0], -1, -1)
51
+ x = torch.cat((cls_token, x), dim=1)
52
+ x += self.position_embedding.weight.unsqueeze(0)
53
+ return x
54
+
55
+
56
+ class Attention(nn.Module):
57
+ def __init__(self, config):
58
+ super().__init__()
59
+ self.num_heads = config.num_heads
60
+ head_dim = config.hidden_size // config.num_heads
61
+ self.scale = head_dim ** -0.5
62
+ self.query_key_value = nn.Linear(config.hidden_size, config.hidden_size * 3)
63
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
64
+ self.output_dropout = torch.nn.Dropout(config.dropout_prob)
65
+
66
+ def forward(self, x: "tensor(B, L, D)") -> "tensor(B, L, D)":
67
+ B, L, _ = x.shape
68
+ qkv = self.query_key_value(x)
69
+ qkv = qkv.reshape(B, L, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4) # 3, B, H, L, D
70
+ q, k, v = qkv[0], qkv[1], qkv[2]
71
+
72
+ out = attention_fn_default(
73
+ q, k, v
74
+ ) # (B, num_heads, L, head_dim)
75
+ out = out.transpose(2, 1)
76
+ # merge heads: (B, L, num_heads, head_dim) -> (B, L, hidden_size)
78
+ output = self.dense(out.view(B, L, -1))
79
+ output = self.output_dropout(output)
80
+ return output
81
+
82
+ def attention(self, q, k, v):
83
+ attn_weights = torch.matmul(q * self.scale, k.transpose(-2, -1))
84
+ attn_weights = attn_weights.softmax(dim=-1)
85
+ output = torch.matmul(attn_weights, v)
86
+ return output
87
+
88
+
89
+ class MLP(nn.Module):
90
+ def __init__(self, config):
91
+ super().__init__()
92
+ self.config = config
93
+ self.activation_fn = ACT2FN[config.hidden_act]
94
+ self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
95
+ self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
96
+
97
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
98
+ x = self.fc1(x)
99
+ x = self.activation_fn(x)
100
+ x = self.fc2(x)
101
+ return x
102
+
103
+
104
+ class TransformerLayer(nn.Module):
105
+ def __init__(self, config):
106
+ super().__init__()
107
+ self.input_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
108
+ self.attention = Attention(config)
109
+ self.mlp = MLP(config)
110
+ self.post_attention_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
111
+
112
+ def forward(self, hidden_states):
113
+ attention_input = hidden_states
114
+ attention_output = self.input_layernorm(self.attention(attention_input))
115
+ hidden_states = attention_input + attention_output
116
+ mlp_input = hidden_states
117
+ mlp_output = self.post_attention_layernorm(self.mlp(mlp_input))
118
+ output = mlp_input + mlp_output
119
+ return output
120
+
121
+
122
+ class Transformer(nn.Module):
123
+ def __init__(self, config):
124
+ super().__init__()
125
+ self.layers = nn.ModuleList([TransformerLayer(config) for _ in range(config.num_hidden_layers)])
126
+
127
+ def forward(self, hidden_states):
128
+ for layer_module in self.layers:
129
+ hidden_states = layer_module(hidden_states)
130
+ return hidden_states
131
+
132
+
133
+ class GLU(nn.Module):
134
+ def __init__(self, config, in_features):
135
+ super().__init__()
136
+ self.linear_proj = nn.Linear(in_features, config.hidden_size, bias=False)
137
+ self.norm1 = nn.LayerNorm(config.hidden_size)
138
+ self.act1 = nn.GELU()
139
+ self.act2 = nn.functional.silu
140
+ self.dense_h_to_4h = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
141
+ self.gate_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
142
+ self.dense_4h_to_h = nn.Linear(config.intermediate_size, config.hidden_size, bias=False)
143
+
144
+ def forward(self, x):
145
+ x = self.linear_proj(x)
146
+ x = self.act1(self.norm1(x))
147
+ x = self.act2(self.gate_proj(x)) * self.dense_h_to_4h(x)
148
+ x = self.dense_4h_to_h(x)
149
+ return x
150
+
151
+
152
+ class EVA2CLIPModel(nn.Module):
153
+ def __init__(self, config):
154
+ super().__init__()
155
+ vision_config = Namespace(**config.vision_config)
156
+ self.patch_embedding = PatchEmbedding(vision_config)
157
+ self.transformer = Transformer(vision_config)
158
+ self.linear_proj = GLU(config, in_features=vision_config.hidden_size)
159
+ self.conv = nn.Conv2d(in_channels=vision_config.hidden_size, out_channels=vision_config.hidden_size, kernel_size=2, stride=2)
160
+ self.boi = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
161
+ self.eoi = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
162
+
163
+ def forward(self, images: "tensor(B, C, H, W)") -> "tensor(B, L, D)":
164
+ x = self.patch_embedding(images)
165
+ x = self.transformer(x)
166
+ x = x[:, 1:]
167
+ b, s, h = x.shape
168
+ grid_size = int(s**0.5)
169
+ x = x.view(b, grid_size, grid_size, h).permute(0, 3, 1, 2)
170
+ x = self.conv(x)
171
+
172
+ x = x.flatten(2).transpose(1, 2)
173
+ x = self.linear_proj(x)
174
+ boi = self.boi.expand(x.shape[0], -1, -1)
175
+ eoi = self.eoi.expand(x.shape[0], -1, -1)
176
+ x = torch.cat((boi, x, eoi), dim=1)
177
+ return x
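
A small end-to-end sketch of the vision tower above, using a deliberately tiny, hypothetical config (the shipped values come from config.json's vision_config). It shows how a 224x224 image becomes a [boi] + image-token + [eoi] sequence:

from argparse import Namespace
import torch

# Tiny, made-up sizes for illustration only.
vision_config = dict(
    in_channels=3, patch_size=14, num_positions=(224 // 14) ** 2 + 1,
    hidden_size=64, num_heads=4, intermediate_size=128, hidden_act="gelu",
    dropout_prob=0.0, layer_norm_eps=1e-6, num_hidden_layers=2,
)
config = Namespace(vision_config=vision_config, hidden_size=32, intermediate_size=96)

model = EVA2CLIPModel(config)
images = torch.randn(1, 3, 224, 224)
features = model(images)
# 224/14 = 16 patches per side -> 16x16 grid, halved by the stride-2 conv -> 8x8 = 64 tokens,
# plus the boi/eoi embeddings on either side.
print(features.shape)  # torch.Size([1, 66, 32])
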