add model files

Files changed (13)

README.md +50 -90
config.json +31 -0
merges.txt +0 -0
pytorch_model-00001-of-00005.bin +3 -0
pytorch_model-00002-of-00005.bin +3 -0
pytorch_model-00003-of-00005.bin +3 -0
pytorch_model-00004-of-00005.bin +3 -0
pytorch_model-00005-of-00005.bin +3 -0
pytorch_model.bin.index.json +628 -0
special_tokens_map.json +5 -0
tokenizer.json +0 -0
tokenizer_config.json +10 -0
vocab.json +0 -0

README.md CHANGED

@@ -1,17 +1,23 @@
 ---
 language:
 - en
-library_name: nemo
 datasets:
-- Writer-data
 tags:
 - text generation
 - pytorch
 - causal-lm
-license: other
 ---
-# Palmyra-20B
 <style>
 img {
@@ -19,113 +25,67 @@ img {
 }
 </style>
 ## Model Description
-Model description
-Palmyra was primarily pretrained with English text, there is still a trace amount of non-English data present within the training corpus that was accessed through CommonCrawl. A causal language modeling (CLM) objective was utilized during the process of the model's pretraining. Similar to GPT-3, Palmyra is a member of the same family of models that only contain a decoder. As a result, it was pretrained utilizing the objective of self-supervised causal language modeling.
-Palmyra uses the prompts and general experimental setup from GPT-3 in order to conduct its evaluation in accordance with GPT-3. Read the official paper if you want more information about this.
-## Getting started
-### Step 1: Install NeMo and dependencies
-You will need to install NVIDIA Apex and NeMo.
-```
-git clone https://github.com/ericharper/apex.git
-cd apex
-git checkout nm_v1.11.0
-pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
-```
-```
-pip install nemo_toolkit['nlp']==1.11.0
-```
-### Step 2: Launch eval server
-**Note.** The example below launches a model variant with Tensor Parallelism (TP) of 4 and Pipeline Parallelism (PP) of 1 on two GPUs.
-```
-git clone https://github.com/NVIDIA/NeMo.git
-cd NeMo/examples/nlp/language_modeling
-git checkout v1.11.0
-python megatron_gpt_eval.py gpt_model_file=palmyara_gpt_20b.nemo server=True tensor_model_parallel_size=4 trainer.devices=4
-```
-### Step 3: Send prompts to your model!
 ```python
-import json
-import requests
-port_num = 5555
-headers = {"Content-Type": "application/json"}
-def request_data(data):
-    resp = requests.put('http://localhost:{}/generate'.format(port_num),
-                        data=json.dumps(data),
-                        headers=headers)
-    sentences = resp.json()['sentences']
-    return sentences
-data = {
-    "sentences": ["Tell me an interesting fact about space travel."]*1,
-    "tokens_to_generate": 50,
-    "temperature": 1.0,
-    "add_BOS": True,
-    "top_k": 0,
-    "top_p": 0.9,
-    "greedy": False,
-    "all_probs": False,
-    "repetition_penalty": 1.2,
-    "min_tokens_to_generate": 2,
-}
-sentences = request_data(data)
-print(sentences)
-```
-## Training Data
-|    part        | MassiveText (sampling) | tokens (B) |  url                                 | sampling ratio |
-|:---------------|-----------------------:|:----------:| :------------------------------------|---------------:|
-| mc4 filtered   |   MassiveWeb (48%)     |   1331     |   gs://mc4/final/web                 |  58%           |
-| TrustedWeb     |   -                    |    -       |   gs://mc4/final/trusted_web         |  -             |
-| realnews       |   News (10%)           |    21      |   gs://mc4/final/news                |  10%           |
-| c4             |   c4  (10%)            |    -       |   gs://mc4/final/c4                  |  -             |
-| wikipedia-40B  |   wikipedia  (2%)      |    2       |   gs://mc4/final/wikipedia           |  5%            |
-| github         |   github  (3%)         |    -       |   gs://mc4/final/github              |  -             |
-| books          |   books (27%)          |    24      |   gs://mc4/final/books               |  27%           |
-| youtube        |   -                    |    -       |   gs://mc4/final/youtube             |  -             |
-## Evaluation results
-*Zero-shot performance.* Evaluated using [LM Evaluation Test Suite from AI21](https://github.com/AI21Labs/lm-evaluation)
-| ARC-Challenge	| ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE | BoolQA | HellaSwag | PiQA |
-| ------------- | -------- | ----------- | --------- | ---------- | --- | ------ | --------- | ---- |
-| 0.3976        | 0.5566  | 0.5007       | 0.4171    | 0.6133     | 0.5812 | 0.6356 | 0.6298 | 0.7492 |
-## Limitations
-The model was trained on the data originally crawled from the Internet. This data contains toxic language and societal biases. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts.
-## References
-[1] [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)
-[2] [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf)
-[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
-[4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
-## Licence
-License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.

 ---
 language:
 - en
 datasets:
+- English
 tags:
 - text generation
 - pytorch
 - causal-lm
+- Writer-data
+- gpt
+- NeMo
+pipeline_tag: text-generation
+library_name: transformers
+license: cc-by-4.0
 ---
+# Palmyra Large 20B
 <style>
 img {
 }
 </style>
+|[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-20B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
 ## Model Description
+Palmyra Large was pre-trained primarily on English text, although the training corpus, which was accessed through CommonCrawl, still contains a trace amount of non-English data. Like GPT-3, Palmyra Large is a decoder-only model, so it was pre-trained with a self-supervised causal language modeling (CLM) objective. Its evaluation follows the prompts and general experimental setup of GPT-3.
+### Use case
+Palmyra Large is extremely powerful while being extremely fast. This model excels at many nuanced tasks such as sentiment classification and summarization.
+## Training data
+Palmyra Large (20b) was trained on Writer’s custom dataset.
+## Intended Use and Limitations
+Palmyra Large learns an inner representation of the English language that can be used to extract features useful for downstream tasks. However, the model is best at what it was pre-trained for, which is generating text from a prompt.
+### How to use
+This model can be easily loaded using the `AutoModelForCausalLM` functionality:
 ```python
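+import os
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Read the access token from the HF_TOKEN environment variable;
+# True falls back to the token cached by `huggingface-cli login`.
+auth_token = os.environ.get("HF_TOKEN", True)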
+model = AutoModelForCausalLM.from_pretrained(
+    "Writer/palmyra-large",
+    device_map="auto",
+    torch_dtype=torch.float16,
+    use_auth_token=auth_token,
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    "Writer/palmyra-large", use_auth_token=auth_token
+)
+```
+### Limitations and Biases
+Palmyra Large’s core functionality is to take a string of text and predict the next token. While language models are widely used for other tasks, there are many unknowns in this work. When prompting Palmyra Large, keep in mind that the next statistically likely token is not always the token that produces the most "accurate" text. Never rely on Palmyra Large to produce factually correct results.
+Palmyra Large was trained on Writer’s custom data. As with all language models, it is difficult to predict how Palmyra Large will respond to specific prompts, and offensive content may appear unexpectedly. We recommend that the outputs be curated or filtered by humans before they are released, both to censor undesirable content and to improve the quality of the results.
+## Citation and Related Information
+To cite this model:
+```
+@misc{Palmyra,
+  author = {Writer Engineering team},
+  title = {{Palmyra-Large Parameter Autoregressive Language Model}},
+  howpublished = {\url{https://dev.writer.com}},
+  year = 2023,
+  month = March
+}
+```
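
The `How to use` snippet in the README hunk above loads the model and tokenizer but stops before generating text. A minimal sketch of the next step follows. The helper names `prompt_budget` and `generate` are hypothetical, the sampling settings (`top_p=0.9`, 50 new tokens) are borrowed from the removed NeMo request example rather than recommended by the authors, and the sketch assumes the repository is readable with the locally cached credentials. The prompt is truncated so that prompt plus completion fits in the 2048-token context given by `n_positions` in `config.json`.

```python
N_POSITIONS = 2048  # context length, from "n_positions" in config.json


def prompt_budget(max_new_tokens: int, n_positions: int = N_POSITIONS) -> int:
    """Largest prompt length (in tokens) that leaves room for the completion."""
    if not 0 < max_new_tokens < n_positions:
        raise ValueError("max_new_tokens must be between 1 and n_positions - 1")
    return n_positions - max_new_tokens


def generate(prompt: str, max_new_tokens: int = 50,
             model_id: str = "Writer/palmyra-large") -> str:
    """Sample a completion for `prompt` (hypothetical helper, not from the README)."""
    # Heavy imports stay local so the budget helper can be used on its own.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.float16
    )
    # Truncate the prompt so prompt + completion never exceeds the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=prompt_budget(max_new_tokens),
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers define no pad token
        )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Tell me an interesting fact about space travel.")` downloads roughly 41 GB of weights on first use, so it is left out of the block itself.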

config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "activation_function": "gelu",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
+  "initializer_range": 0.008165,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_embd": 6144,
+  "n_head": 48,
+  "n_inner": 24576,
+  "n_layer": 44,
+  "n_positions": 2048,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.21.2",
+  "use_cache": true,
+  "vocab_size": 50257
+}
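
As a sanity check on the "20B" label, the parameter count can be estimated from the fields in this `config.json`. The sketch below assumes the standard GPT-2 layout (fused `c_attn` projection, learned position embeddings, a final LayerNorm). To estimate the checkpoint size it further assumes that every stored tensor uses 2 bytes (`bfloat16`), that `lm_head.weight` is saved as a separate copy of the token embedding, and that each block stores an `n_positions × n_positions` `attn.bias` mask plus a scalar `masked_bias`. These are assumptions about the serialization, not statements from the authors.

```python
# Values copied from the config.json hunk above.
n_embd, n_inner, n_layer = 6144, 24576, 44
n_positions, vocab_size = 2048, 50257

# One GPT-2 block: attention (c_attn, c_proj), MLP (c_fc, c_proj), two LayerNorms.
attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
layer_norms = 2 * (2 * n_embd)  # ln_1 and ln_2, weight + bias each
per_block = attn + mlp + layer_norms

embeddings = vocab_size * n_embd + n_positions * n_embd  # wte + wpe
final_ln = 2 * n_embd  # ln_f weight + bias
trainable = n_layer * per_block + embeddings + final_ln

# Assumed extras in the saved state dict (see the lead-in).
lm_head = vocab_size * n_embd  # untied copy of wte
buffers = n_layer * (n_positions * n_positions + 1)  # attn.bias + masked_bias
bytes_per_element = 2  # bfloat16
checkpoint_bytes = (trainable + lm_head + buffers) * bytes_per_element

print(f"trainable parameters: {trainable:,}")   # 20,256,221,184
print(f"checkpoint bytes:     {checkpoint_bytes:,}")  # 41,499,099,224
```

Under these assumptions the estimate reproduces the `total_size` of 41499099224 recorded in `pytorch_model.bin.index.json`. The five shard files add up to slightly more, presumably because of serialization overhead.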

merges.txt ADDED

The diff for this file is too large to render. See raw diff

pytorch_model-00001-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ca5ead6219aec9de6f9998a02d96a0255a5ddb124e30bad96aed3759f84a38c
+size 9796367749

pytorch_model-00002-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e0a268a5649a7bb92f21277b38b5394f7e18e206dfc591dc7e649d6c65eac02
+size 9749334187

pytorch_model-00003-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1738929b9c9619c1d46052e325f8e097ec0f44c2d71d3958e6e6954b3e2ed27c
+size 9757711879

pytorch_model-00004-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f027d0036aa4638596266872f61b64aba4d001a2491e7ffcf0d7297b014f99c7
+size 9984215989

pytorch_model-00005-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a122fd074db804adc8a424367b02b1db77ecfefefcc29a02b9096c641e48540
+size 2211684561
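
Each `.bin` entry above is a Git LFS pointer, not the weights themselves: it records the spec version, a SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of checking a downloaded shard against its pointer follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is left to the caller.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so multi-gigabyte shards never sit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer for the last shard, copied from the hunk above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4a122fd074db804adc8a424367b02b1db77ecfefefcc29a02b9096c641e48540\n"
    "size 2211684561\n"
)
print(pointer["algo"], pointer["size"])
```

With a shard on disk, `matches_pointer("pytorch_model-00005-of-00005.bin", pointer)` reports whether the download is complete and uncorrupted.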

pytorch_model.bin.index.json ADDED

	@@ -0,0 +1,628 @@

+{
+  "metadata": {
+    "total_size": 41499099224
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.0.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.2.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.20.attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.masked_bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.20.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.3.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.30.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.masked_bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.31.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.31.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.31.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.4.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.40.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_proj.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.attn.masked_bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_2.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.ln_2.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.mlp.c_fc.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.mlp.c_fc.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.mlp.c_proj.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.42.mlp.c_proj.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.c_attn.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.c_attn.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.c_proj.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.masked_bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.ln_1.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.ln_1.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.ln_2.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.ln_2.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_fc.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_fc.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_proj.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_proj.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.5.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.masked_bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.ln_f.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.wpe.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.wte.weight": "pytorch_model-00001-of-00005.bin"
+  }
+}
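
The index above maps each tensor name to the shard file that stores it. Keys are sorted as strings, so `transformer.h.3.*` follows `transformer.h.29.*`, and a few layers straddle a shard boundary (layer 31 across shards 3 and 4, layer 42 across shards 4 and 5). A minimal sketch of how such splits could be detected is below. It assumes the standard sharded-checkpoint layout in which this mapping sits under a `weight_map` key (that key falls outside the excerpt shown here); the `weight_map` dict is a hand-copied excerpt, and the helper names `shards_by_layer` and `split_layers` are hypothetical.

```python
import re
from collections import defaultdict

# Hand-copied excerpt of the tensor-to-shard mapping from the index diff above.
# In a real index file this would be loaded with json.load(...)["weight_map"]
# (assumed key name, not visible in this excerpt).
weight_map = {
    "transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00005.bin",
    "transformer.h.31.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.42.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.42.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
    "transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
    "transformer.wte.weight": "pytorch_model-00001-of-00005.bin",
}

# Matches per-layer tensors such as "transformer.h.31.mlp.c_fc.weight".
LAYER_RE = re.compile(r"^transformer\.h\.(\d+)\.")


def shards_by_layer(mapping):
    """Return {layer_index: sorted list of shard files holding its tensors}."""
    layers = defaultdict(set)
    for name, shard in mapping.items():
        match = LAYER_RE.match(name)
        if match:  # skip non-layer tensors such as wte, wpe, ln_f
            layers[int(match.group(1))].add(shard)
    # Sort numerically so layer 3 comes before layer 29, unlike the JSON keys.
    return {layer: sorted(shards) for layer, shards in sorted(layers.items())}


def split_layers(mapping):
    """Return the layer indices whose tensors live in more than one shard."""
    return [
        layer
        for layer, shards in shards_by_layer(mapping).items()
        if len(shards) > 1
    ]


print(split_layers(weight_map))  # [31, 42]
```

Grouping by the numeric layer index rather than by key order avoids the string-sort surprise, and a split layer simply means that two shard files must be read before that layer is complete.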

special_tokens_map.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "add_prefix_space": false,
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1024,
+  "name_or_path": "gpt2",
+  "special_tokens_map_file": null,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
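
The tokenizer configuration above assigns the same string, `<|endoftext|>`, to the BOS, EOS, and unknown-token roles, and sets `model_max_length` to 1024. As a rough illustration of what that implies, the sketch below parses the added JSON with the standard library only. The helper names `special_token_roles` and `fits_max_length` are hypothetical, and the length check only reflects the limit stated in this file, not any limit defined elsewhere in the repository.

```python
import json

# The JSON body added by this diff, with the leading "+" markers removed.
tokenizer_config = json.loads("""
{
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "model_max_length": 1024,
  "name_or_path": "gpt2",
  "special_tokens_map_file": null,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
""")


def special_token_roles(config):
    """Group the *_token roles by the token string assigned to them."""
    roles = {}
    for key, value in config.items():
        if key.endswith("_token") and isinstance(value, str):
            roles.setdefault(value, []).append(key)
    return roles


def fits_max_length(num_tokens, config):
    """Check a token count against the limit stated in this config file."""
    return num_tokens <= config["model_max_length"]


print(special_token_roles(tokenizer_config))
# {'<|endoftext|>': ['bos_token', 'eos_token', 'unk_token']}
print(fits_max_length(1500, tokenizer_config))  # False
```

Because all three roles share one string, code that consumes this tokenizer cannot tell a start marker from an end marker or an unknown token by the token alone.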

vocab.json ADDED

The diff for this file is too large to render. See raw diff