AnanyaRaval committed
Commit c3b2fa4 • 1 Parent(s): faf906d

Delete model

Browse files:
- README.md +0 -169
- config.json +0 -26
- generation_config.json +0 -6
- model-00001-of-00003.safetensors +0 -3
- model-00002-of-00003.safetensors +0 -3
- model-00003-of-00003.safetensors +0 -3
- model.safetensors.index.json +0 -298
- special_tokens_map.json +0 -24
- tokenizer.json +0 -0
- tokenizer.model +0 -3
- tokenizer_config.json +0 -42
README.md
DELETED
@@ -1,169 +0,0 @@
---
datasets:
- newsmediabias/Bias-DeBiased
---

# Overview

The Mistral Debiaser Model is designed to enhance content moderation and fairness across digital platforms. By leveraging the safety features and debiasing capabilities of the Mistral model, organizations can ensure more equitable user interactions and content delivery.

## Key Applications

### Content Moderation
- **Social Media Platforms:** Improve the fairness of content visibility and reduce the spread of biased information by applying the Debiaser to filter and adjust feeds algorithmically.
- **Online Forums and Discussion Boards:** Automatically moderate discussions to prevent the amplification of biased or harmful content while promoting balanced viewpoints.

### Data Sanitization
- **Machine Learning Training Data:** Use the Debiaser to preprocess datasets, removing or reducing bias in training data, which can help in developing fairer and more accurate machine learning models.
- **Research Data Analysis:** Ensure research findings are based on debiased data, leading to more reliable and generalizable results.

### Educational Tools
- **E-Learning Platforms:** Integrate the Debiaser to provide educational content that is free from cultural, racial, or gender biases, supporting a more inclusive learning environment.
- **Textbook Review and Development:** Assist publishers in reviewing and revising educational materials to eliminate historical and implicit biases.

## Benefits
- **Increased Fairness:** By systematically reducing bias, platforms and data become more equitable.
- **Enhanced User Trust:** Users are more likely to trust platforms that actively combat bias and promote fairness.
- **Compliance with Regulations:** Helps organizations meet legal and ethical standards related to bias and discrimination in digital services.

Through the Mistral Debiaser Model, organizations can significantly enhance their operational fairness and safety, contributing to a more inclusive digital ecosystem.

# Hyperparameters

Below is a detailed table of the configuration parameters used for training and evaluating the model. Each parameter was chosen to balance performance, accuracy, and computational efficiency during training and evaluation.

## Configuration Parameters

| Parameter | Description | Value |
|-------------------------------|--------------------------------------------------------------------|---------------------------|
| `per_device_train_batch_size` | Batch size per GPU for training | 8 / 16 |
| `per_device_eval_batch_size` | Batch size per GPU for evaluation | 4 / 8 |
| `gradient_accumulation_steps` | Number of update steps to accumulate the gradients for | 1 |
| `gradient_checkpointing` | Enable gradient checkpointing | True |
| `max_grad_norm` | Maximum gradient norm (gradient clipping) | 0.3 |
| `learning_rate` | Initial learning rate (AdamW optimizer) | 2e-05 |
| `weight_decay` | Weight decay applied to all layers except bias/LayerNorm weights | 0.001 |
| `optim` | Optimizer to use | "paged_adamw_8bit" |
| `lr_scheduler_type` | Learning rate schedule | "constant" |
| `max_steps` | Number of training steps (overrides `num_train_epochs`) | -1 |
| `warmup_ratio` | Ratio of steps for a linear warmup | 0.05 |
| `group_by_length` | Group sequences of similar length into the same batch | True |
| `save_steps` | Save a checkpoint every X update steps | 25 |
| `logging_steps` | Log every X update steps | 25 |
| `max_seq_length` | Maximum sequence length to use | 2048 |
| `packing` | Pack multiple short examples into the same input sequence | False |
| `device_map` | Load the entire model on the specified GPU | {"": 0} |
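For orientation, the sketch below shows how these values would map onto `transformers.TrainingArguments` in a TRL-style supervised fine-tuning setup. This is a minimal sketch under stated assumptions, not the original training script (which is not part of this repository); the output directory and the lower of each pair of listed batch sizes are illustrative choices.

```python
# Minimal sketch, assuming a TRL-style SFT setup; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # hypothetical output path
    per_device_train_batch_size=8,   # table lists 8 / 16
    per_device_eval_batch_size=4,    # table lists 4 / 8
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-5,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    lr_scheduler_type="constant",
    max_steps=-1,                    # -1 defers to num_train_epochs
    warmup_ratio=0.05,
    group_by_length=True,
    save_steps=25,
    logging_steps=25,
)
# max_seq_length=2048 and packing=False are trl.SFTTrainer arguments rather than
# TrainingArguments, and device_map={"": 0} is passed to from_pretrained instead.
```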

# Inference

```python
import torch
import transformers
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token="enter your key")  # paste your Hugging Face access token here

# Load the model directly
tokenizer = AutoTokenizer.from_pretrained("newsmediabias/Mistral-unbiaser")
model = AutoModelForCausalLM.from_pretrained("newsmediabias/Mistral-unbiaser")

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample system message (an alternative system prompt; the template below uses `instruction`):
sys_message = "Task: Please generate a bias-free version of the text provided, ensuring it's free from biases related to age, gender, politics, social nuances, or economic background, while keeping it roughly the same length as the original:"
# Sample instruction:
instruction = "Instruction: As a helpful, respectful and trustworthy debiasing assistant, your task is to receive a text and return its unbiased version, without adding any unrelated content or additional outputs:"

# Sample text data; replace this with your own data
texts = [
    "Women are just too emotional to be leaders.",
    "All young people are lazy and addicted to their phones.",
    "People from that country are always dishonest and corrupt.",
    "Men are inherently better at science and math than women.",
    "The elderly are just burdens to society; they contribute nothing.",
    "All poor people are lazy and don't want to work.",
    "People who follow that religion are all terrorists.",
    "Immigrants are taking all our jobs and ruining the country."
]


def get_debiased_sequence(prompt):
    """
    Generate a debiased version of the provided text using the debiasing pipeline.

    Args:
        prompt (str): Text to be debiased.

    Returns:
        str: Debiased text.
    """
    input_text = f"<s> <<SYS>> {instruction} <</SYS>> [INST]{prompt} [/INST]"
    sequences = pipeline(
        input_text,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=100,  # cap newly generated tokens so the rewrite stays close to the input length
    )
    res = sequences[0]['generated_text']
    # Keep only the model's completion, i.e. everything after the [/INST] tag
    result_part = res.split('[/INST]')[-1]
    clean_result = ''.join(c for c in result_part if c.isprintable())
    return clean_result.strip()


debiased_text = [get_debiased_sequence(text) for text in texts]
for original, debiased in zip(texts, debiased_text):
    print(f"Original: {original}\nDebiased: {debiased}\n")
```

# Results

We evaluated the model on both biased and unbiased evaluation sets. The average scores for each set are:

## Biased Evaluation Set

- **Original Bias Score:** 0.3318
- **Original Toxicity Score:** 0.4055

## Unbiased Evaluation Set

- **Bias Score:** 0.11248
- **Toxicity Score:** 0.0690
- **Knowledge Retention Score:** 0.8231

These results indicate that the model substantially reduces bias and toxicity while retaining a high level of knowledge, as measured on the unbiased evaluation set.
If you use this tool in your research or project, please cite it using the following BibTeX entry:

```bibtex
@misc{newsmediabias2024,
  title        = {newsmediabias/Mistral-unbiaser},
  author       = {Ananya Raval and Veronica Chatrath and Shaina Raza},
  year         = 2024,
  howpublished = {Web},
  url          = {https://huggingface.co/newsmediabias/Mistral-unbiaser},
  note         = {Accessed: your-date-of-access}
}
```

## Contact Us

For any questions, suggestions, or contributions, please feel free to reach out via email. We welcome feedback on the tool and are open to collaborative opportunities.

Please email us at:

- **Shaina Raza:** shaina.raza@utoronto.ca

When contacting us, please include a relevant subject line and provide details or context regarding your inquiry to help us respond more effectively.

config.json
DELETED
@@ -1,26 +0,0 @@
{
  "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.41.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}
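A configuration like the one above can be inspected without downloading any weights. The following is a minimal sketch, assuming a local copy of the deleted `config.json` (the hub repository itself is removed by this commit):

```python
# Minimal sketch: load the deleted config from a local copy and build the
# architecture without loading weights. The local path is an assumption.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("./config.json")  # local copy of the file
assert config.model_type == "mistral"
assert config.num_key_value_heads == 8  # grouped-query attention: 32 query heads share 8 KV heads

model = AutoModelForCausalLM.from_config(config)   # randomly initialized, architecture only
print(sum(p.numel() for p in model.parameters()))  # ~7.24B parameters
```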
generation_config.json
DELETED
@@ -1,6 +0,0 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.41.0.dev0"
}
model-00001-of-00003.safetensors
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d0f25d9ec02cb512933aa651bf2cabce994a4fb0fd1548727d3ee921b128a8d
size 4943162240
model-00002-of-00003.safetensors
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e5db3c48a9537c7e8d40acf9a05512bfbce3853f64e9e84dc4f19b494170b705
size 4999819232
model-00003-of-00003.safetensors
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cf2a3ba3e16cd4a54aa15f3b5a24e445e18775f39133b81562fd4cc045ec0172
size 4540516256
model.safetensors.index.json
DELETED
@@ -1,298 +0,0 @@
{
  "metadata": {
    "total_size": 14483464192
  },
  "weight_map": {
    "lm_head.weight": "model-00003-of-00003.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors"
  }
}
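The `weight_map` above is what `from_pretrained` uses to resolve which shard file holds each tensor; the `metadata.total_size` of 14,483,464,192 bytes corresponds to roughly 7.24B float16 parameters. Below is a minimal sketch of performing the same lookup by hand, assuming local copies of the (now-deleted) index and shard files:

```python
# Minimal sketch: resolve a tensor name to its shard via the index, then load
# only that tensor. Assumes local copies of the deleted files.
import json
from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.layers.10.mlp.gate_proj.weight"
shard_file = index["weight_map"][name]  # -> "model-00001-of-00003.safetensors"

with safe_open(shard_file, framework="pt") as shard:
    tensor = shard.get_tensor(name)
print(tensor.shape)  # (14336, 4096), i.e. (intermediate_size, hidden_size) per config.json
```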
special_tokens_map.json
DELETED
@@ -1,24 +0,0 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
DELETED
The diff for this file is too large to render.
See raw diff
tokenizer.model
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
tokenizer_config.json
DELETED
@@ -1,42 +0,0 @@
{
  "add_bos_token": true,
  "add_eos_token": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<unk>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
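The `chat_template` in this file enforces alternating user/assistant turns and wraps each user message in `[INST] ... [/INST]`; note that the README's inference example additionally inserts a Llama-2-style `<<SYS>>` block by hand, which this template does not emit. A minimal sketch of applying the template, assuming a local copy of the deleted tokenizer files:

```python
# Minimal sketch: render a prompt with the tokenizer's chat template.
# "./tokenizer_dir" stands in for a local copy of the deleted tokenizer files.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./tokenizer_dir")
messages = [{"role": "user", "content": "Rewrite this sentence without bias."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)  # <s>[INST] Rewrite this sentence without bias. [/INST]
```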