marattt committed
Commit 0c22e64
1 Parent(s): 24882be

Update README.md

Files changed (1)
  1. README.md +88 -1
README.md CHANGED
@@ -28,4 +28,91 @@ library_name: transformers
  - **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  - **Layers**: 36
  - **Attention Heads (GQA)**: 24 for Q, 4 for KV
- - **Context Length**: Supports a full context of 131,072 tokens and generation of up to 8,192 tokens
+ - **Context Length**: Supports a full context of 131,072 tokens and generation of up to 8,192 tokens
+
+ ### Requirements
+ The code of Qwen2.5 is included in the latest Hugging Face transformers, and we advise you to use the latest versions of transformers, torch, and autoawq:
+ ```
+ pip install autoawq -q
+ pip install --upgrade torch -q
+ pip install --upgrade transformers -q
+ ```
+
+ With transformers<4.37.0, you will encounter the following error:
+ ```
+ KeyError: 'qwen2'
+ ```
+ With torch<2.4.0, you will encounter the following error:
+ ```
+ AttributeError: module 'torch.library' has no attribute 'register_fake'
+ ```
+ Also check out our [AWQ documentation](https://qwen.readthedocs.io/en/latest/quantization/awq.html) for a more detailed usage guide.
+
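+ If you are unsure which versions are installed, a quick check such as the following (a minimal sketch; the thresholds are the ones quoted above) fails fast before you load the model:
+ ```
+ # Minimal sketch: verify the minimum versions mentioned above before loading the model.
+ from packaging import version
+ import torch, transformers
+
+ assert version.parse(transformers.__version__) >= version.parse("4.37.0"), "please upgrade transformers"
+ assert version.parse(torch.__version__) >= version.parse("2.4.0"), "please upgrade torch"
+ ```
+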
+ ### Quickstart
+ We use a special RuQwen2ForCausalLM class, a subclass of Qwen2ForCausalLM that adds a bias term to lm_head, to work with this model:
+ ```
+ from transformers import Qwen2ForCausalLM, AutoConfig, AutoTokenizer
+ import torch
+
+ class RuQwen2ForCausalLM(Qwen2ForCausalLM):
+     def __init__(self, config):
+         super().__init__(config)
+
+         if hasattr(self, "lm_head") and isinstance(self.lm_head, torch.nn.Linear):
+             if self.lm_head.bias is None:
+                 self.config.add_bias_to_lm_head = True
+                 self._add_bias_to_lm_head()
+
+     def _add_bias_to_lm_head(self):
+         """Adds a bias to lm_head if it does not already have one."""
+         old_lm_head = self.lm_head
+         # Recreate lm_head as an identical Linear layer that has a bias
+         self.lm_head = torch.nn.Linear(
+             old_lm_head.in_features,
+             old_lm_head.out_features,
+             dtype=self.model.dtype,
+             bias=True,
+         )
+         with torch.no_grad():
+             self.lm_head.weight = old_lm_head.weight
+             torch.nn.init.zeros_(self.lm_head.bias)
+
+     @classmethod
+     def from_pretrained(cls, model_name, *args, **kwargs):
+         # Load the model together with its configuration
+         model = super().from_pretrained(model_name, *args, **kwargs)
+
+         if hasattr(model.config, "add_bias_to_lm_head") and not model.config.add_bias_to_lm_head:
+             model._add_bias_to_lm_head()
+
+         return model
+
+     def save_pretrained(self, save_directory, *args, **kwargs):
+         # Record whether lm_head has a bias so it can be restored on load
+         self.config.add_bias_to_lm_head = self.lm_head.bias is not None
+         super().save_pretrained(save_directory, *args, **kwargs)
+ ```
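+ Note that the added bias is zero-initialized, so wrapping an existing checkpoint does not change its logits; the toy sketch below (illustrative only, using small dimensions rather than the real model) shows why:
+ ```
+ # Toy sketch (illustrative only): a zero-initialized bias leaves the head's outputs unchanged.
+ import torch
+ head = torch.nn.Linear(8, 16, bias=False)
+ head_with_bias = torch.nn.Linear(8, 16, bias=True)
+ with torch.no_grad():
+     head_with_bias.weight.copy_(head.weight)
+     torch.nn.init.zeros_(head_with_bias.bias)
+ x = torch.randn(2, 8)
+ assert torch.allclose(head(x), head_with_bias(x))
+ ```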
+ The following code snippet shows how to load the tokenizer and model and how to generate content with apply_chat_template:
+ ```
+ def generate(messages):
+     input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
+     output = model.generate(input_ids,
+                             max_new_tokens=1024,
+                             do_sample=False,
+                             temperature=None,
+                             top_k=None,
+                             top_p=None)
+     generated_text = tokenizer.decode(output[0], skip_special_tokens=False)  # .split('<|im_start|>assistant')[1]
+     return generated_text
+
+ model_name = 'FractalGPT/RuQwen2.5-3B-Instruct-AWQ'
+ model = RuQwen2ForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Классификация медицинских терминов"  # "Classification of medical terms"
+ messages = [
+     {"role": "system", "content": "You are RuQwen, created by FractalGPT. You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+
+ print(generate(messages))
+ ```
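+ The decode call above keeps special tokens; if you only want the assistant's reply, one option, in the spirit of the commented-out split in generate() (and assuming the standard Qwen2.5 <|im_start|>/<|im_end|> chat markers), is:
+ ```
+ # Sketch: keep only the assistant's reply, following the commented-out split above.
+ full_text = generate(messages)
+ reply = full_text.split('<|im_start|>assistant')[-1].split('<|im_end|>')[0].strip()
+ print(reply)
+ ```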