---
license: apache-2.0
datasets:
- IlyaGusev/gazeta
- IlyaGusev/ru_turbo_alpaca_evol_instruct
- IlyaGusev/ru_turbo_alpaca
- IlyaGusev/ru_turbo_saiga
- RussianNLP/russian_super_glue
language:
- ru
pipeline_tag: question-answering
---
The model was fine-tuned with LoRA on subsets of the following datasets:
*IlyaGusev/gazeta*,
*IlyaGusev/ru_turbo_alpaca_evol_instruct*,
*IlyaGusev/ru_turbo_alpaca*,
*IlyaGusev/ru_turbo_saiga*,
*RussianNLP/russian_super_glue* (MuSeRC).
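
The exact LoRA hyperparameters are not listed in this card, so the snippet below is only an illustrative `peft` setup against the base model; the rank, alpha, and target modules are assumptions.
```python
# Illustrative LoRA configuration; the hyperparameters here are
# assumptions, not the published training values for this checkpoint.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    'NousResearch/Yarn-Llama-2-7b-64k',
    trust_remote_code=True,
)
lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # common choice for Llama-family models
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```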
#### Base model: NousResearch/Yarn-Llama-2-7b-64k
#### Requires CUDA > 11.4
#### GPU: A100
```python
# Notebook-style installs: PEFT plus FlashAttention and its rotary-embedding kernels
!pip install peft
!pip install flash-attn --no-build-isolation
!pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```
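A quick sanity check that the environment meets the CUDA requirement (assumes `torch` is already installed):
```python
import torch

# The FlashAttention kernels require a CUDA GPU and a recent CUDA runtime.
assert torch.cuda.is_available(), "A CUDA GPU is required"
print("CUDA runtime:", torch.version.cuda)    # should be > 11.4
print("GPU:", torch.cuda.get_device_name(0))  # e.g. A100
```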
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in fp16 on the first GPU; trust_remote_code is needed
# for the Yarn rotary-scaling code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    'geldarr/saiga-Yarn-Llama-2-7b-64k',
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={'': 0}
)
tokenizer = AutoTokenizer.from_pretrained('geldarr/saiga-Yarn-Llama-2-7b-64k', use_fast=False)
```
```python
# Saiga-style prompt template. The system message says (in Russian):
# "You are Saiga, a Russian-language automatic assistant. You talk to
# people and help them." Replace `вопрос?` with your question and put
# the source document (up to ~64k tokens) in place of `Текст <65536 tokens`.
big_prompts = '''system
Ты — Сайга, русскоязычный автоматический ассистент. Ты разговариваешь с людьми и помогаешь им.

user
Дай ответы на вопрос основываясь только на тексте ниже:

вопрос?
Текст <65536 tokens
bot
'''
```
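A minimal sketch of filling the template with real content; the variables `question` and `long_text` are illustrative placeholders:
```python
# Illustrative prompt assembly; `question` and `long_text` are placeholders.
question = "О чём этот текст?"  # your question
long_text = "..."               # your source document, up to ~64k tokens

big_prompts = (
    "system\n"
    "Ты — Сайга, русскоязычный автоматический ассистент. "
    "Ты разговариваешь с людьми и помогаешь им.\n\n"
    "user\n"
    "Дай ответы на вопрос основываясь только на тексте ниже:\n\n"
    f"{question}\n"
    f"{long_text}\n"
    "bot\n"
)
```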
```python
from transformers import GenerationConfig

gen_config = {
    "pad_token_id": 0,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "temperature": 0.4,
    "top_p": 0.9,
    "top_k": 50,
    "do_sample": True,
    "max_new_tokens": 15360,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 15,
}
generation_config = GenerationConfig.from_dict(gen_config)
```
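Because `max_new_tokens` is 15360 and the context window is 64k tokens, it can help to check that the prompt leaves enough room before generating (an illustrative check):
```python
# Optional: verify the prompt plus the generation budget fits in the 64k window.
n_prompt_tokens = len(tokenizer(big_prompts)["input_ids"])
assert n_prompt_tokens + gen_config["max_new_tokens"] <= 65536, \
    f"Prompt too long: {n_prompt_tokens} tokens"
```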
```python
def generate(model, tokenizer, prompt, generation_config):
    # Tokenize the prompt and move the tensors to the model's device.
    data = tokenizer(prompt, return_tensors="pt")
    data = {k: v.to(model.device) for k, v in data.items()}
    output_ids = model.generate(
        **data,
        generation_config=generation_config
    )[0]
    # Drop the prompt tokens so only the newly generated answer remains.
    output_ids = output_ids[len(data["input_ids"][0]):]
    output = tokenizer.decode(output_ids, skip_special_tokens=True)
    return output.strip()

output = generate(model, tokenizer, big_prompts, generation_config)
print(output)
```
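Long answers can also be printed as they are produced; a sketch using `TextStreamer` from transformers:
```python
# Optional streaming variant: print tokens as they are generated.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)
data = tokenizer(big_prompts, return_tensors="pt").to(model.device)
model.generate(**data, generation_config=generation_config, streamer=streamer)
```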