---
tags:
- text-generation
# license: cc-by-nc-sa-4.0
language:
- ko
base_model: yanolja/KoSOLAR-10.7B-v0.1
pipeline_tag: text-generation
---
# **DataVortexS-10.7B-v0.2**
<img src="./DataVortex.png" alt="DataVortex" style="height: 8em;">
## **License**
Will be updated soon ...
<!-- [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) -->
## **Model Details**
### **Base Model**
[yanolja/KoSOLAR-10.7B-v0.1](https://huggingface.co/yanolja/KoSOLAR-10.7B-v0.1)
### **Trained On**
1× NVIDIA H100 80GB
### **Instruction format**
It follows **(No Input) Alpaca** format.
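For reference, a typical Alpaca prompt without an input field looks like the sketch below. The exact wording is defined by the tokenizer's `chat_template`, so treat the preamble here as an illustration rather than the canonical template.

```python
# Illustrative Alpaca (no-input) prompt layout; the exact wording shipped in the
# tokenizer's chat_template may differ.
ALPACA_NO_INPUT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

prompt = ALPACA_NO_INPUT_TEMPLATE.format(instruction="대한민국의 수도는 어디야?")
print(prompt)
```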
## **Model Benchmark**
### **Ko-LLM-Leaderboard**
Benchmarking in progress; results will be added once evaluation is complete.
## **Implementation Code**
Since the tokenizer's `chat_template` already contains the instruction format described above, you can use the code below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# Load the model and tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("Edentns/DataVortexS-10.7B-v0.2", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Edentns/DataVortexS-10.7B-v0.2")

messages = [
    { "role": "user", "content": "대한민국의 수도는 어디야?" }  # "What is the capital of South Korea?"
]

# Build the prompt with the tokenizer's chat template (Alpaca no-input format).
encoded = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_token_type_ids=False
).to(device)

# Generate a response with sampling.
decoded = model.generate(
    input_ids=encoded,
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.2,
    do_sample=True,
    max_length=4096,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id
)

# Drop the prompt tokens and decode only the newly generated text.
decoded = decoded[0][encoded.shape[1]:]
decoded_text = tokenizer.decode(decoded, skip_special_tokens=True)
print(decoded_text)
```
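If you want to check the exact prompt the chat template produces, you can render it as plain text (using the same `tokenizer` and `messages` as above):

```python
# Render the chat template without tokenizing to inspect the Alpaca-style prompt.
prompt_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt_text)
```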
<div align="center">
<a href="https://edentns.com/">
<img src="./Logo.png" alt="Logo" style="height: 3em;">
</a>
</div>