---
license: apache-2.0
datasets:
- AIAT/Pangpuriye-dataset
language:
- th
- en
pipeline_tag: text-generation
tags:
- code_generation
---

Example inference using Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM, LlamaTokenizer


def get_prediction(raw_prediction):
    """Return only the text generated after the closing [/INST] tag."""
    marker = "[/INST]"
    if marker in raw_prediction:
        index = raw_prediction.index(marker)
        return raw_prediction[index + len(marker):].strip()

    return raw_prediction


model_id = "AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat"
tokenizer = LlamaTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

schema = """your SQL schema"""
# Thai for "Find the number of customers who are male."
query = "หาจำนวนลูกค้าที่เป็นเพศชาย"

# The system prompt is kept verbatim; its Thai sentence restates the English instruction.
prompt = f"""
[INST] <<SYS>>
You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
<</SYS>>
{schema}### (sql extract) {query} [/INST]
"""

tokens = tokenizer(prompt, return_tensors="pt")
# max_new_tokens=20 keeps the demo short; raise it if the generated SQL is cut off.
output = model.generate(
    tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
    max_new_tokens=20,
    eos_token_id=tokenizer.eos_token_id,
)
print(get_prediction(tokenizer.decode(output[0], skip_special_tokens=True)))
```
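
The `schema` placeholder in the example above is left for the reader to fill in. As a minimal sketch of what a filled-in prompt might look like, the snippet below mirrors that prompt template with a hypothetical `customers` table and runs the `[/INST]` extraction on a made-up output string, so it can be tried without downloading the model. The table definition, the `build_prompt` helper, and the sample output are assumptions for illustration only; they are not taken from the model card or its training data, and the schema format the model actually expects may differ.

```python
# Hypothetical schema (an assumption); replace it with your own database definition.
SCHEMA = """CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    gender TEXT
);"""

# System prompt copied verbatim from the inference example.
SYSTEM_PROMPT = (
    "You are a question answering assistant. "
    "Answer the question as truthful and helpful as possible "
    "คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)


def build_prompt(schema: str, query: str) -> str:
    """Fill the prompt template (illustrative helper, not part of the model card)."""
    return (
        f"\n[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n"
        f"{schema}### (sql extract) {query} [/INST]\n"
    )


def get_prediction(raw_prediction: str) -> str:
    """Return the text after the first [/INST] marker, or the input unchanged."""
    marker = "[/INST]"
    if marker in raw_prediction:
        return raw_prediction[raw_prediction.index(marker) + len(marker):].strip()
    return raw_prediction


# Thai for "Find the number of customers who are male."
prompt = build_prompt(SCHEMA, "หาจำนวนลูกค้าที่เป็นเพศชาย")

# Made-up decoded text standing in for tokenizer.decode(...); not a real model response.
fake_output = prompt + "SELECT COUNT(*) FROM customers WHERE gender = 'male';"
print(get_prediction(fake_output))
```

Keeping prompt construction in one function makes it easier to reuse the same template across several questions against one schema, which is presumably how the model would be called in practice.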