---
license: apache-2.0
datasets:
- AIAT/Pangpuriye-dataset
- AIAT/Pangpuriye-public_ThaiSum40k
- AIAT/Pangpuriye-generated_by_LLama3-codeLlama
- AIAT/Pangpuriye-public_alpaca-cleaned
- AIAT/Pangpuriye-generated_by_typhoon
language:
- th
- en
pipeline_tag: text-generation
tags:
- code_generation
- sql
metrics:
- accuracy
---
# 🤖 Super AI Engineer Development Program Season 4 - Pangpuriye Table-based Question Answering Model
This model was fine-tuned from the original OpenThaiGPT-1.0.1-7b. The model is released under the Apache 2.0 license.
## Example inference using Hugging Face transformers

The following code is an example of how to run inference with our model.
```python
from transformers import AutoModelForCausalLM, LlamaTokenizer

def get_prediction(raw_prediction):
    # Keep only the model's answer, i.e. the text after the [/INST] tag.
    if "[/INST]" in raw_prediction:
        index = raw_prediction.index("[/INST]")
        return raw_prediction[index + len("[/INST]"):]
    return raw_prediction

tokenizer = LlamaTokenizer.from_pretrained("AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat", trust_remote_code=True)

schema = """your SQL schema"""
query = "หาจำนวนลูกค้าที่เป็นเพศชาย"  # "Find the number of male customers"

prompt = f"""
[INST] <<SYS>>
You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
<</SYS>>
{schema}### (sql extract) {query} [/INST]
"""

tokens = tokenizer(prompt, return_tensors="pt")
output = model.generate(tokens["input_ids"], max_new_tokens=20, eos_token_id=tokenizer.eos_token_id)
print(get_prediction(tokenizer.decode(output[0], skip_special_tokens=True)))
```
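Since the model emits SQL, a typical next step is to execute the generated query against your database. The sketch below is only an illustration, assuming a hypothetical `customers` table with a `gender` column and a hand-written stand-in for the model's output; it is not part of the model card's API.

```python
import sqlite3

# Hypothetical schema matching the example question; the table and
# column names here are illustrative assumptions, not model output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, gender TEXT)")
conn.executemany(
    "INSERT INTO customers (gender) VALUES (?)",
    [("male",), ("female",), ("male",)],
)

# Suppose the model returned this SQL for the Thai question
# "หาจำนวนลูกค้าที่เป็นเพศชาย" (find the number of male customers):
generated_sql = "SELECT COUNT(*) FROM customers WHERE gender = 'male'"

result = conn.execute(generated_sql).fetchone()[0]
print(result)  # 2
```

In practice you would pass the cleaned output of `get_prediction` in place of `generated_sql`, ideally after validating it before execution.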
## Acknowledgements
The model was developed collaboratively by the members of Pangpuriye's house during the LLMs hackathon in the Super AI Engineer Development Program Season 4.
We thank the organizers of this hackathon, OpenThaiGPT, AIAT, NECTEC, and ThaiSC, for this challenging task and for the opportunity to be a part of developing a Thai large language model.