---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
tags:
- code
- sql
- text-to-sql
- text2sql
- t2sql
---
Hrida-T2SQL-3B-128k-V0.1 is our latest small language model (SLM) for text-to-SQL, built for data scientists and industry professionals. It upgrades our previous release with an expanded 128k-token context window for handling long, intricate data queries. Based on the Phi 3 architecture, it converts natural language questions into SQL commands, making data analysis faster and supporting decision-making.
For full details of this model, please read our [blog post](https://www.hridaai.com/blog/t2sql-128k).
## Prompt Template
```txt
### Instruction:
Provide the system prompt.
### Dialect:
Specify the SQL dialect (e.g., MySQL, PostgreSQL, SQL Server, etc.).
### Context:
Provide the database schema including table names, column names, and data types.
### Input:
User's query.
### Response:
Expected SQL query output based on the input and context.
```
- **Instruction (System Prompt)**: Tells the model how to process the input and generate the SQL query.
- **Dialect (Optional)**: Specifies the SQL variant so that the generated query conforms to the correct syntax.
- **Context**: The database schema (table names, column names, and data types) the model needs to generate accurate SQL queries.
- **Input**: The user's natural language query, which the model transforms into an SQL query.
- **Response**: The SQL query the model is expected to produce.
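The fields above can be assembled programmatically. The following is a minimal sketch, not part of the official tooling: `build_prompt` and its parameter names are hypothetical, and the exact whitespace between sections is an assumption.

```python
from typing import Optional


def build_prompt(instruction: str, context: str, user_input: str,
                 dialect: Optional[str] = None) -> str:
    """Assemble a prompt using the section headers from the template above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    sections = [f"### Instruction:\n{instruction}"]
    # Dialect is optional, so its section is skipped when none is given.
    if dialect:
        sections.append(f"### Dialect:\n{dialect}")
    sections.append(f"### Context:\n{context}")
    sections.append(f"### Input:\n{user_input}")
    # The response header is left empty for the model to complete.
    sections.append("### Response:\n")
    return "\n".join(sections)


schema = "CREATE TABLE Departments (DepartmentID INT PRIMARY KEY, DepartmentName VARCHAR(100));"
prompt = build_prompt(
    instruction="Answer to the query will be in the form of an SQL query.",
    context=schema,
    user_input="List all department names.",
    dialect="PostgreSQL",
)
print(prompt)
```

Keeping the dialect optional mirrors the template: when it is omitted, the model is left to infer the SQL variant from the schema and the question.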
## Chat Prompt Template
```txt
<s>
<|system|>
{ Instruction / System Prompt }
<|user|>
{ Context / User Query } <|end|>
<|assistant|>
```
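As an illustration, the chat template can be filled in with plain string formatting. This sketch reproduces the layout shown above literally; `build_chat_prompt` is a hypothetical helper, and in practice `tokenizer.apply_chat_template` is likely the safer way to produce the exact token sequence the model was trained on.

```python
def build_chat_prompt(system_prompt: str, user_content: str) -> str:
    """Fill the chat template shown above (hypothetical helper)."""
    return (
        "<s>\n"
        "<|system|>\n"
        f"{system_prompt}\n"
        "<|user|>\n"
        f"{user_content} <|end|>\n"
        "<|assistant|>\n"  # the model continues from here
    )


chat_prompt = build_chat_prompt(
    "Answer to the query will be in the form of an SQL query.",
    "### Context: CREATE TABLE t (id INT);\n### Input: Count the rows in t.",
)
print(chat_prompt)
```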
## Run the Model
### Using Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define the model and tokenizer
model_id = "HridaAI/Hrida-T2SQL-3B-128k-V0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, trust_remote_code=True)
# Define the context and prompt
prompt = """
Answer to the query will be in the form of an SQL query.
### Context: CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Age INT,
DepartmentID INT,
Salary DECIMAL(10, 2),
DateHired DATE,
Active BOOLEAN,
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100),
Location VARCHAR(100)
);
### Input: Write a SQL query to select all the employees who are active.
### Response:
"""
# Prepare the input
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
# Generate the output (max_new_tokens limits only the generated tokens, not the prompt)
outputs = model.generate(inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0]))
```
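The decoded string printed by the Transformers example contains the prompt and chat markers as well as the generated SQL. A small post-processing step can isolate the query. This is a sketch under assumptions: `extract_sql` is a hypothetical helper, and it assumes the output uses the `<|assistant|>` and `<|end|>` markers from the chat template (plus `<|endoftext|>` as a possible end-of-sequence marker).

```python
def extract_sql(decoded: str) -> str:
    """Return the text of the last assistant turn (hypothetical helper)."""
    # Keep only what follows the last assistant marker; if the marker
    # is absent, rpartition leaves the whole string in `answer`.
    _, _, answer = decoded.rpartition("<|assistant|>")
    # Cut the answer at the first end marker, if any is present.
    for marker in ("<|end|>", "<|endoftext|>"):
        answer = answer.split(marker, 1)[0]
    return answer.strip()


# Illustrative string only, not real model output.
sample = (
    "<s><|user|>\n### Input: Select all active employees.<|end|>\n"
    "<|assistant|>\nSELECT * FROM Employees WHERE Active = TRUE;<|end|>"
)
print(extract_sql(sample))
```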
### Using MLX
```python
from mlx_lm import generate, load
# Load the model and tokenizer
model, tokenizer = load("HridaAI/Hrida-T2SQL-3B-128k-V0.1")
# Define the context and prompt
prompt = """
Answer to the query will be in the form of an SQL query.
### Context: CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Age INT,
DepartmentID INT,
Salary DECIMAL(10, 2),
DateHired DATE,
Active BOOLEAN,
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100),
Location VARCHAR(100)
);
### Input: Write a SQL query to select all the employees who are active.
### Response:
"""
# Generate the SQL query; verbose=True prints the output as it is generated
response = generate(model=model, tokenizer=tokenizer, prompt=prompt, verbose=True)
```