GEMMA-2-2B for Text-to-PostgreSQL (Text-PGSQL)

Model Overview

Model Name: GEMMA-2-2B

Model Type: Text-to-PostgreSQL (Text-PGSQL)

Purpose: Given a database schema and a natural language question, the model generates the corresponding PostgreSQL query.

Training Details

Training Duration: 12 hours
Epochs: 3
Loss: The training loss reached a good level, indicating that the model fit the training data.

Dataset

The model was trained on a dataset that includes a variety of schemas, questions, and queries relevant to PostgreSQL.

Performance

Loss: A good final training loss was achieved, suggesting the model learned to generate accurate SQL queries for inputs similar to its training data.

Usage

This model is intended for applications that translate natural language into SQL for PostgreSQL databases, such as systems that turn user questions into executable SQL statements.
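Because the generated statements are meant to be executed, it can help to check them before they reach the database. The following is a minimal sketch of such a check, not part of the model or its API: `is_read_only` and the keyword list are hypothetical, and the check is a text heuristic rather than a SQL parser. It is not a security boundary; running the queries under a database role with read-only privileges is the more reliable control.

```python
import re

# Keywords that change data or schema; an illustrative, non-exhaustive list.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Heuristic check that `sql` is a single SELECT (or WITH ... SELECT) statement.

    Hypothetical helper; it does not parse SQL and can reject valid queries
    (for example, a string literal containing the word "update").
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty, or more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False  # does not start like a query
    return _WRITE_KEYWORDS.search(statement) is None


print(is_read_only("SELECT first_name, last_name FROM employees;"))
print(is_read_only("DROP TABLE employees;"))
```

A caller would run the generated statement only when the check passes and otherwise log it or ask the user to rephrase.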

Model Architecture

The model is based on Gemma 2 2B, a pre-trained causal language model, adapted (fine-tuned) for SQL query generation.

Prompt

prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{schema}

### Input:
{question}

### Response:
"""
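The template has two slots: the schema goes under `### Instruction:` and the natural language question goes under `### Input:`. A minimal sketch of filling it with `str.format` follows; the helper name `build_prompt` and the sample schema and question are illustrative and not part of the model's API.

```python
# Template copied from the Prompt section.
PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{schema}

### Input:
{question}

### Response:
"""


def build_prompt(schema: str, question: str) -> str:
    """Fill the template (hypothetical helper, not part of the model's API)."""
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


# Illustrative inputs only.
example = build_prompt(
    schema="CREATE TABLE employees (employee_id SERIAL PRIMARY KEY, email VARCHAR(100));",
    question="Find the emails of all employees.",
)
print(example)
```

The filled string ends with `### Response:` followed by a newline, so the model's continuation is the SQL query itself.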

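PyTorch

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained("suriya7/PostgreSQL-Coder-2B").to(device)
tokenizer = AutoTokenizer.from_pretrained("suriya7/PostgreSQL-Coder-2B")

# Example inputs (abbreviated from the Examples section); replace with your own schema and question.
schema = "CREATE TABLE employees (employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50));"
question = "What are the first names and last names of all employees?"
inputs = tokenizer(prompt.format(schema=schema, question=question), return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens so the echoed prompt is not printed.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())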
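Depending on how the output is decoded, the text may still contain the echoed prompt or extra text after the query. The sketch below shows one way to post-process it. `extract_sql` is a hypothetical helper, and cutting at the first semicolon assumes the model emits a single statement with no semicolons inside string literals.

```python
def extract_sql(decoded: str, marker: str = "### Response:") -> str:
    """Return the first SQL statement after the last response marker.

    Hypothetical helper: assumes a single statement and no semicolons
    inside string literals.
    """
    # Keep only what follows the last marker, in case the prompt was echoed back.
    _, _, tail = decoded.rpartition(marker)
    tail = tail.strip()
    # Cut at the first statement terminator, keeping the semicolon if present.
    head, sep, _ = tail.partition(";")
    return (head + sep).strip()


# Illustrative decoded text, not real model output.
decoded = (
    "### Input:\nFind the emails of employees who were hired after January 1, 2020.\n\n"
    "### Response:\nSELECT email FROM employees WHERE hire_date > '2020-01-01';\n"
)
sql = extract_sql(decoded)
print(sql)
```

If the marker is absent, the whole string is treated as the response, so the helper also works on output that has already had the prompt removed.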

Examples

Schemas

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE NOT NULL,
    hire_date DATE NOT NULL,
    job_id INT NOT NULL,
    salary NUMERIC(8, 2),
    department_id INT
);

CREATE TABLE Authors (
    author_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    birth_date DATE
);

CREATE TABLE Books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(100) NOT NULL,
    genre VARCHAR(50),
    publication_year INT,
    author_id INT,
    FOREIGN KEY (author_id) REFERENCES Authors(author_id)
);

CREATE TABLE Members (
    member_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE,
    phone_number VARCHAR(15),
    membership_start_date DATE
);

CREATE TABLE Loans (
    loan_id SERIAL PRIMARY KEY,
    book_id INT,
    member_id INT,
    loan_date DATE,
    return_date DATE,
    FOREIGN KEY (book_id) REFERENCES Books(book_id),
    FOREIGN KEY (member_id) REFERENCES Members(member_id)
);

Questions

Find the emails of employees who were hired after January 1, 2020.

SELECT email FROM employees WHERE hire_date > '2020-01-01';

What are the first names and last names of all employees?

SELECT first_name, last_name FROM employees;

Retrieve the titles and genres of books that have not been borrowed by any member.

SELECT title, genre FROM Books WHERE book_id NOT IN (SELECT book_id FROM Loans);
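A generated query can be smoke-tested against a throwaway database before it is trusted. The sketch below runs the first example query on an in-memory SQLite database. This is an assumption made for convenience: SQLite accepts the `employees` DDL as written, but it is not PostgreSQL (`SERIAL`, for instance, does not auto-generate ids there), so a real PostgreSQL test instance is the better check. The inserted rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite as a local stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE NOT NULL,
    hire_date DATE NOT NULL,
    job_id INT NOT NULL,
    salary NUMERIC(8, 2),
    department_id INT
)
""")

# Made-up rows: one hired before 2020-01-01, one after.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ada", "Lovelace", "ada@example.com", "2019-06-01", 10, 5000.00, 1),
        (2, "Alan", "Turing", "alan@example.com", "2021-03-15", 20, 6000.00, 2),
    ],
)

generated_sql = "SELECT email FROM employees WHERE hire_date > '2020-01-01';"
emails = [row[0] for row in conn.execute(generated_sql)]
print(emails)
```

With these rows, only the 2021 hire matches. The third example query deserves the same kind of check on real data: `Loans.book_id` is nullable, and in PostgreSQL a `NOT IN` subquery that yields a NULL matches no rows, so a `NOT EXISTS` form may be safer.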