Links for sample inference and database schema are dead
https://huggingface.co/defog/sqlcoder-34b-alpha#using-sqlcoder
The ones right under this heading.
@samvedya Just download the exl2 from here: https://huggingface.co/waldie/sqlcoder-34b-alpha-4bpw-h6-exl2
(Works for my 3090s)
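If you would rather script that download instead of grabbing the files by hand, here is a minimal sketch using huggingface_hub (the local_dir path is just an example):

from huggingface_hub import snapshot_download

# Fetch the 4bpw EXL2 quant; note that EXL2 weights need an exllamav2-based
# loader (e.g. text-generation-webui), not plain transformers
snapshot_download(
    repo_id="waldie/sqlcoder-34b-alpha-4bpw-h6-exl2",
    local_dir="models/sqlcoder-34b-alpha-4bpw-h6-exl2"
)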
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "defog/sqlcoder-34b-alpha"

# Quantize to 4-bit via bitsandbytes so the 34B model fits on a single 24 GB GPU
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    use_cache=True,
    quantization_config=quantization_config
)
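That snippet only loads the model; a rough generation step to pair with it could look like the sketch below. The prompt string is just a placeholder (the schema + question prompt from the dead-linked sample inference would go there), and the decoding settings are assumptions, not the official ones.

# Placeholder: paste the schema + question prompt here
prompt = "..."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=False,  # greedy decoding keeps the generated SQL deterministic
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(output[0], skip_special_tokens=True))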
This is a better way
@samvedya Just download the exl2 from here: https://huggingface.co/waldie/sqlcoder-34b-alpha-4bpw-h6-exl2
(Works for my 3090s)
Hey, for reference, what kind of t/s do you get for your prompts? (Or how long does it take to get an output from the model?)
How big is your schema, in terms of number of columns and tables?
(I'm asking about prompts whose outputs are around 4 lines long.)
Hi, all.
I ran the inference script on the sqlcoder-34b-alpha model, but no SQL result was returned. Any ideas, please?
Thanks a lot.
@samvedya : were you able to run the model with 8-bit quantization on an RTX 4090 (24 GB VRAM) with the above settings?
@samvedya : can you please share the code (or GitHub repo) and the versions of the libraries you used?
Also, I want to run the code in a Windows environment. Which environment did you use, and were there any other specific changes you made?
# Latest version of every library as of today
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
# import sqlparse  # if you want to extract the SQL from the raw LLM output

model_id = "codellama/CodeLlama-34b-Instruct-hf"

# Load the model in 4-bit via bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

prompt = ""  # put your schema + question prompt here
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True).to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
    repetition_penalty=1.05
)

# The output tensor includes the echoed prompt tokens
output = output[0].to("cpu")
string_output = tokenizer.decode(output)
print(string_output)
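On the commented-out sqlparse import above: a small post-processing sketch for pulling the query out of the raw decoded text. It assumes the SQL is simply whatever follows the echoed prompt, which may need adjusting for your prompt format and any special tokens in the decoded string.

import sqlparse

# Roughly drop the echoed prompt; special tokens may shift the offset slightly
generated = string_output[len(prompt):]

# Pretty-print the (assumed) SQL for readability
print(sqlparse.format(generated, reindent=True, keyword_case="upper"))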
Hi @samvedya : I was able to run the code with 4-bit quantization on Windows, using a specific bitsandbytes build available here:
https://jllllll.github.io/bitsandbytes-windows-webui/bitsandbytes/
Can you let me know if you were able to run the model in 8-bit, and what config you used for it?
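For reference, an 8-bit attempt would just swap the flag in the same config, roughly as sketched below. This is untested here: a 34B model in 8-bit needs around 34 GB for the weights alone, so on a single 24 GB card it would likely have to offload layers to the CPU.

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True  # allow spilling layers to CPU if VRAM runs out
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)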