---
license: apache-2.0
---
How to use with vLLM:
```python
from vllm import LLM, SamplingParams

inputs = [
    "Who is the president of US?",
    "Can you speak Indonesian?",
]

# Initialize the AWQ-quantized model
llm = LLM(model="jester6136/SeaLLMs-v3-1.5B-Chat-AWQ",
          quantization="AWQ",
          gpu_memory_utilization=0.9,
          max_model_len=2000,
          max_num_seqs=32)

# temperature=0.0 selects greedy decoding; top_k must be an integer (-1 disables it)
sparams = SamplingParams(temperature=0.0, max_tokens=2000, top_p=0.95,
                         top_k=-1, repetition_penalty=1.05)

# Wrap each input in the ChatML prompt format used by the model
chat_template = '<|im_start|>user\n{input}<|im_end|>\n<|im_start|>assistant\n'
prompts = [chat_template.format(input=prompt) for prompt in inputs]

outputs = llm.generate(prompts, sparams)

# Print out the model responses
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}\nResponse: {generated_text}\n\n")
```
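
Depending on your vLLM version, `LLM.chat` (available in recent releases) can apply the model's built-in chat template for you, so you don't need to format the ChatML prompt by hand. A minimal sketch, assuming a vLLM release that ships `LLM.chat`:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="jester6136/SeaLLMs-v3-1.5B-Chat-AWQ",
          quantization="AWQ",
          gpu_memory_utilization=0.9,
          max_model_len=2000,
          max_num_seqs=32)
sparams = SamplingParams(temperature=0.0, max_tokens=2000, repetition_penalty=1.05)

# llm.chat applies the tokenizer's chat template to the messages automatically
outputs = llm.chat([{"role": "user", "content": "Who is the president of US?"}], sparams)
print(outputs[0].outputs[0].text)
```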