---
license: apache-2.0
---

How to use with vLLM:

```python
from vllm import LLM, SamplingParams

inputs = [
    "Who is the president of US?",
    "Can you speak Indonesian?"
]

# Initialize the LLM with the AWQ-quantized model
llm = LLM(model="jester6136/SeaLLMs-v3-1.5B-Chat-AWQ", quantization="AWQ",
          gpu_memory_utilization=0.9, max_model_len=2000, max_num_seqs=32)

# Greedy decoding (temperature=0.0); top_k is left at -1 (disabled)
sparams = SamplingParams(temperature=0.0, max_tokens=2000, top_p=0.95,
                         top_k=-1, repetition_penalty=1.05)

# ChatML-style prompt template used by SeaLLMs-v3 (Qwen2-based) chat models
chat_template = "<|im_start|>user\n{input}<|im_end|>\n<|im_start|>assistant\n"
prompts = [chat_template.format(input=prompt) for prompt in inputs]

outputs = llm.generate(prompts, sparams)

# Print out the model responses
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}\nResponse: {generated_text}\n\n")
```
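
If you prefer not to hand-write the ChatML template, a minimal sketch below builds the prompts with the tokenizer's built-in chat template instead. This assumes the AWQ repo ships the same tokenizer config (including a chat template) as the base SeaLLMs-v3-1.5B-Chat model.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "jester6136/SeaLLMs-v3-1.5B-Chat-AWQ"
# Assumption: the repo includes the base model's tokenizer and chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = [
    "Who is the president of US?",
    "Can you speak Indonesian?"
]

# apply_chat_template inserts the <|im_start|>/<|im_end|> markers and appends
# the assistant prefix so the model starts generating the reply directly.
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for prompt in inputs
]

llm = LLM(model=model_id, quantization="AWQ",
          gpu_memory_utilization=0.9, max_model_len=2000, max_num_seqs=32)
sparams = SamplingParams(temperature=0.0, max_tokens=2000, top_p=0.95,
                         repetition_penalty=1.05)

outputs = llm.generate(prompts, sparams)
for output in outputs:
    print(f"Prompt: {output.prompt}\nResponse: {output.outputs[0].text}\n")
```

This avoids formatting mistakes in the special tokens and keeps the prompt format in sync with whatever template the tokenizer defines.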