--- license: other license_name: seallms license_link: https://huggingface.co/SeaLLMs/SeaLLM-13B-Chat/blob/main/LICENSE language: - en - zh - id - vi - th - ms tags: - sea - multilingual base_model: SeaLLMs/SeaLLM3-7B-Chat pipeline_tag: text-generation --- # QuantFactory/SeaLLM3-7B-Chat-GGUF This is quantized version of [SeaLLMs/SeaLLM3-7B-Chat](https://huggingface.co/SeaLLMs/SeaLLM3-7B-Chat) created using llama.cpp # *SeaLLM3* - Large Language Models for Southeast Asia
Website 🤗 Tech Memo 🤗 DEMO Github Technical Report
We introduce **SeaLLM3**, the latest series of the SeaLLMs (Large Language Models for Southeast Asian languages) family. It achieves state-of-the-art performance among models with similar sizes, excelling across a diverse array of tasks such as world knowledge, mathematical reasoning, translation, and instruction following. In the meantime, it was specifically enhanced to be more trustworthy, exhibiting reduced hallucination and providing safe responses, particularly in queries closed related to Southeast Asian culture. ## 🔥 Highlights - State-of-the-art performance compared to open-source models of similar sizes, evaluated across various dimensions such as human exam questions, instruction-following, mathematics, and translation. - Significantly enhanced instruction-following capability, especially in multi-turn settings. - Ensures safety in usage with significantly reduced instances of hallucination and sensitivity to local contexts. ## Uses SeaLLMs is tailored for handling a wide range of languages spoken in the SEA region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese. This page introduces the SeaLLM3-7B-Chat model, specifically fine-tuned to follow human instructions effectively for task completion, making it directly applicable to your applications. ### Get started with `Transformers` To quickly try the model, we show how to conduct inference with `transformers` below. Make sure you have installed the latest transformers version (>4.40). ```python from transformers import AutoModelForCausalLM, AutoTokenizer device = "cuda" # the device to load the model onto model = AutoModelForCausalLM.from_pretrained( "SeaLLMs/SeaLLM3-7B-chat", torch_dtype=torch.bfloat16, device_map=device ) tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM3-7B-chat") # prepare messages to model prompt = "Hiii How are you?" messages = [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) model_inputs = tokenizer([text], return_tensors="pt").to(device) print(f"Formatted text:\n {text}") print(f"Model input:\n {model_inputs}") generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True) generated_ids = [ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) ] response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True) print(f"Response:\n {response[0]}") ``` You can also utilize the following code snippet, which uses the streamer `TextStreamer` to enable the model to continue conversing with you: ```python from transformers import AutoModelForCausalLM, AutoTokenizer from transformers import TextStreamer device = "cuda" # the device to load the model onto model = AutoModelForCausalLM.from_pretrained( "SeaLLMs/SeaLLM3-7B-chat", torch_dtype=torch.bfloat16, device_map=device ) tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM3-7B-chat") # prepare messages to model messages = [ {"role": "system", "content": "You are a helpful assistant."}, ] while True: prompt = input("User:") messages.append({"role": "user", "content": prompt}) text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) model_inputs = tokenizer([text], return_tensors="pt").to(device) streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True) generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, streamer=streamer) generated_ids = [ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) ] response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] messages.append({"role": "assistant", "content": response}) ``` ### Inference with `vllm` You can also conduct inference with [vllm](https://docs.vllm.ai/en/stable/index.html), which is a fast and easy-to-use library for LLM inference and serving. To use vllm, first install the latest version via `pip install vllm`. ```python from vllm import LLM, SamplingParams prompts = [ "Who is the president of US?", "Can you speak Indonesian?" ] llm = LLM(ckpt_path, dtype="bfloat16") sparams = SamplingParams(temperature=0.1, max_tokens=512) outputs = llm.generate(prompts, sparams) # print out the model response for output in outputs: prompt = output.prompt generated_text = output.outputs[0].text print(f"Prompt: {prompt}\nResponse: {generated_text}\n\n") ``` ### Bias, Risks, and Limitations> **Disclaimer**: > We must note that even though the weights, codes, and demos are released in an open manner, similar to other pre-trained language models, and despite our best efforts in red teaming and safety fine-tuning and enforcement, our models come with potential risks, including but not limited to inaccurate, misleading or potentially harmful generation. > Developers and stakeholders should perform their own red teaming and provide related security measures before deployment, and they must abide by and comply with local governance and regulations. > In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos. ## Evaluation We conduct our evaluation along two dimensions: 1. **Model Capability**: We assess the model's performance on human exam questions, its ability to follow instructions, its proficiency in mathematics, and its translation accuracy. 2. **Model Trustworthiness**: We evaluate the model's safety and tendency to hallucinate, particularly in the context of Southeast Asia. ### Model Capability #### Multilingual World Knowledge - M3Exam [M3Exam](https://arxiv.org/abs/2306.05179) consists of local exam questions collected from each country. It reflects the model's world knowledge (e.g., with language or social science subjects) and reasoning abilities (e.g., with mathematics or natural science subjects). | Model | en | zh | id | th | vi | avg | avg_sea | |:-----------------|-----:|------:|-----:|-----:|-----:|------:|----------:| | Sailor-7B-Chat | 0.66 | 0.652 | 0.475 | 0.462 | 0.513 | 0.552 | 0.483 | | gemma-7b | 0.732 | 0.519 | 0.475 | 0.46 | 0.594 | 0.556 | 0.510 | | SeaLLM-7B-v2.5 | 0.758 | 0.581 | 0.499 | 0.502 | 0.622 | 0.592 | 0.541 | | Qwen2-7B | 0.815 | 0.874 | 0.53 | 0.479 | 0.628 | 0.665 | 0.546 | | Qwen2-7B-Instruct| 0.809 | 0.88 | 0.558 | 0.555 | 0.624 | 0.685 | 0.579 | | Sailor-14B | 0.748 | 0.84 | 0.536 | 0.528 | 0.621 | 0.655 | 0.562 | | Sailor-14B-Chat | 0.749 | 0.843 | 0.553 | 0.566 | 0.637 | 0.67 | 0.585 | | SeaLLM3-7B | 0.814 | 0.866 | 0.549 | 0.52 | 0.628 | 0.675 | 0.566 | | SeaLLM3-7B-Chat | 0.809 | 0.874 | 0.558 | 0.569 | 0.649 | 0.692 | 0.592 | #### Multilingual Instruction-following Capability - SeaBench SeaBench consists of multi-turn human instructions spanning various task types. It evaluates chat-based models on their ability to follow human instructions in both single and multi-turn settings and assesses their performance across different task types. The dataset and corresponding evaluation code will be released soon! | model | idTerms of Use and License: By using our released weights, codes, and demos, you agree to and comply with the terms and conditions specified in our SeaLLMs Terms Of Use.