---
license: other
license_name: exaone
license_link: LICENSE
language:
- en
- ko
tags:
- lg-ai
- exaone
- exaone-3.5
pipeline_tag: text-generation
library_name: transformers
---
# EXAONE-3.5-32B-Instruct
## Introduction
We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. EXAONE 3.5 language models include: 1) **2.4B model** optimized for deployment on small or resource-constrained devices, 2) **7.8B model** matching the size of its predecessor but offering improved performance, and 3) **32B model** delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.
For more details, please refer to our [technical report](https://arxiv.org/abs/2412.04862), [blog](https://www.lgresearch.ai/blog/view?seq=507) and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-3.5).
This repository contains the instruction-tuned 32B language model with the following features:
- Number of Parameters (without embeddings): 30.95B
- Number of Layers: 64
- Number of Attention Heads: GQA with 40 Q-heads and 8 KV-heads
- Vocab Size: 102,400
- Context Length: 32,768 tokens
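The hyperparameters above can also be read back from the model's configuration at runtime. The snippet below is a minimal sketch, not part of the official quickstart; it assumes the config exposes the usual `transformers` attribute names (`num_hidden_layers`, `num_key_value_heads`, etc.), which may differ for EXAONE's custom configuration class, hence the `getattr` fallbacks.

```python
from transformers import AutoConfig

# Inspect the model configuration without downloading the weights.
# Attribute names are assumed to follow the common transformers convention.
config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-32B-Instruct",
    trust_remote_code=True,  # EXAONE ships a custom configuration class
)

print("layers:        ", getattr(config, "num_hidden_layers", None))    # expected: 64
print("query heads:   ", getattr(config, "num_attention_heads", None))  # expected: 40
print("KV heads:      ", getattr(config, "num_key_value_heads", None))  # expected: 8 (GQA)
print("vocab size:    ", getattr(config, "vocab_size", None))           # expected: 102,400
print("context length:", getattr(config, "max_position_embeddings", None))  # expected: 32,768
```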
## Quickstart
We recommend using `transformers` v4.43 or later.
Here is a code snippet to run conversational inference with the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "LGAI-EXAONE/EXAONE-3.5-32B-Instruct"
# Load the model in bfloat16 and shard it across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # required: EXAONE uses custom modeling code
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"  # Korean example ("Brag about yourself")

messages = [
    {"role": "system",
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

# Greedy decoding, up to 128 new tokens
output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
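For interactive use, the same setup also supports incremental decoding. The sketch below is an optional variation, reusing `model`, `tokenizer`, and `input_ids` from the snippet above; it uses `transformers`' `TextIteratorStreamer` to print text as it is generated.

```python
from threading import Thread
from transformers import TextIteratorStreamer

# Stream decoded text as generation proceeds instead of waiting for the full output.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
thread = Thread(target=model.generate, kwargs=dict(
    input_ids=input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer,
))
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```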
> ### Note
> The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt,
> so we highly recommend using the system prompt provided in the code snippet above.
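To see exactly what the model receives, you can render the chat template as plain text instead of token IDs. A minimal sketch, reusing the `tokenizer` and `messages` from the quickstart above:

```python
# Render the conversation as a string rather than token IDs,
# to inspect how the system prompt is inserted into the template.
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt_text)
```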
## Evaluation
The following table shows evaluation results on benchmarks of real-world use cases. The full evaluation results can be found in the [technical report](https://arxiv.org/abs/2412.04862).
| Models | MT-Bench | LiveBench | Arena-Hard | AlpacaEval | IFEval | KoMT-Bench[1] | LogicKor |
|---|---|---|---|---|---|---|---|
| EXAONE 3.5 32B | **8.51** | 43.0 | **78.6** | **60.6** | **81.7** | **8.05** | **9.06** |
| Qwen 2.5 32B | 8.49 | **50.6** | 67.0 | 41.0 | 78.7 | 7.75 | 8.89 |
| C4AI Command R 32B | 7.38 | 29.7 | 17.0 | 25.9 | 26.1 | 6.72 | 8.24 |
| Gemma 2 27B | 8.28 | 40.0 | 57.5 | 52.2 | 59.7 | 7.19 | 8.56 |
| Yi 1.5 34B | 7.64 | 26.2 | 23.1 | 34.8 | 55.5 | 4.88 | 6.33 |

- [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean.