NanoTranslator
This is the Medium-2 (M2) model of NanoTranslator, which currently supports only English-to-Chinese translation.
The ONNX version of the model is also available in the repository.
All models are collected in the NanoTranslator Collection.
Model configurations in the collection (M2 is this model):

| Size | Params (M) | Arch. | Activation | Vocab size | Hidden size | Intermediate size | Layers | Attn. heads | KV heads | Tied embeddings |
|---|---|---|---|---|---|---|---|---|---|---|
| XXL2 | 102 | LLaMA | SwiGLU | 16K | 1120 | 3072 | 6 | 16 | 8 | True |
| XXL | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8 | 24 | 8 | True |
| XL | 78 | LLaMA | GeGLU | 16K | 768 | 4096 | 6 | 24 | 8 | True |
| L | 49 | LLaMA | GeGLU | 16K | 512 | 2816 | 8 | 16 | 8 | True |
| M2 | 22 | Qwen2 | GeGLU | 4K | 432 | 2304 | 6 | 24 | 8 | True |
| M | 22 | LLaMA | SwiGLU | 8K | 256 | 1408 | 16 | 16 | 4 | True |
| S | 9 | LLaMA | SwiGLU | 4K | 168 | 896 | 16 | 12 | 4 | True |
| XS | 2 | LLaMA | SwiGLU | 2K | 96 | 512 | 12 | 12 | 4 | True |
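The Params column can be roughly reproduced from the other columns. The sketch below is an estimate under stated assumptions, not the authors' own accounting: it assumes a standard LLaMA/Qwen2-style decoder where head dimension = hidden size / attention heads, grouped-query attention with `KV heads` key/value heads, a gated MLP with three weight matrices (true for both SwiGLU and GeGLU), RMSNorm weight vectors, tied input/output embeddings counted once, and "16K" meaning 16 × 1024. It ignores Qwen2's small q/k/v biases.

```python
from dataclasses import dataclass


@dataclass
class Config:
    vocab: int         # vocabulary size
    hidden: int        # hidden size
    intermediate: int  # MLP intermediate size
    layers: int        # number of decoder layers
    heads: int         # attention heads
    kv_heads: int      # key/value heads (grouped-query attention)


def estimate_params(c: Config) -> int:
    """Rough parameter count for a LLaMA/Qwen2-style decoder with tied embeddings."""
    head_dim = c.hidden // c.heads
    kv_dim = c.kv_heads * head_dim
    # q and o projections are hidden x hidden; k and v are hidden x kv_dim.
    attention = 2 * c.hidden * c.hidden + 2 * c.hidden * kv_dim
    # Gated MLP (SwiGLU or GeGLU): gate, up and down projections.
    mlp = 3 * c.hidden * c.intermediate
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * c.hidden
    per_layer = attention + mlp + norms
    # Tied embeddings: the embedding matrix doubles as the LM head, so count it once.
    embedding = c.vocab * c.hidden
    final_norm = c.hidden
    return embedding + c.layers * per_layer + final_norm


K = 1024  # assumption: "16K" in the table means 16 * 1024 tokens

CONFIGS = {
    "XXL2": Config(16 * K, 1120, 3072, 6, 16, 8),
    "XXL": Config(16 * K, 768, 4096, 8, 24, 8),
    "XL": Config(16 * K, 768, 4096, 6, 24, 8),
    "L": Config(16 * K, 512, 2816, 8, 16, 8),
    "M2": Config(4 * K, 432, 2304, 6, 24, 8),
    "M": Config(8 * K, 256, 1408, 16, 16, 4),
    "S": Config(4 * K, 168, 896, 16, 12, 4),
    "XS": Config(2 * K, 96, 512, 12, 12, 4),
}

for name, cfg in CONFIGS.items():
    print(f"{name:>4}: {estimate_params(cfg) / 1e6:.1f}M")
```

Under these assumptions, truncating each estimate to whole millions gives the values in the Params column (M2, for example, comes out at about 22.7M).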
The prompt format is as follows:

```
<|im_start|>{English Text}<|endoftext|>
```
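Building this prompt is plain string concatenation, exactly as the `translate` function in the example does. A minimal sketch; the helper name `build_prompt` is hypothetical and not part of the model's API:

```python
# Special tokens taken from the prompt format above.
PROMPT_START = "<|im_start|>"
PROMPT_END = "<|endoftext|>"


def build_prompt(text: str) -> str:
    """Wrap English source text in the NanoTranslator prompt format (hypothetical helper)."""
    # No spaces are inserted around the text, matching the concatenation in `translate`.
    return PROMPT_START + text + PROMPT_END


prompts = [build_prompt(t) for t in ["Hello.", "I love to watch my favorite TV series."]]
print(prompts[0])  # <|im_start|>Hello.<|endoftext|>
```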
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-M2'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    # Default decoding settings; any of them can be overridden via kwargs.
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        eos_token_id = kwargs.pop("eos_token_id", tokenizer.eos_token_id),
        **kwargs
    )

    # Wrap the English text in the model's prompt format.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    # Pass the attention mask along with the input ids.
    generated_ids = model.generate(
        model_inputs.input_ids,
        attention_mask=model_inputs.attention_mask,
        **generation_args,
    )
    # Drop the prompt tokens so only the generated translation remains.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```
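`translate` always passes `temperature`, `top_p` and `top_k`, even with `do_sample=False` as in the call above; recent transformers versions warn in that case because those settings only affect sampling. A hedged variant of the argument-building step (the name `build_generation_args` is hypothetical) drops them for greedy decoding; its result could replace `generation_args` in the call to `model.generate(...)`:

```python
SAMPLING_ONLY = ("temperature", "top_p", "top_k")


def build_generation_args(eos_token_id: int, **kwargs) -> dict:
    """Merge caller overrides into the default decoding settings used by `translate`."""
    args = dict(
        max_new_tokens=kwargs.pop("max_new_tokens", 512),
        do_sample=kwargs.pop("do_sample", True),
        temperature=kwargs.pop("temperature", 0.55),
        top_p=kwargs.pop("top_p", 0.8),
        top_k=kwargs.pop("top_k", 40),
        eos_token_id=eos_token_id,
        **kwargs,  # any other generate() arguments pass through unchanged
    )
    if not args["do_sample"]:
        # Greedy decoding ignores these settings, so omit them to avoid the warning.
        for key in SAMPLING_ONLY:
            args.pop(key)
    return args


# In real use, pass tokenizer.eos_token_id; 2 is only a placeholder here.
print(build_generation_args(eos_token_id=2, max_new_tokens=64, do_sample=False))
# {'max_new_tokens': 64, 'do_sample': False, 'eos_token_id': 2}
```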
In the author's measurements, inference with the ONNX models is 2-10 times faster than inference with the transformers models directly.
To use them, manually switch to the `onnx` branch of the repository and download the model files to a local folder.
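One way to fetch that branch without switching branches by hand is `huggingface_hub.snapshot_download`, whose `revision` argument accepts a branch name. A sketch, assuming the branch is named exactly `onnx`; the helper name `download_onnx_branch` and the default `local_dir` are hypothetical choices:

```python
def download_onnx_branch(repo_id: str = "Mxode/NanoTranslator-M2",
                         local_dir: str = "NanoTranslator-M2-onnx") -> str:
    """Download the `onnx` branch of a Hub repository and return the local folder path."""
    # Imported lazily so this helper can be defined without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # `revision` selects a branch, tag or commit; "onnx" is assumed to be the branch name.
    return snapshot_download(repo_id=repo_id, revision="onnx", local_dir=local_dir)
```

Calling `download_onnx_branch()` returns the local folder, which can then serve as `model_path` in the ONNX examples.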
Using ORTModelForCausalLM
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Reuses the `translate` function from the transformers example above,
# which picks up the global `tokenizer` defined here.
text = "I love to watch my favorite TV series."

response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```
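The 2-10x speedup will depend on hardware and input length. A minimal timing sketch for checking it locally; the `mean_latency` helper is hypothetical, while `time.perf_counter` and `statistics.mean` are from the Python standard library:

```python
import time
from statistics import mean
from typing import Callable


def mean_latency(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Average wall-clock seconds per call of `fn`, after a few untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time costs such as session initialisation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Stand-in workload; replace the lambda with a `translate(...)` call.
print(f"{mean_latency(lambda: sum(range(100_000))):.6f} s")
```

With both models loaded, compare `mean_latency(lambda: translate(text, model, max_new_tokens=64, do_sample=False))` against the same call with `ort_model`.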
Using pipeline
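```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."
# Wrap the text in the same prompt format used by `translate`.
prompt = "<|im_start|>" + text + "<|endoftext|>"

# eos_token_id=2 is taken from the original example.
# return_full_text=False keeps only the generated translation, not the prompt.
response = pipe(prompt, max_new_tokens=64, do_sample=False, eos_token_id=2, return_full_text=False)
print(response[0]["generated_text"])
```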