metadata
library_name: peft
base_model: beomi/open-llama-2-ko-7b
license: cc-by-sa-4.0
datasets:
- traintogpb/aihub-flores-koen-integrated-sparta-30k
language:
- en
- ko
metrics:
- sacrebleu
- comet
pipeline_tag: translation
Pretrained LM
- beomi/open-llama-2-ko-7b (MIT License)
Training Dataset
- traintogpb/aihub-flores-koen-integrated-sparta-30k
- Supports English-Korean translation in both directions
Prompt
- Template:
prompt = f"Translate this from {src_lang} to {tgt_lang}\n### {src_lang}: {src_text}\n### {tgt_lang}:"
# src_lang can be 'English' or '한국어'
# tgt_lang can be '한국어' or 'English'
- Issue:
The model's tokenizer tokenizes the multi-line prompt below differently from the template above, so make sure to use the template above.
prompt = f"""Translate this from {src_lang} to {tgt_lang}
### {src_lang}: {src_text}
### {tgt_lang}:"""
# DO NOT USE this prompt.
Also note that the prompt must not end with a space ("_").
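The template above can be wrapped in a small helper so that the prompt is always built the same way. This is only a minimal sketch: the function name `build_prompt`, the `ALLOWED_LANGS` set, and the language check are illustrative assumptions, not part of the released code.

```python
# Hypothetical helper (not from the original card): builds the single-line
# prompt with explicit "\n" separators, exactly as in the template above.
ALLOWED_LANGS = {"English", "한국어"}


def build_prompt(src_lang: str, tgt_lang: str, src_text: str) -> str:
    """Return the translation prompt for one source sentence."""
    if src_lang not in ALLOWED_LANGS or tgt_lang not in ALLOWED_LANGS:
        raise ValueError(f"languages must be one of {sorted(ALLOWED_LANGS)}")
    if src_lang == tgt_lang:
        raise ValueError("src_lang and tgt_lang must differ")
    # The prompt ends with ':' and has no trailing space.
    return (
        f"Translate this from {src_lang} to {tgt_lang}\n"
        f"### {src_lang}: {src_text}\n"
        f"### {tgt_lang}:"
    )


example_prompt = build_prompt("English", "한국어", "Hello.")
print(repr(example_prompt))
```

Building the prompt in one place makes it harder to fall back to the multi-line variant or to leave a trailing space by accident.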
Training
- Trained with QLoRA
- PLM: quantized to NormalFloat 4-bit (NF4)
- Adapter: BrainFloat 16-bit (bfloat16)
- Adapters applied to all linear layers (around 2.2% of the parameters)
Usage (IMPORTANT)
- Remove the EOS token (<|endoftext|>, id=46332) that the tokenizer appends to the end of the prompt before calling generate.
# IMPORTS
import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

# QLORA CONFIG
# Assumed from the Training section: NF4 4-bit base model with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16
)

# MODEL
plm_name = 'beomi/open-llama-2-ko-7b'
adapter_name = 'traintogpb/llama-2-enko-translator-7b-qlora-adapter'
max_length = 768
model = LlamaForCausalLM.from_pretrained(
    plm_name,
    max_length=max_length,
    quantization_config=bnb_config,  # Use the QLoRA config above
    attn_implementation='flash_attention_2',  # Requires the flash-attn package
    torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(
    model,
    adapter_name,
    torch_dtype=torch.bfloat16
)

# TOKENIZER
tokenizer = LlamaTokenizer.from_pretrained(plm_name)
tokenizer.pad_token = "</s>"
tokenizer.pad_token_id = 2
tokenizer.eos_token = "<|endoftext|>"  # Must be differentiated from the PAD token
tokenizer.eos_token_id = 46332
tokenizer.add_eos_token = True
tokenizer.model_max_length = max_length

# INFERENCE
src_lang, tgt_lang = 'English', '한국어'
src_text = "NMIXX is the world-best female idol group, who came back with the new song 'DASH'."
prompt = f"Translate this from {src_lang} to {tgt_lang}\n### {src_lang}: {src_text}\n### {tgt_lang}:"
inputs = tokenizer(prompt, return_tensors="pt", max_length=max_length, truncation=True)

# REMOVE EOS TOKEN IN THE PROMPT
inputs['input_ids'] = inputs['input_ids'][:, :-1]
inputs['attention_mask'] = inputs['attention_mask'][:, :-1]
inputs = inputs.to(model.device)  # Move the inputs to the same device as the model

outputs = model.generate(**inputs, max_length=max_length, eos_token_id=46332)
input_len = inputs['input_ids'].shape[1]
# Decode only the newly generated tokens, skipping the prompt
translated_text = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)
print(translated_text)
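The EOS removal step above drops the last token unconditionally, which would cut off a real token if the tokenizer were ever configured without `add_eos_token = True`. A guarded variant might look like the sketch below. The helper name `strip_trailing_eos` is hypothetical, and it works on a plain list of token ids rather than on tensors.

```python
# Hypothetical helper (not from the original card): removes the final token
# only when it really is the EOS token.
EOS_TOKEN_ID = 46332  # id of <|endoftext|>, as stated in this card


def strip_trailing_eos(token_ids, eos_token_id=EOS_TOKEN_ID):
    """Return a copy of token_ids without a trailing EOS token, if present."""
    ids = list(token_ids)
    if ids and ids[-1] == eos_token_id:
        ids.pop()
    return ids


with_eos = strip_trailing_eos([1, 100, 200, EOS_TOKEN_ID])
without_eos = strip_trailing_eos([1, 100, 200])
print(with_eos, without_eos)
```

Assuming the tokenizer is called without `return_tensors`, its `input_ids` entry is a plain list that can be passed through this helper and then wrapped in a tensor before generation.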