library_name: transformers
tags: []
Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction
Summary
We propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data, enabling more precise predictions through annotation fusion. Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned LLM for information extraction (IE) tasks, designed to close the performance gap between high- and low-resource languages.
- 📖 Paper: Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction
- 🤗 Model: GoLLIE-7B-TF
- 🚀 Example Jupyter Notebooks: GoLLIE-TF Notebooks
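To make the TransFusion input layout concrete, the sketch below assembles a prompt from task guidelines, the original text, and its English translation. The helper name `build_transfusion_prompt` is hypothetical and not part of GoLLIE-TF; the comment lines mirror the example prompt shown later in this card, and the exact whitespace between sections is an assumption.

```python
import json


def build_transfusion_prompt(guidelines: str, text: str, eng_text: str) -> str:
    """Assemble a TransFusion-style prompt (hypothetical helper, not a GoLLIE-TF API)."""
    lines = [
        "# The following lines describe the task definition",
        guidelines.strip(),
        "# This is the text to analyze",
        # json.dumps adds the surrounding double quotes and escapes inner ones;
        # ensure_ascii=False keeps non-English characters readable.
        f"text = {json.dumps(text, ensure_ascii=False)}",
        "# This is the English translation of the text",
        f"eng_text = {json.dumps(eng_text, ensure_ascii=False)}",
        "# Using translation and fusion",
        "# (1) generate annotation for eng_text",
        "# (2) generate annotation for text",
        "# The annotation instances that take place in the eng_text above are listed here",
        # The prompt ends with an open list that the model is expected to complete.
        "result = [",
    ]
    return "\n".join(lines) + "\n"


# Illustrative guidelines, shortened from the example in this card.
guidelines = '''@dataclass
class LLM(Entity):
    """Large language model names or model names."""

    span: str
'''

prompt = build_transfusion_prompt(
    guidelines,
    "GoLLIE-7B-TFが本日リリースされました",
    "GoLLIE-7B-TF is released today",
)
print(prompt)
```

The English translation is supplied by the caller; which translation system to use is left open here.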
Important: The following note is adapted from the GoLLIE README.
Our flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face Transformers.
You must pass the flag trust_remote_code=True when loading the model, or you will get inferior results.
Flash attention requires an available CUDA GPU; running GoLLIE pre-trained models on a CPU is not supported. We plan to address this in future releases. First, install flash attention 2:
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
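Before loading the model, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the standard library; the helper `missing_modules` is hypothetical, and the import names `torch`, `transformers`, and `flash_attn` are assumed to correspond to the packages used in this card.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages this card relies on.
REQUIRED = ["torch", "transformers", "flash_attn"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This only checks that the modules can be located, not that a CUDA GPU is present; `torch.cuda.is_available()` can be used for that once PyTorch is installed.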
Then you can load the model as follows:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model from the same repository
tokenizer = AutoTokenizer.from_pretrained("ychenNLP/GoLLIE-7B-TF")
model = AutoModelForCausalLM.from_pretrained(
    "ychenNLP/GoLLIE-7B-TF",
    trust_remote_code=True,  # required, see the note above
    torch_dtype=torch.bfloat16,
)
model.to("cuda")  # flash attention requires a CUDA GPU
test_input = r'''# The following lines describe the task definition
@dataclass
class LLM(Entity):
    """Large language model names or model names. This is used for deep learning and NLP tasks."""
    span: str  # Such as: "GPT-3.5", "LLama-7B", "ChatGPT"
@dataclass
class Hyperparams(Entity):
    """Hyperparameter used for training large language models. Including learning rate, scheduler, architecture"""
    span: str  # Such as: "layernorm", "cosine scheduler"
# This is the text to analyze
text = "GoLLIE-7B-TFが本日リリースされました! 1つのNVIDIA A100 GPUで推論が可能なサイズです 学習率は1e-4です 訓練にはLoRAが使用されています"
# This is the English translation of the text
eng_text = "GoLLIE-7B-TF is released today! * Sized for inference on 1 NVIDIA A100 GPUs * learning rate 1e-4 * LoRA is used for training"
# Using translation and fusion
# (1) generate annotation for eng_text
# (2) generate annotation for text
# The annotation instances that take place in the eng_text above are listed here
result = [
'''
model_input = tokenizer(test_input, return_tensors="pt")
print(model_input["input_ids"])

# Drop the final token of the encoded prompt (and its mask entry) before generation
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]
model_output = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,  # greedy decoding
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)
print(tokenizer.batch_decode(model_output))
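The decoded string contains the prompt followed by the model's completion. A minimal sketch for pulling entity spans out of that string is shown below. It assumes the completion lists annotations in the form `ClassName(span="...")`, matching the dataclasses in the prompt; the helper `extract_spans` and the sample string are hypothetical and do not reproduce actual model output.

```python
import re
from typing import List, Tuple

# Matches ClassName(span="...") and allows escaped quotes inside the span.
SPAN_PATTERN = re.compile(r'(\w+)\(\s*span\s*=\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_spans(decoded: str, marker: str = "result = [") -> List[Tuple[str, str]]:
    """Return (class name, span) pairs found after the first `marker`."""
    _, found, completion = decoded.partition(marker)
    if not found:
        # Marker missing: fall back to scanning the whole string.
        completion = decoded
    return [(m.group(1), m.group(2)) for m in SPAN_PATTERN.finditer(completion)]


# Hypothetical decoded text, for illustration only.
sample = 'result = [\n    LLM(span="GoLLIE-7B-TF"),\n    Hyperparams(span="LoRA"),\n]'
print(extract_spans(sample))
# [('LLM', 'GoLLIE-7B-TF'), ('Hyperparams', 'LoRA')]
```

If the completion contains annotations for both eng_text and text, this sketch returns all of them in order of appearance; separating the two groups would depend on the exact output layout.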