---
library_name: transformers
tags: []
---

<img src="https://github.com/edchengg/gollie-transfusion/raw/main/assets/gollie-tf-example.png" style="height: 150px;">

# Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction

## Summary

We propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data, enabling more precise predictions through annotation fusion.

Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned LLM for information extraction (IE) tasks, designed to close the performance gap between high- and low-resource languages.

- 📖 Paper: [Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction](https://arxiv.org/abs/2305.13582)
- 🤗 Model: [GoLLIE-7B-TF](https://huggingface.co/ychenNLP/GoLLIE-7B-TF)
- 🚀 Example Jupyter Notebooks: [GoLLIE-TF Notebooks](notebooks/tf.ipynb)
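
Conceptually, TransFusion builds a single GoLLIE-style prompt containing the task guidelines, the low-resource-language text, and its English translation, then asks the model to first annotate the English translation and fuse those annotations into predictions for the original text. The helper below is only an illustrative sketch of that prompt layout (it is not part of the released code); the full runnable example appears further down this card.

```python
# Illustrative sketch of how a TransFusion prompt is laid out (hypothetical helper,
# mirroring the runnable example later in this card).
def build_transfusion_prompt(guidelines: str, text: str, eng_text: str) -> str:
    return (
        f"{guidelines}\n"
        "# This is the text to analyze\n"
        f"text = {text!r}\n\n"
        "# This is the English translation of the text\n"
        f"eng_text = {eng_text!r}\n\n"
        "# Using translation and fusion\n"
        "# (1) generate annotation for eng_text\n"
        "# (2) generate annotation for text\n\n"
        "# The annotation instances that take place in the eng_text above are listed here\n"
        "result = ["
    )
```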

**Important**: This setup is adapted from the GoLLIE README. The GoLLIE flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face, so you must use the flag `trust_remote_code=True` or you will get inferior results. Flash attention requires an available CUDA GPU; running GoLLIE pre-trained models on a CPU is not supported. We plan to address this in future releases. First, install Flash Attention 2:

```bash
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```
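
Since flash attention requires a CUDA GPU, it can be worth running a quick environment check before loading the model (optional sketch; it only relies on `torch` and the `flash_attn` package installed above):

```python
import torch

# GoLLIE-TF inference is not supported on CPU; flash attention needs a CUDA device.
assert torch.cuda.is_available(), "A CUDA GPU is required to run GoLLIE-TF."

# Confirm that the flash-attn package installed above is importable.
import flash_attn
print("flash-attn version:", flash_attn.__version__)
```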

Then you can load the model using:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ychenNLP/GoLLIE-7B-TF")
model = AutoModelForCausalLM.from_pretrained("ychenNLP/GoLLIE-7B-TF", trust_remote_code=True, torch_dtype=torch.bfloat16)
model.to("cuda")

test_input = r'''# The following lines describe the task definition
@dataclass
class LLM(Entity):
    """Large language model names or model names. This is used for deep learning and NLP tasks."""

    span: str  # Such as: "GPT-3.5", "LLaMA-7B", "ChatGPT"

@dataclass
class Hyperparams(Entity):
    """Hyperparameter used for training large language models. Including learning rate, scheduler, architecture"""

    span: str  # Such as: "layernorm", "cosine scheduler"

# This is the text to analyze
text = "GoLLIE-7B-TFが本日リリースされました! 1つのNVIDIA A100 GPUで推論が可能なサイズです 学習率は1e-4です 訓練にはLoRAが使用されています"

# This is the English translation of the text
eng_text = "GoLLIE-7B-TF is released today! * Sized for inference on 1 NVIDIA A100 GPUs * learning rate 1e-4 * LoRA is used for training"

# Using translation and fusion
# (1) generate annotation for eng_text
# (2) generate annotation for text

# The annotation instances that take place in the eng_text above are listed here
result = [
'''

model_input = tokenizer(test_input, return_tensors="pt")

print(model_input["input_ids"])

# Drop the final token of the tokenized prompt before generation, as in the original GoLLIE inference example
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]

model_output = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)
print(tokenizer.batch_decode(model_output))
```
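
The model completes the `result` list with annotation instances, first for `eng_text` and then for `text`. As a rough post-processing sketch (assuming the generated annotations follow the `EntityType(span="...")` pattern defined in the guidelines above; the `parse_spans` helper is illustrative, not part of the released code), the predicted spans can be pulled out with a small regex:

```python
import re

def parse_spans(decoded: str):
    """Extract (entity_type, span) pairs such as ('LLM', 'GoLLIE-7B-TF') from the decoded output."""
    # Assumes annotations are rendered as EntityType(span="...") or EntityType(span='...').
    return re.findall(r'(\w+)\(span=["\']([^"\']+)["\']\)', decoded)

decoded = tokenizer.batch_decode(model_output)[0]
print(parse_spans(decoded))
```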