---
library_name: transformers
tags: []
---

<img src="https://github.com/edchengg/gollie-transfusion/raw/main/assets/gollie-tf-example.png" style="height: 150px;">

# Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction

## Summary

We propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data, enabling more precise predictions through annotation fusion.

Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned LLM for information extraction (IE) tasks, designed to close the performance gap between high- and low-resource languages.

- 📖 Paper: [Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction](https://arxiv.org/abs/2305.13582)
- 🤗 Model: [GoLLIE-7B-TF](https://huggingface.co/ychenNLP/GoLLIE-7B-TF)
- 🚀 Example Jupyter Notebooks: [GoLLIE-TF Notebooks](notebooks/tf.ipynb)
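
Conceptually, TransFusion builds a single GoLLIE-style prompt containing the task guidelines, the low-resource-language text, and its English translation, then asks the model to first annotate the English translation and fuse those annotations into predictions for the original text. The helper below is only an illustrative sketch of that prompt layout (it is not part of the released code); the full runnable example appears further down this card.

```python
# Illustrative sketch of how a TransFusion prompt is laid out (hypothetical helper,
# mirroring the runnable example later in this card).
def build_transfusion_prompt(guidelines: str, text: str, eng_text: str) -> str:
    return (
        f"{guidelines}\n"
        "# This is the text to analyze\n"
        f"text = {text!r}\n\n"
        "# This is the English translation of the text\n"
        f"eng_text = {eng_text!r}\n\n"
        "# Using translation and fusion\n"
        "# (1) generate annotation for eng_text\n"
        "# (2) generate annotation for text\n\n"
        "# The annotation instances that take place in the eng_text above are listed here\n"
        "result = ["
    )
```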

**Important**: This setup is adapted from the GoLLIE README. The GoLLIE flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face, so you must use the flag `trust_remote_code=True` or you will get inferior results. Flash attention requires an available CUDA GPU; running GoLLIE pre-trained models on a CPU is not supported. We plan to address this in future releases. First, install Flash Attention 2:

```bash
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```
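
Since flash attention requires a CUDA GPU, it can be worth running a quick environment check before loading the model (optional sketch; it only relies on `torch` and the `flash_attn` package installed above):

```python
import torch

# GoLLIE-TF inference is not supported on CPU; flash attention needs a CUDA device.
assert torch.cuda.is_available(), "A CUDA GPU is required to run GoLLIE-TF."

# Confirm that the flash-attn package installed above is importable.
import flash_attn
print("flash-attn version:", flash_attn.__version__)
```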

Then you can load the model using:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ychenNLP/GoLLIE-7B-TF")
model = AutoModelForCausalLM.from_pretrained("ychenNLP/GoLLIE-7B-TF", trust_remote_code=True, torch_dtype=torch.bfloat16)
model.to("cuda")

test_input = r'''# The following lines describe the task definition
@dataclass
class LLM(Entity):
    """Large language model names or model names. This is used for deep learning and NLP tasks."""

    span: str  # Such as: "GPT-3.5", "LLaMA-7B", "ChatGPT"

@dataclass
class Hyperparams(Entity):
    """Hyperparameter used for training large language models. Including learning rate, scheduler, architecture"""

    span: str  # Such as: "layernorm", "cosine scheduler"

# This is the text to analyze
text = "GoLLIE-7B-TFが本日リリースされました! 1つのNVIDIA A100 GPUで推論が可能なサイズです 学習率は1e-4です 訓練にはLoRAが使用されています"

# This is the English translation of the text
eng_text = "GoLLIE-7B-TF is released today! * Sized for inference on 1 NVIDIA A100 GPUs * learning rate 1e-4 * LoRA is used for training"

# Using translation and fusion
# (1) generate annotation for eng_text
# (2) generate annotation for text

# The annotation instances that take place in the eng_text above are listed here
result = [
'''

model_input = tokenizer(test_input, return_tensors="pt")

print(model_input["input_ids"])

# Drop the final token of the tokenized prompt before generation, as in the original GoLLIE inference example
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]

model_output = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)
print(tokenizer.batch_decode(model_output))
```
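
The model completes the `result` list with annotation instances, first for `eng_text` and then for `text`. As a rough post-processing sketch (assuming the generated annotations follow the `EntityType(span="...")` pattern defined in the guidelines above; the `parse_spans` helper is illustrative, not part of the released code), the predicted spans can be pulled out with a small regex:

```python
import re

def parse_spans(decoded: str):
    """Extract (entity_type, span) pairs such as ('LLM', 'GoLLIE-7B-TF') from the decoded output."""
    # Assumes annotations are rendered as EntityType(span="...") or EntityType(span='...').
    return re.findall(r'(\w+)\(span=["\']([^"\']+)["\']\)', decoded)

decoded = tokenizer.batch_decode(model_output)[0]
print(parse_spans(decoded))
```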