Update README.md

README.md
@@ -13,4 +13,24 @@ Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned

- 📖 Paper: [Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction](https://arxiv.org/abs/2305.13582)
- 🤗 Model: [GoLLIE-7B-TF](https://huggingface.co/ychenNLP/GoLLIE-7B-TF)
- 🚀 Example Jupyter Notebooks: [GoLLIE-TF Notebooks](notebooks/tf.ipynb)

**Important**: This setup follows the GoLLIE README. Our flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face. You must use the flag `trust_remote_code=True` or you will get inferior results. Flash attention requires an available CUDA GPU; running GoLLIE pre-trained models on a CPU is not supported. We plan to address this in future releases. First, install flash attention 2:

```bash
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```
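
Before loading the model, it can help to confirm the environment is usable. A minimal sanity-check sketch (assumes the `flash_attn` package installed above exposes `__version__`, and fails fast on CPU-only machines since CPU inference is not supported):

```python
import torch
import flash_attn  # raises ImportError if the install above failed

# Flash attention requires a CUDA GPU; GoLLIE does not support CPU inference.
assert torch.cuda.is_available(), "A CUDA GPU is required to run GoLLIE."
print("flash-attn version:", flash_attn.__version__)
```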

Then you can load the model using:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HiTZ/GoLLIE-7B")
# trust_remote_code=True loads the custom flash attention implementation
model = AutoModelForCausalLM.from_pretrained("HiTZ/GoLLIE-7B", trust_remote_code=True, torch_dtype=torch.bfloat16)
model.to("cuda")
```
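
As a quick end-to-end check, you can run a short generation. This is only an illustrative sketch: the prompt string below is a placeholder, and real GoLLIE inputs follow the schema-based prompt format shown in the example notebooks.

```python
# Illustrative only: GoLLIE expects a Python-style guideline/schema prompt;
# see the example notebooks for the real input format.
inputs = tokenizer("Hello, GoLLIE!", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```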