---
tags:
- question-answering
- fill-mask
---

# GENRE

The GENRE (Generative ENtity REtrieval) system, as presented in [Autoregressive Entity Retrieval](https://arxiv.org/abs/2010.00904), implemented in PyTorch.

In a nutshell, GENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on a fine-tuned [BART](https://arxiv.org/abs/1910.13461) architecture. GENRE performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search to generate only valid identifiers. The model was first released in the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository using `fairseq` (the `transformers` models were obtained with a conversion script similar to [this one](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py)).

## BibTeX entry and citation info

**Please consider citing our works if you use code from this repository.**

```bibtex
@inproceedings{decao2020autoregressive,
    title={Autoregressive Entity Retrieval},
    author={Nicola {De Cao} and Gautier Izacard and Sebastian Riedel and Fabio Petroni},
    booktitle={International Conference on Learning Representations},
    url={https://openreview.net/forum?id=5k8F6UU39V},
    year={2021}
}
```

## Usage

Here is an example of generation for Wikipedia page retrieval for open-domain fact-checking:

```python
import pickle

from trie import Trie
from transformers import BartTokenizer, BartForConditionalGeneration

# OPTIONAL: load the prefix tree (trie)
# with open("kilt_titles_trie_dict.pkl", "rb") as f:
#     trie = Trie.load_from_dict(pickle.load(f))

tokenizer = BartTokenizer.from_pretrained("facebook/genre-kilt")
model = BartForConditionalGeneration.from_pretrained("facebook/genre-kilt").eval()

sentences = ["Einstein was a German physicist."]

outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
    # OPTIONAL: use constrained beam search
    # prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()),
)

tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
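The `from trie import Trie` line above refers to a helper shipped with the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository. For readers without that repo, a minimal stand-in (an illustrative sketch, not the official implementation) compatible with the `load_from_dict` and `get` calls used above could look like:

```python
# Minimal prefix-tree sketch over token-id sequences (assumption: this
# mirrors only the two methods the usage example relies on).
class Trie:
    def __init__(self, sequences=()):
        self.trie_dict = {}
        for seq in sequences:
            self.add(seq)

    def add(self, sequence):
        # Insert one token-id sequence, creating nested dicts per token.
        node = self.trie_dict
        for token in sequence:
            node = node.setdefault(token, {})

    def get(self, prefix):
        # Token ids that may follow `prefix`; [] if the prefix is invalid.
        node = self.trie_dict
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return list(node.keys())

    @staticmethod
    def load_from_dict(trie_dict):
        trie = Trie()
        trie.trie_dict = trie_dict
        return trie
```

Passing `trie.get` through `prefix_allowed_tokens_fn` then makes beam search consider only token continuations that lead to a valid entity name.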