---
tags:
- question-answering
- fill-mask
---

# GENRE

The GENRE (Generative ENtity REtrieval) system, as presented in [Autoregressive Entity Retrieval](https://arxiv.org/abs/2010.00904), implemented in PyTorch.

In a nutshell, GENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on a fine-tuned [BART](https://arxiv.org/abs/1910.13461) architecture. GENRE performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search to generate only valid identifiers. The model was first released in the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository using `fairseq` (the `transformers` models were obtained with a conversion script similar to [this one](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py)).

## BibTeX entry and citation info

**Please consider citing our works if you use code from this repository.**

```bibtex
@inproceedings{decao2020autoregressive,
    title={Autoregressive Entity Retrieval},
    author={Nicola {De Cao} and Gautier Izacard and Sebastian Riedel and Fabio Petroni},
    booktitle={International Conference on Learning Representations},
    url={https://openreview.net/forum?id=5k8F6UU39V},
    year={2021}
}
```

## Usage

Here is an example of generation for Wikipedia page retrieval for open-domain fact-checking:

```python
import pickle

from trie import Trie
from transformers import BartTokenizer, BartForConditionalGeneration

# OPTIONAL: load the prefix tree (trie)
# with open("kilt_titles_trie_dict.pkl", "rb") as f:
#     trie = Trie.load_from_dict(pickle.load(f))

tokenizer = BartTokenizer.from_pretrained("facebook/genre-kilt")
model = BartForConditionalGeneration.from_pretrained("facebook/genre-kilt").eval()

sentences = ["Einstein was a German physicist."]

outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
    # OPTIONAL: use constrained beam search
    # prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()),
)

tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
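The `from trie import Trie` line above refers to a helper shipped with the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository. For readers without that repo, a minimal stand-in (an illustrative sketch, not the official implementation) compatible with the `load_from_dict` and `get` calls used above could look like:

```python
# Minimal prefix-tree sketch over token-id sequences (assumption: this
# mirrors only the two methods the usage example relies on).
class Trie:
    def __init__(self, sequences=()):
        self.trie_dict = {}
        for seq in sequences:
            self.add(seq)

    def add(self, sequence):
        # Insert one token-id sequence, creating nested dicts per token.
        node = self.trie_dict
        for token in sequence:
            node = node.setdefault(token, {})

    def get(self, prefix):
        # Token ids that may follow `prefix`; [] if the prefix is invalid.
        node = self.trie_dict
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return list(node.keys())

    @staticmethod
    def load_from_dict(trie_dict):
        trie = Trie()
        trie.trie_dict = trie_dict
        return trie
```

Passing `trie.get` through `prefix_allowed_tokens_fn` then makes beam search consider only token continuations that lead to a valid entity name.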