---
language: en
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
- vblagoje/cc_news
---

# BigBird base model

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of which capabilities of a full transformer the sparse model can handle.

It is a model pretrained on English text using a masked language modeling (MLM) objective. It was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).

## Model description

BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's full attention) and can handle sequences up to a length of 4096 at a much lower compute cost than BERT. It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
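
To make the idea of block sparse attention more concrete, here is a minimal NumPy sketch of which query/key block pairs would be computed. It combines the three patterns described in the BigBird paper (sliding window, random, and global blocks). The function name `block_sparse_mask` and its parameters are hypothetical and chosen for illustration only; 64 blocks corresponds to a 4096-token sequence under an assumed block size of 64. It is not the layout code used by the model itself.

```python
import numpy as np


def block_sparse_mask(num_blocks, window=3, num_global=1, num_random=1, seed=0):
    """Return a (num_blocks, num_blocks) boolean matrix.

    True at [q, k] means query block q attends to key block k.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((num_blocks, num_blocks), dtype=bool)
    half = window // 2
    for q in range(num_blocks):
        # Sliding window: each query block sees its immediate neighbours.
        lo, hi = max(0, q - half), min(num_blocks, q + half + 1)
        mask[q, lo:hi] = True
        # Random blocks: a few extra key blocks not already attended to.
        candidates = np.flatnonzero(~mask[q])
        if candidates.size:
            picks = rng.choice(
                candidates, size=min(num_random, candidates.size), replace=False
            )
            mask[q, picks] = True
    # Global blocks: they attend to every block and every block attends to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True
    return mask


mask = block_sparse_mask(num_blocks=64)
density = mask.mean()
print(f"fraction of block pairs computed: {density:.3f}")
```

Full attention would compute every block pair (a density of 1.0). Here each query block touches only a handful of key blocks, which is where the lower compute cost on long sequences comes from.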

## Original implementation

Follow [this link](https://huggingface.co/google/bigbird-roberta-base) to see the original implementation.

## How to use

Download the model by cloning the repository via `git clone https://huggingface.co/OWG/bigbird-roberta-base`.

Then you can use the model with the following code:

```python
from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions
from transformers import AutoTokenizer

# BigBird uses a SentencePiece vocabulary, so let AutoTokenizer pick the matching tokenizer class.
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

# Enable all graph optimizations ONNX Runtime offers.
options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

session = InferenceSession("path/to/model.onnx", sess_options=options)
session.disable_fallback()

text = "Replace me by any text you want to encode."

# Tokenize to PyTorch tensors, then convert to the NumPy arrays ONNX Runtime expects.
encoded = tokenizer(text, return_tensors="pt", return_attention_mask=True)
inputs = {k: v.cpu().detach().numpy() for k, v in encoded.items()}

# Fetch the first output of the graph.
output_name = session.get_outputs()[0].name
outputs = session.run(output_names=[output_name], input_feed=inputs)
```
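
If you need a single vector per input text, one common option is to mean-pool the token-level outputs over the non-padding positions. The sketch below assumes that the first output of the ONNX graph is a hidden-state array of shape `(batch, sequence, hidden)`; this card does not state that, so check the shape of `outputs[0]` first. The helper `mean_pool` is hypothetical, and the arrays are synthetic stand-ins so the example runs without the model.

```python
import numpy as np


def mean_pool(hidden, attention_mask):
    """Average token vectors over real (non-padding) positions."""
    mask = attention_mask[..., None].astype(hidden.dtype)  # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1.0, None)           # avoid division by zero
    return summed / counts


# Synthetic stand-ins for `outputs[0]` and `inputs["attention_mask"]`.
hidden = np.arange(24, dtype=np.float32).reshape(1, 4, 6)
attention_mask = np.array([[1, 1, 1, 0]], dtype=np.int64)

sentence_vec = mean_pool(hidden, attention_mask)
print(sentence_vec.shape)  # (1, 6)
```

With the real session, you would pass `outputs[0]` and `inputs["attention_mask"]` instead of the synthetic arrays.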