---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
---

# VGCN-BERT (DistilBERT based, uncased)

This is a VGCN-BERT model based on [DistilBERT-base-uncased](https://huggingface.co/distilbert-base-uncased). The original paper is [VGCN-BERT](https://arxiv.org/abs/2004.05707).

### How to use

- First, prepare the WGraph (a symmetric adjacency matrix):

```python
import transformers as tfr
from transformers.models.vgcn_bert.modeling_graph import WordGraph

tokenizer = tfr.AutoTokenizer.from_pretrained(
    "zhibinlu/vgcn-bert-distilbert-base-uncased"
)

# 1st method: build the graph with the NPMI statistical method from the training corpus
wgraph = WordGraph(rows=train_valid_df["text"], tokenizer=tokenizer)

# 2nd method: build the graph from pre-defined entity-relation tuples with weights
entity_relations = [
    ("dog", "labrador", 0.6),
    ("cat", "garfield", 0.7),
    ("city", "montreal", 0.8),
    ("weather", "rain", 0.3),
]
wgraph = WordGraph(rows=entity_relations, tokenizer=tokenizer)
```
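
For intuition, the 1st method scores word pairs by NPMI (normalized pointwise mutual information) and keeps the strongest links as symmetric edges. Below is a rough, self-contained sketch of that idea over document-level co-occurrence; it is *not* `WordGraph`'s actual implementation (which works on tokenizer ids and sparse matrices), and the helper name is hypothetical:

```python
# Hypothetical illustration of NPMI-based graph building (not the library code).
import math
from collections import Counter
from itertools import combinations

def npmi_adjacency(docs, threshold=0.0):
    """Return a symmetric {(w1, w2): weight} adjacency from co-occurrence NPMI."""
    word_counts = Counter()
    pair_counts = Counter()
    n_docs = len(docs)
    for doc in docs:
        words = set(doc.lower().split())          # document-level co-occurrence
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    adj = {}
    for pair, c_ij in pair_counts.items():
        w1, w2 = sorted(pair)
        p_ij = c_ij / n_docs
        p_i = word_counts[w1] / n_docs
        p_j = word_counts[w2] / n_docs
        if p_ij < 1.0:                            # NPMI undefined when log(p_ij) == 0
            npmi = math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)
            if npmi > threshold:                  # keep only positively associated pairs
                adj[(w1, w2)] = adj[(w2, w1)] = npmi
    return adj

docs = ["rain in montreal", "montreal rain today", "sunny day today"]
adj = npmi_adjacency(docs)
# "montreal" and "rain" always co-occur here, so their NPMI is 1.0
```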

- Then instantiate the VGCN-BERT model with your WGraphs (multiple graphs are supported):

```python
from transformers.models.vgcn_bert.modeling_vgcn_bert import VGCNBertModel

model = VGCNBertModel.from_pretrained(
    "zhibinlu/vgcn-bert-distilbert-base-uncased",
    trust_remote_code=True,
    wgraphs=[wgraph.to_torch_sparse()],
    wgraph_id_to_tokenizer_id_maps=[wgraph.wgraph_id_to_tokenizer_id_map],
)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```

## Fine-tune model

It's better to fine-tune the VGCN-BERT model for your specific downstream task.
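
One common pattern is to put a small classification head on the encoder's output and train end to end. The sketch below shows only that head and one training step, with a random tensor standing in for `model(**encoded_input).last_hidden_state`; the head architecture, hidden size (768, as in DistilBERT), and hyperparameters are illustrative assumptions, not the paper's recipe:

```python
# Minimal fine-tuning sketch (assumptions: hidden size 768, first token pooled).
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Pools the first-token position and maps it to class logits."""
    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        cls = last_hidden_state[:, 0]            # [batch, hidden]
        return self.classifier(self.dropout(cls))

head = ClassificationHead()
optimizer = torch.optim.AdamW(head.parameters(), lr=5e-5)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for `model(**encoded_input).last_hidden_state`:
hidden = torch.randn(4, 16, 768)
labels = torch.tensor([0, 1, 0, 1])

logits = head(hidden)                            # [4, 2]
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice you would pass the real encoder output through the head, include `model.parameters()` in the optimizer so the encoder is updated too, and loop over batches of your labeled data.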
|