Commit ab5080f
Sheshera Mysore committed
Parent(s): fbeb324

Cache scibert locally and load it.
app.py
CHANGED
@@ -111,7 +111,8 @@ def read_kp_encoder(in_path):
     :return:
     """
     if 'kp_enc_model' not in st.session_state:
-        word_embedding_model = models.Transformer('
+        word_embedding_model = models.Transformer(os.path.join(in_path, 'models', 'scibert_scivocab_uncased'),
+                                                  max_seq_length=512)
         trained_model_fname = os.path.join(in_path, 'models', 'kp_encoder_cur_best.pt')
         if torch.cuda.is_available():
             saved_model = torch.load(trained_model_fname)
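The change points `models.Transformer` at a copy of SciBERT vendored under `data/models/` instead of pulling it from the Hugging Face Hub at app startup. As a minimal sketch (not part of this commit), the local cache could be produced once with `transformers`; the Hub id `allenai/scibert_scivocab_uncased` is an assumption based on the commit message and the directory name:

```
# Hypothetical one-off script to populate data/models/scibert_scivocab_uncased/.
# Assumes the upstream model is allenai/scibert_scivocab_uncased on the Hub.
from transformers import AutoModel, AutoTokenizer

hub_id = 'allenai/scibert_scivocab_uncased'
local_dir = 'data/models/scibert_scivocab_uncased'

tokenizer = AutoTokenizer.from_pretrained(hub_id)
model = AutoModel.from_pretrained(hub_id)

# For a BERT checkpoint this writes config.json and vocab.txt, plus the model
# weights (pytorch_model.bin on older transformers versions) -- the same files
# added in this commit.
tokenizer.save_pretrained(local_dir)
model.save_pretrained(local_dir)
```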
data/models/scibert_scivocab_uncased/README.md
ADDED
@@ -0,0 +1,29 @@
+---
+language: en
+---
+# SciBERT
+
+This is the pretrained model presented in [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/), which is a BERT model trained on scientific text.
+
+The training corpus was papers taken from [Semantic Scholar](https://www.semanticscholar.org). Corpus size is 1.14M papers, 3.1B tokens. We use the full text of the papers in training, not just abstracts.
+
+SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
+
+Available models include:
+* `scibert_scivocab_cased`
+* `scibert_scivocab_uncased`
+
+
+The original repo can be found [here](https://github.com/allenai/scibert).
+
+If using these models, please cite the following paper:
+```
+@inproceedings{beltagy-etal-2019-scibert,
+    title = "SciBERT: A Pretrained Language Model for Scientific Text",
+    author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
+    booktitle = "EMNLP",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://www.aclweb.org/anthology/D19-1371"
+}
+```
data/models/scibert_scivocab_uncased/config.json
ADDED
@@ -0,0 +1,16 @@
+{
+  "attention_probs_dropout_prob": 0.1,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "type_vocab_size": 2,
+  "vocab_size": 31090
+}
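The config describes a standard BERT-base architecture with SciBERT's 31,090-token scivocab. A quick sanity check of the vendored copy might look like this (a sketch; `BertConfig` is the standard `transformers` class for `"model_type": "bert"`):

```
# Sketch: load the cached config and confirm it matches the diff above.
from transformers import BertConfig

config = BertConfig.from_pretrained('data/models/scibert_scivocab_uncased')
assert config.vocab_size == 31090              # scivocab size
assert config.max_position_embeddings == 512   # matches max_seq_length in app.py
assert config.num_hidden_layers == 12          # BERT-base depth
```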
data/models/scibert_scivocab_uncased/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e492944d88ac97dee6baa547671d3c526a3d067676883efb058311f4e5882e1a
+size 442221694
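This is a Git LFS pointer file: the repo stores only the object's sha256 and size, and `git lfs` fetches the ~442 MB weights on checkout. A fetched copy can be verified against the pointer like so (a minimal sketch using only the oid and size recorded above):

```
# Sketch: check pytorch_model.bin against the oid/size in the LFS pointer.
import hashlib
import os

path = 'data/models/scibert_scivocab_uncased/pytorch_model.bin'
expected_oid = 'e492944d88ac97dee6baa547671d3c526a3d067676883efb058311f4e5882e1a'
expected_size = 442221694

assert os.path.getsize(path) == expected_size
sha = hashlib.sha256()
with open(path, 'rb') as f:
    # Hash in 1 MiB chunks to avoid loading the full file into memory.
    for chunk in iter(lambda: f.read(1 << 20), b''):
        sha.update(chunk)
assert sha.hexdigest() == expected_oid
```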
data/models/scibert_scivocab_uncased/vocab.txt
ADDED
The diff for this file is too large to render.