Teja Gollapudi
committed
Commit e06b8a8 • 1 Parent(s): 260e8fe
Update README.md
README.md CHANGED
@@ -22,7 +22,7 @@ license: "apache-2.0"
 </ul>

 #### Motivation
-Based on [MiniLMv2 distillation](https://arxiv.org/pdf/2012.15828.pdf), we have distilled vBERT-2021-large into a smaller minilmv2
+Based on [MiniLMv2 distillation](https://arxiv.org/pdf/2012.15828.pdf), we have distilled vBERT-2021-large into a smaller MiniLMv2 model for faster inference times without a significant loss of performance.

 #### Intended Use
 The model functions as a VMware-specific Language Model.
@@ -53,7 +53,7 @@ output = model(encoded_input)
 ```

 ### Training
-
+The model is distilled from [vBERT-2021-large](https://huggingface.co/VMware/vbert-2021-large). [nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large/tree/main) was used to initialize the weights.
 #### - Datasets
 Publicly available VMware text data such as VMware Docs, Blogs, etc. were used for distilling the teacher vBERT-2021-large model into the vinilm-2021-from-large model. Sourced in May 2021. (~320,000 documents)
 #### - Preprocessing
@@ -67,7 +67,7 @@ Publicly available VMware text data such as VMware Docs, Blogs, etc. were used

 #### - Model performance measures
 We benchmarked vBERT on various VMware-specific NLP downstream tasks (IR, classification, etc.).
-The model scored higher than the 'bert-base-uncased' model on all benchmarks.
+The model scored higher than the 'bert-base-uncased' model on all benchmarks.

 ### Limitations and bias
 Since the model is distilled from a vBERT model based on the BERT model, it may have the same biases embedded within the original BERT model.
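The new Training paragraph names both the teacher model and the checkpoint used to initialize the student. As a minimal sketch of that pairing (not part of this commit, and assuming both repositories load with the standard `transformers` Auto classes), the setup looks roughly like this:

```python
# Sketch only: load the teacher and the student-initialization checkpoint
# named in the Training section. This is not the training code from this
# commit; it just illustrates the two models involved.
from transformers import AutoModel

# Teacher: VMware-domain BERT-large (repo id taken from the link above).
teacher = AutoModel.from_pretrained("VMware/vbert-2021-large")

# Student initialization: 6-layer, 768-hidden MiniLMv2 checkpoint
# (repo id taken from the link above).
student = AutoModel.from_pretrained(
    "nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large"
)

# The student is then distilled on VMware text with the MiniLMv2 objective
# (matching self-attention relations of a chosen teacher layer), as described
# in the linked paper; that training loop is not shown here.
print(teacher.config.num_hidden_layers, student.config.num_hidden_layers)
```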