Teja Gollapudi committed
Commit eb50fc4
1 Parent(s): e06b8a8
Added info on the MiniLMv2 model used for initializing the weights
README.md CHANGED
@@ -53,7 +53,12 @@ output = model(encoded_input)
 ```

 ### Training
-
+<ul>
+<li>The model is distilled from [vBERT-2021-large](https://huggingface.co/VMware/vbert-2021-large) </li>
+<li>Weights were initialized using [nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large/tree/main) </li>
+</ul>
+
+
 #### - Datasets
 Publically available VMware text data such as VMware Docs, Blogs, etc. were used for distilling the teacher vBERT-2021-large model into vinilm-2021-from-large model. Sourced in May 2021. (~320,000 Documents)
 #### - Preprocessing
@@ -66,8 +71,7 @@ Publically available VMware text data such as VMware Docs, Blogs, etc. were used
 </ul>

 #### - Model performance measures
-We benchmarked
-The model scored higher than the 'bert-base-uncased' model on all benchmarks.
+We benchmarked vinilm on various VMware-specific NLP downstream tasks (IR, classification, etc).

 ### Limitations and bias
 Since the model is distilled from a vBERT model based on the BERT model, it may have the same biases embedded within the original BERT model.
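As a rough illustration of the Training bullets added in this commit, the sketch below shows how the student/teacher pair could be set up with the Hugging Face `transformers` API. This is not code from the commit or from VMware's training pipeline; it only assumes the two public checkpoints linked in the diff (and that their tokenizers load normally), and it does not attempt to reproduce the actual distillation objective, which is not described here.

```python
# Hypothetical sketch: setting up the student/teacher pair named in the
# Training bullets. The real VMware distillation loop and loss are not public.
from transformers import AutoModel, AutoTokenizer

# Student weights start from the public MiniLMv2 checkpoint named in the diff
# (6 layers, hidden size 768, distilled from BERT-large).
student = AutoModel.from_pretrained(
    "nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large"
)

# Teacher is the vBERT-2021-large model linked in the diff.
teacher = AutoModel.from_pretrained("VMware/vbert-2021-large")
tokenizer = AutoTokenizer.from_pretrained("VMware/vbert-2021-large")

# Sanity check: run the freshly initialized student on a VMware-style snippet.
encoded_input = tokenizer("vSphere cluster configuration", return_tensors="pt")
output = student(**encoded_input)
print(output.last_hidden_state.shape)  # (1, seq_len, 768)
```

Presumably, starting the student from a generally distilled MiniLMv2 checkpoint rather than from random weights gives the domain distillation from vBERT-2021-large a stronger initialization, which is the point the added Training bullets document.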