Teja Gollapudi committed
Commit eb50fc4
1 Parent(s): e06b8a8
Added info on the MiniLMv2 model used for initializing the weights
README.md CHANGED
@@ -53,7 +53,12 @@ output = model(encoded_input)
 ```

 ### Training
-
+<ul>
+<li>The model is distilled from [vBERT-2021-large](https://huggingface.co/VMware/vbert-2021-large) </li>
+<li>Weights were initialized using [nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large/tree/main) </li>
+</ul>
+
+
 #### - Datasets
 Publically available VMware text data such as VMware Docs, Blogs, etc. were used for distilling the teacher vBERT-2021-large model into vinilm-2021-from-large model. Sourced in May 2021. (~320,000 Documents)
 #### - Preprocessing
@@ -66,8 +71,7 @@ Publically available VMware text data such as VMware Docs, Blogs, etc. were used
 </ul>

 #### - Model performance measures
-We benchmarked
-The model scored higher than the 'bert-base-uncased' model on all benchmarks.
+We benchmarked vinilm on various VMware-specific NLP downstream tasks (IR, classification, etc).

 ### Limitations and bias
 Since the model is distilled from a vBERT model based on the BERT model, it may have the same biases embedded within the original BERT model.
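As a rough illustration of the Training bullets added in this commit, the sketch below shows how the student/teacher pair could be set up with the Hugging Face `transformers` API. This is not code from the commit or from VMware's training pipeline; it only assumes the two public checkpoints linked in the diff (and that their tokenizers load normally), and it does not attempt to reproduce the actual distillation objective, which is not described here.

```python
# Hypothetical sketch: setting up the student/teacher pair named in the
# Training bullets. The real VMware distillation loop and loss are not public.
from transformers import AutoModel, AutoTokenizer

# Student weights start from the public MiniLMv2 checkpoint named in the diff
# (6 layers, hidden size 768, distilled from BERT-large).
student = AutoModel.from_pretrained(
    "nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large"
)

# Teacher is the vBERT-2021-large model linked in the diff.
teacher = AutoModel.from_pretrained("VMware/vbert-2021-large")
tokenizer = AutoTokenizer.from_pretrained("VMware/vbert-2021-large")

# Sanity check: run the freshly initialized student on a VMware-style snippet.
encoded_input = tokenizer("vSphere cluster configuration", return_tensors="pt")
output = student(**encoded_input)
print(output.last_hidden_state.shape)  # (1, seq_len, 768)
```

Presumably, starting the student from a generally distilled MiniLMv2 checkpoint rather than from random weights gives the domain distillation from vBERT-2021-large a stronger initialization, which is the point the added Training bullets document.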