Teja Gollapudi committed on
Commit eb50fc4
Parent: e06b8a8

Added info about the MiniLMv2 model used for initializing the weights

Files changed (1): README.md (+7, -3)
README.md CHANGED
@@ -53,7 +53,12 @@ output = model(encoded_input)
 ```
 
 ### Training
-The model is distilled from [vBERT-2021-large](https://huggingface.co/VMware/vbert-2021-large). [nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large/tree/main) was used to initialize the weights.
+<ul>
+<li>The model is distilled from [vBERT-2021-large](https://huggingface.co/VMware/vbert-2021-large)</li>
+<li>Weights were initialized using [nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large/tree/main)</li>
+</ul>
+
+
 #### - Datasets
 Publicly available VMware text data such as VMware Docs, Blogs, etc. were used to distill the teacher vBERT-2021-large model into the vinilm-2021-from-large model. Sourced in May 2021 (~320,000 documents).
 #### - Preprocessing
@@ -66,8 +71,7 @@ Publicly available VMware text data such as VMware Docs, Blogs, etc. were used
 </ul>
 
 #### - Model performance measures
-We benchmarked vBERT on various VMware-specific NLP downstream tasks (IR, classification, etc.).
-The model scored higher than the 'bert-base-uncased' model on all benchmarks.
+We benchmarked vinilm on various VMware-specific NLP downstream tasks (IR, classification, etc.).
 
 ### Limitations and bias
 Since the model is distilled from a vBERT model, which is itself based on BERT, it may carry the same biases embedded in the original BERT model.
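
For illustration, here is a minimal sketch of the training setup the diff describes: the MiniLMv2 checkpoint is loaded as the student initialization and vBERT-2021-large as the teacher. The model IDs come from the README's links, but loading them with a masked-LM head, the soft-label (logit-matching) loss, and the shared tokenizer are all assumptions; the README does not specify the actual distillation objective.

```python
# Hypothetical sketch of the distillation setup, NOT the author's actual recipe.
# Assumptions: both checkpoints load as BERT masked-LM models and share a
# vocabulary; the objective is simple soft-label KL on MLM logits.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

TEACHER_ID = "VMware/vbert-2021-large"
STUDENT_ID = "nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large"

tokenizer = AutoTokenizer.from_pretrained(TEACHER_ID)
teacher = AutoModelForMaskedLM.from_pretrained(TEACHER_ID).eval()
student = AutoModelForMaskedLM.from_pretrained(STUDENT_ID)  # weight initialization

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

def distill_step(texts, temperature=2.0):
    # Padding tokens are not masked out of the loss here; a real training
    # loop over the ~320,000 VMware documents would handle that.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    # KL divergence between softened teacher and student token distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. distill_step(["vSphere is VMware's server virtualization platform."])
```

Note that the original MiniLMv2 recipe distills self-attention relations rather than output logits, so treat this as illustrative of the weight initialization plus a generic distillation step, not the exact training procedure.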