Teja Gollapudi committed
Commit e06b8a8
1 Parent(s): 260e8fe

Update README.md

Files changed (1):
1. README.md (+3 -3)
README.md CHANGED
@@ -22,7 +22,7 @@ license: "apache-2.0"
  </ul>
 
  #### Motivation
- Based on [MiniLMv2 distillation](https://arxiv.org/pdf/2012.15828.pdf), we have distilled vBERT-2021-large into a smaller minilmv2-type model for faster inference times without a significant loss of performance.
+ Based on [MiniLMv2 distillation](https://arxiv.org/pdf/2012.15828.pdf), we have distilled vBERT-2021-large into a smaller minilmv2 model for faster inference times without a significant loss of performance.
 
  #### Intended Use
  The model functions as a VMware-specific Language Model.
@@ -53,7 +53,7 @@ output = model(encoded_input)
  ```
 
  ### Training
-
+ The model is distilled from [vBERT-2021-large](https://huggingface.co/VMware/vbert-2021-large). [nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large/tree/main) was used to initialize the weights.
  #### - Datasets
  Publically available VMware text data such as VMware Docs, Blogs, etc. were used for distilling the teacher vBERT-2021-large model into vinilm-2021-from-large model. Sourced in May 2021. (~320,000 Documents)
  #### - Preprocessing
@@ -67,7 +67,7 @@ Publically available VMware text data such as VMware Docs, Blogs, etc. were used
 
  #### - Model performance measures
  We benchmarked vBERT on various VMware-specific NLP downstream tasks (IR, classification, etc).
- The model scored higher than the 'bert-base-uncased' model on all benchmarks.
+ The model scored higher than the 'bert-base-uncased' model on all benchmarks.
 
  ### Limitations and bias
  Since the model is distilled from a vBERT model based on the BERT model, it may have the same biases embedded within the original BERT model.
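The new Training line names both checkpoints involved in the distillation. As a rough illustration of that setup (not the training code belonging to this commit), the sketch below simply loads the teacher and the student initialization with the Hugging Face `transformers` library; the repository IDs are taken from the README, and everything else (layer/hidden-size comments, the sanity-check prints) is illustrative.

```python
from transformers import AutoModel

# Teacher named in the Training section: the VMware-domain BERT-large model.
teacher = AutoModel.from_pretrained("VMware/vbert-2021-large")

# Student initialization named in the Training section: a 6-layer, 768-hidden
# MiniLMv2 checkpoint distilled from BERT-large.
student = AutoModel.from_pretrained(
    "nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Large"
)

# The student is much smaller than the teacher; MiniLMv2 transfers self-attention
# relations, so the two hidden sizes do not need to match. The actual distillation
# objective is described in the linked MiniLMv2 paper and is not shown here.
print("teacher layers/hidden:", teacher.config.num_hidden_layers, teacher.config.hidden_size)
print("student layers/hidden:", student.config.num_hidden_layers, student.config.hidden_size)
```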
 
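The usage hunk above shows only the tail of the README's example (`output = model(encoded_input)`). For completeness, here is a minimal sketch of that pattern. The repository ID `VMware/vinilm-2021-from-large` is an assumption based on the model name in the README, and the example sentence is made up; only the general `transformers` calling convention is relied on.

```python
from transformers import AutoModel, AutoTokenizer

# Assumed repository ID; adjust if the distilled model is published elsewhere.
model_id = "VMware/vinilm-2021-from-large"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

text = "How do I configure a distributed switch in vSphere?"  # illustrative query
encoded_input = tokenizer(text, return_tensors="pt")

# Mirrors the README's `output = model(encoded_input)` step; unpacking the
# encoding as keyword arguments is the usual transformers convention.
output = model(**encoded_input)
print(output.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```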