---
datasets:
- ffgcc/NEWS5M
- zen-E/NEWS5M-simcse-roberta-large-embeddings-pca-256
language:
- en
metrics:
- pearsonr
- spearmanr
library_name: transformers
---

This model was trained by knowledge distillation on the `ffgcc/NEWS5M` dataset, using `princeton-nlp/unsup-simcse-roberta-large` as the teacher and `prajjwal1/bert-mini` as the student.

The model can be loaded for inference with `AutoModel`.
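
As a minimal sketch of loading the model with the `transformers` library (the repository id below is a placeholder for this model's actual Hub id, and the pooling choice is an assumption, since the card does not specify one):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder id; replace with this model's actual repository id on the Hub.
checkpoint = "your-username/your-distilled-bert-mini"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
model.eval()

sentences = ["A man is playing a guitar.", "Someone is performing music."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Pooling choice is an assumption: here, the [CLS] token's last hidden state.
embeddings = outputs.last_hidden_state[:, 0]

# Cosine similarity between the two sentence embeddings
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(float(sim))
```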

The model achieves Pearson and Spearman correlations of 0.825 and 0.83, respectively, on the STS-B test set.
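
As a hedged sketch (the exact evaluation pipeline is not shown in this card), STS-B correlations of this kind are typically computed by correlating the cosine similarities of sentence-pair embeddings with the gold scores:

```python
import torch
from scipy.stats import pearsonr, spearmanr

def sts_correlations(emb1: torch.Tensor, emb2: torch.Tensor, gold_scores):
    """Correlate cosine similarities of paired sentence embeddings with gold scores."""
    sims = torch.nn.functional.cosine_similarity(emb1, emb2, dim=1).numpy()
    return pearsonr(sims, gold_scores)[0], spearmanr(sims, gold_scores)[0]
```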

For more training detail, the training config and the PyTorch forward function are as follows:

```python
config = {
    'epoch': 200,
    'learning_rate': 3e-4,
    'batch_size': 12288,
    'temperature': 0.05,  # softmax temperature for the distillation loss
}
```

```python
import torch.nn as nn
import torch.nn.functional as F

def forward_cos_mse_kd_unsup(self, sentences, teacher_sentence_embs):
    """Forward function for the unsupervised NEWS5M dataset."""
    _, o = self.bert(**sentences)  # o: student sentence embeddings, shape (batch, dim)

    # Cosine similarity between the first half of the batch and the second half
    half_batch = o.size(0) // 2
    higher_half = half_batch * 2  # skip the last datapoint when the batch size is odd

    cos_sim = cosine_sim(o[:half_batch], o[half_batch:higher_half])
    cos_sim_teacher = cosine_sim(teacher_sentence_embs[:half_batch],
                                 teacher_sentence_embs[half_batch:higher_half])

    # KL divergence between student and teacher similarity distributions
    soft_teacher_probs = F.softmax(cos_sim_teacher / self.temperature, dim=1)
    kd_contrastive_loss = F.kl_div(F.log_softmax(cos_sim / self.temperature, dim=1),
                                   soft_teacher_probs,
                                   reduction='batchmean')

    # MSE loss between student and teacher embeddings (with an extra 1/3 scaling)
    kd_mse_loss = nn.MSELoss()(o, teacher_sentence_embs) / 3

    # Equal weight for the two losses
    total_loss = kd_contrastive_loss * 0.5 + kd_mse_loss * 0.5

    return total_loss, kd_contrastive_loss, kd_mse_loss
```
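
The forward pass above relies on a `cosine_sim` helper that is not shown in this card. A minimal sketch of a common pairwise implementation (an assumption, not necessarily the exact helper used in training):

```python
import torch
import torch.nn.functional as F

def cosine_sim(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity: inputs (n, d) and (m, d) give an (n, m) matrix."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return a @ b.T
```

With a pairwise helper like this, `cos_sim` and `cos_sim_teacher` are similarity matrices over the two half-batches, so the KL term matches the student's row-wise similarity distribution to the teacher's.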