TJKlein committed
Commit 3560a96 · 1 Parent(s): cd75805

Update README.md

Files changed (1): README.md (+2 -4)
README.md CHANGED
@@ -6,10 +6,8 @@ tags:
  language: en
  license: apache-2.0
  ---
- # **m**utual **i**nformation **C**ontrastive **S**entence **E**mbedding (**miCSE**):
- Language model of the paper to appear at [ACL 2023](https://2023.aclweb.org/) titled: "_**miCSE**: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings_"
-
- Check out the pre-print on arXiv: [![arXiv](https://img.shields.io/badge/arXiv-2109.05105-29d634.svg)](https://arxiv.org/abs/2211.04928)
+ # **m**utual **i**nformation **C**ontrastive **S**entence **E**mbedding (**miCSE**) for Low-shot Sentence Embeddings
+ Paper to appear at [ACL 2023](https://2023.aclweb.org/). Check out the pre-print on arXiv: [![arXiv](https://img.shields.io/badge/arXiv-2109.05105-29d634.svg)](https://arxiv.org/abs/2211.04928)

  # Brief Model Description
  The **miCSE** language model is trained for sentence similarity computation. Training the model imposes alignment between the attention pattern of different views (embeddings of augmentations) during contrastive learning. Intuitively, learning sentence embeddings with miCSE entails enforcing __syntactic consistency across dropout augmented views__. Practically, this is achieved by regularizing the self-attention distribution. By regularizing self-attention during training, representation learning becomes much more sample efficient. Hence, self-supervised learning becomes tractable even when the training set is limited in size. This property makes miCSE particularly interesting for __real-world applications__, where training data is typically limited.
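For readers who want to try the sentence-similarity use case described above, here is a minimal usage sketch. It is not taken from this README: the Hugging Face model id `sap-ai-research/miCSE` and the mean-pooling step are assumptions, so check the model card for the exact repository name and the pooling the authors recommend.

```python
# Hypothetical usage sketch for a miCSE-style sentence encoder.
# Assumptions (not from this README): model id and mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sap-ai-research/miCSE"  # assumed model id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
]

batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Mean pooling over non-padding tokens (one common choice for sentence embeddings).
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```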