Update README.md
README.md CHANGED
@@ -6,10 +6,8 @@ tags:
 language: en
 license: apache-2.0
 ---
-# **m**utual **i**nformation **C**ontrastive **S**entence **E**mbedding (**miCSE**)
-
-
-Check out the pre-print on arXiv: [![arXiv](https://img.shields.io/badge/arXiv-2109.05105-29d634.svg)](https://arxiv.org/abs/2211.04928)
+# **m**utual **i**nformation **C**ontrastive **S**entence **E**mbedding (**miCSE**) for Low-shot Sentence Embeddings
+
+Paper to appear at [ACL 2023](https://2023.aclweb.org/). Check out the pre-print on arXiv: [![arXiv](https://img.shields.io/badge/arXiv-2109.05105-29d634.svg)](https://arxiv.org/abs/2211.04928)
 
 # Brief Model Description
 The **miCSE** language model is trained for sentence similarity computation. Training the model imposes alignment between the attention patterns of different views (embeddings of augmentations) during contrastive learning. Intuitively, learning sentence embeddings with miCSE entails enforcing __syntactic consistency across dropout-augmented views__. Practically, this is achieved by regularizing the self-attention distribution. Regularizing self-attention during training makes representation learning much more sample-efficient, so self-supervised learning remains tractable even when the training set is limited in size. This property makes miCSE particularly interesting for __real-world applications__, where training data is typically limited.
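
Since the description above covers only training, here is a minimal inference sketch showing how a sentence-similarity model like this one is typically queried through the Hugging Face `transformers` API. The Hub model id `sap-ai-research/miCSE` and the [CLS]-token pooling are assumptions not stated in the README excerpt, so adjust them to the actual model card.

```python
# Minimal sketch: embed two sentences and score their similarity.
# Assumptions (not confirmed by the README diff above): the Hub model id
# "sap-ai-research/miCSE" and [CLS]-token pooling for the sentence embedding.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sap-ai-research/miCSE")
model = AutoModel.from_pretrained("sap-ai-research/miCSE")
model.eval()

sentences = [
    "A man is playing a guitar.",
    "Someone is strumming an instrument.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Take the [CLS] token of the last hidden layer as the sentence embedding.
embeddings = outputs.last_hidden_state[:, 0, :]
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {score.item():.4f}")
```

Note that the dropout-augmented views and attention regularization described above apply only at training time; inference is a single forward pass per sentence.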