TJKlein committed
Commit 3560a96 · 1 Parent(s): cd75805

Update README.md

Files changed (1): README.md (+2 -4)
README.md CHANGED
@@ -6,10 +6,8 @@ tags:
  language: en
  license: apache-2.0
  ---
- # **m**utual **i**nformation **C**ontrastive **S**entence **E**mbedding (**miCSE**):
- Language model of the paper to appear at [ACL 2023](https://2023.aclweb.org/) titled: "_**miCSE**: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings_"
-
- Check out the pre-print on arXiv: [![arXiv](https://img.shields.io/badge/arXiv-2109.05105-29d634.svg)](https://arxiv.org/abs/2211.04928)
+ # **m**utual **i**nformation **C**ontrastive **S**entence **E**mbedding (**miCSE**) for Low-shot Sentence Embeddings
+ Paper to appear at [ACL 2023](https://2023.aclweb.org/). Check out the pre-print on arXiv: [![arXiv](https://img.shields.io/badge/arXiv-2109.05105-29d634.svg)](https://arxiv.org/abs/2211.04928)

  # Brief Model Description
  The **miCSE** language model is trained for sentence similarity computation. Training the model imposes alignment between the attention pattern of different views (embeddings of augmentations) during contrastive learning. Intuitively, learning sentence embeddings with miCSE entails enforcing __syntactic consistency across dropout augmented views__. Practically, this is achieved by regularizing the self-attention distribution. By regularizing self-attention during training, representation learning becomes much more sample efficient. Hence, self-supervised learning becomes tractable even when the training set is limited in size. This property makes miCSE particularly interesting for __real-world applications__, where training data is typically limited.
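For readers who want to try the sentence-similarity use case described above, here is a minimal usage sketch. It is not taken from this README: the Hugging Face model id `sap-ai-research/miCSE` and the mean-pooling step are assumptions, so check the model card for the exact repository name and the pooling the authors recommend.

```python
# Hypothetical usage sketch for a miCSE-style sentence encoder.
# Assumptions (not from this README): model id and mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sap-ai-research/miCSE"  # assumed model id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
]

batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Mean pooling over non-padding tokens (one common choice for sentence embeddings).
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```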