YuxinJiang committed 51f5a45 (parent: 3c4cc5d): Update README.md

README.md (CHANGED)
# PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)

arXiv link: https://arxiv.org/abs/2203.06875v2

To be published in [**EMNLP 2022**](https://2022.emnlp.org/)

Our code is based on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/); we sincerely thank the authors for their excellent work.

We have released our supervised and unsupervised models on Hugging Face, which achieve **Top 1** results on four standard STS tasks:

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)

<!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->

| Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
|:-----------------------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large)) | 79.14 | 88.64 | 83.73 | 87.33 | 84.57 | 87.84 | 82.07 | 84.76 |
| unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased)) | 73.03 | 85.18 | 76.70 | 84.19 | 79.69 | 80.62 | 70.00 | 78.49 |

If you have any questions, feel free to raise an issue.

[//]: <## Architecture>

[//]: <We add multi-layer trainable dense vectors as soft prompts to the input sequence, which means the input embeddings as well as each layer's hidden embeddings of prompts are optimized (the orange blocks). Note that all parameters of the pre-trained model are frozen (the blue blocks), thus reducing the number of tunable parameters to around **0.1\%**. The [CLS] token embedding of the last layer is selected as the sentence representation. The contrastive framework is the same as SimCSE.>

## Setups

Run the following script to install the remaining dependencies,

```bash
pip install -r requirements.txt
```

## Train PromCSE

In the following section, we describe how to train a PromCSE model using our code.

### Evaluation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)

Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation takes the "all" setting and reports Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STS-B, SICK-R) and one domain-shifted STS task (CxC).
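
As an illustration of the reported metric (this snippet is not part of the repository), Spearman's correlation for an STS task is computed between the model's cosine similarities and the human-annotated gold scores; the values below are made-up placeholders.

```python
# Illustrative only: Spearman's correlation between predicted similarities and gold STS scores.
from scipy.stats import spearmanr

model_cosine_sims = [0.91, 0.35, 0.78, 0.12, 0.66]  # model-predicted similarities (placeholders)
gold_scores = [4.8, 1.5, 4.0, 0.6, 3.2]             # human STS annotations on a 0-5 scale (placeholders)

corr, _ = spearmanr(model_cosine_sims, gold_scores)
print(f"Spearman's correlation: {corr:.4f}")
```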

Before evaluation, please download the evaluation datasets by running

```bash
cd SentEval/data/downstream/
bash download_dataset.sh
```

To evaluate the domain-shift robustness of sentence embeddings, download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing) and put the data into *SentEval/data/downstream/CocoCXC*.

Then return to the root directory; you can evaluate the trained models using our evaluation code. For example,
```bash
python evaluation.py \
    --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
    --pooler_type cls \
    --task_set sts \
    --mode test \
    --pre_seq_len 10
```

which is expected to output the results in a tabular format:
```
------ test ------
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg.  |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 79.14 | 88.64 | 83.73 | 87.33 | 84.57 | 87.84        | 82.07           | 84.76 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
```

Arguments for the evaluation script are as follows:

* `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
* `--pooler_type`: Pooling method. We currently support the following options (see the sketch after this list):
  * `cls` (default): Use the representation of the `[CLS]` token. A linear+activation layer is applied after the representation (as in the standard BERT implementation). If you use **supervised PromCSE**, you should use this option.
  * `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear+activation. If you use **unsupervised PromCSE**, you should use this option.
  * `avg`: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa ([paper](https://arxiv.org/abs/1908.10084)), you should use this option.
  * `avg_top2`: Average embeddings of the last two layers.
  * `avg_first_last`: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works best.
* `--mode`: Evaluation mode
  * `test` (default): The default test mode. To faithfully reproduce our results, you should use this option.
  * `dev`: Report the development set results. Note that in STS tasks, only `STS-B` and `SICK-R` have development sets, so we only report their numbers. It also uses a fast mode for transfer tasks, so the running time is much shorter than in `test` mode (though the numbers are slightly lower).
  * `fasttest`: The same as `test`, but with a fast mode so the running time is much shorter; the reported numbers may be lower (only for transfer tasks).
* `--task_set`: Which set of tasks to evaluate on (if set, it overrides `--tasks`)
  * `sts` (default): Evaluate on STS tasks, including `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly used set of tasks for evaluating the quality of sentence embeddings.
  * `cococxc`: Evaluate on the domain-shifted CxC task.
  * `transfer`: Evaluate on transfer tasks.
  * `full`: Evaluate on both STS and transfer tasks.
  * `na`: Manually set tasks with `--tasks`.
* `--tasks`: Specify which dataset(s) to evaluate on. Will be overridden if `--task_set` is not `na`. See the code for a full list of tasks.
* `--pre_seq_len`: The length of the deep continuous prompt.
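
The pooler options above correspond to simple operations on a plain `transformers` forward pass. The sketch below is illustrative only: it assumes a vanilla `bert-base-uncased` encoder and ignores the deep continuous prompts that `evaluation.py` adds, so it will not reproduce PromCSE numbers; it only shows what each pooling choice computes.

```python
# Illustrative sketch of the pooler options on a vanilla BERT encoder
# (the repository's evaluation.py additionally handles the deep continuous prompts).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer(["A woman is making a photo."], return_tensors="pt")
with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
last = out.last_hidden_state                           # (batch, seq_len, hidden)

cls = out.pooler_output                                # "cls": [CLS] + linear + tanh
cls_before_pooler = last[:, 0]                         # "cls_before_pooler": raw [CLS] embedding
avg = (last * mask).sum(1) / mask.sum(1)               # "avg": mean over tokens of the last layer
avg_top2 = (((out.hidden_states[-1] + out.hidden_states[-2]) / 2) * mask).sum(1) / mask.sum(1)
avg_first_last = (((out.hidden_states[0] + out.hidden_states[-1]) / 2) * mask).sum(1) / mask.sum(1)
```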

### Training

**Data**

All our experiments are conducted on Nvidia 3090 GPUs.

| Valid steps | 125 | 125 | 125 | 125 |
## Usage

We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as build an index for a group of sentences and search among them. You can have a try by running it. For example, the retrieval results for the query "A woman is making a photo." end with

```
An animal is biting a persons finger. (cosine similarity: 0.6126)
```
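
The computation that *tool.py* wraps can be sketched generically: L2-normalize the sentence embeddings of both groups, take dot products to get cosine similarities, and sort them for retrieval. The snippet below is a self-contained illustration with random placeholder embeddings, not the *tool.py* API; real embeddings would come from a PromCSE checkpoint.

```python
# Generic sketch of cosine-similarity scoring and top-k retrieval between two
# groups of sentence embeddings (random placeholders stand in for PromCSE outputs).
import numpy as np

rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 768))   # embeddings of the query sentences
corpus = rng.normal(size=(5, 768))    # embeddings of the indexed sentences

# L2-normalize so that dot products equal cosine similarities.
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

similarities = queries @ corpus.T     # (num_queries, num_corpus) cosine-similarity matrix

top_k = 3
for i, row in enumerate(similarities):
    best = np.argsort(-row)[:top_k]   # indices of the k most similar corpus sentences
    print(f"query {i}: matches {best.tolist()}, cosine similarities {np.round(row[best], 4).tolist()}")
```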

## Citation

Please cite our paper by:

```
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```