YuxinJiang committed
Commit 51f5a45
1 Parent(s): 3c4cc5d

Update README.md

Files changed (1): README.md (+76 −65)
README.md CHANGED
@@ -1,33 +1,36 @@
- ---
- license: apache-2.0
- ---
  # PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)

- [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)

- arXiv link: https://arxiv.org/abs/2203.06875v2
- To be published in [**EMNLP 2022**](https://2022.emnlp.org/)
-
- Our code is modified based on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/). Here we would like to sincerely thank them for their excellent works.
-
- We release our best model checkpoint which acquires **Top 1** results on four STS tasks:

  <!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->

| Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
|:-----------------------:|:-----:|:----------:|:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|
- | sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased)) | 79.14 |88.64| 83.73| 87.33 |84.57| 87.84| 82.07| 84.76|
- | unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large)) | 73.03 |85.18| 76.70| 84.19 |79.69| 80.62| 70.00| 78.49|

  If you have any questions, feel free to raise an issue.

## Setups

@@ -40,7 +43,65 @@ Run the following script to install the remaining dependencies,
pip install -r requirements.txt
```

- ## Training
  **Data**
@@ -114,57 +175,6 @@ All our experiments are conducted on Nvidia 3090 GPUs.
| Valid steps | 125 | 125 | 125 | 125 |


- ## Evaluation
- Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation takes the "all" setting, and report Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STSB, SICK-R) and one domain-shifted STS task (CxC).
-
- Before evaluation, please download the evaluation datasets by running
- ```bash
- cd SentEval/data/downstream/
- bash download_dataset.sh
- ```
- To evaluate the domain shift robustness of sentence embedding, we need to download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing), and put the data into *SentEval/data/downstream/CocoCXC*
-
- Then come back to the root directory, you can evaluate the well trained models using our evaluation code. For example,
- ```bash
- python evaluation.py \
- --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
- --pooler_type cls \
- --task_set sts \
- --mode test \
- --pre_seq_len 10
- ```
- which is expected to output the results in a tabular format:
- ```
- ------ test ------
- +-------+-------+-------+-------+-------+--------------+-----------------+-------+
- | STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
- +-------+-------+-------+-------+-------+--------------+-----------------+-------+
- | 79.14 | 88.64 | 83.73 | 87.33 | 84.57 | 87.84 | 82.07 | 84.76 |
- +-------+-------+-------+-------+-------+--------------+-----------------+-------+
- ```
-
- Arguments for the evaluation script are as follows,
-
- * `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
- * `--pooler_type`: Pooling method. Now we support
- * `cls` (default): Use the representation of `[CLS]` token. A linear+activation layer is applied after the representation (it's in the standard BERT implementation). If you use **supervised PromCSE**, you should use this option.
- * `cls_before_pooler`: Use the representation of `[CLS]` token without the extra linear+activation. If you use **unsupervised PromCSE**, you should take this option.
- * `avg`: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa ([paper](https://arxiv.org/abs/1908.10084)), you should use this option.
- * `avg_top2`: Average embeddings of the last two layers.
- * `avg_first_last`: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works the best.
- * `--mode`: Evaluation mode
- * `test` (default): The default test mode. To faithfully reproduce our results, you should use this option.
- * `dev`: Report the development set results. Note that in STS tasks, only `STS-B` and `SICK-R` have development sets, so we only report their numbers. It also takes a fast mode for transfer tasks, so the running time is much shorter than the `test` mode (though numbers are slightly lower).
- * `fasttest`: It is the same as `test`, but with a fast mode so the running time is much shorter, but the reported numbers may be lower (only for transfer tasks).
- * `--task_set`: What set of tasks to evaluate on (if set, it will override `--tasks`)
- * `sts` (default): Evaluate on STS tasks, including `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly-used set of tasks to evaluate the quality of sentence embeddings.
- * `cococxc`: Evaluate on domain-shifted CXC task.
- * `transfer`: Evaluate on transfer tasks.
- * `full`: Evaluate on both STS and transfer tasks.
- * `na`: Manually set tasks by `--tasks`.
- * `--tasks`: Specify which dataset(s) to evaluate on. Will be overridden if `--task_set` is not `na`. See the code for a full list of tasks.
- * `--pre_seq_len`: The length of deep continuous prompt.
-
  ## Usage
We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as build an index for a group of sentences and search among them. You can try it by running
```bash
@@ -238,6 +248,7 @@ Retrieval results for query: A woman is making a photo.
An animal is biting a persons finger. (cosine similarity: 0.6126)
  ```

## Citation

Please cite our paper by:
@@ -251,4 +262,4 @@ Please cite our paper by:
archivePrefix={arXiv},
primaryClass={cs.CL}
}
- ```
# PromCSE: Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)
+ arXiv link: https://arxiv.org/abs/2203.06875v2
+ To be published in [**EMNLP 2022**](https://2022.emnlp.org/)
+
+ Our code builds on [SimCSE](https://github.com/princeton-nlp/SimCSE) and [P-tuning v2](https://github.com/THUDM/P-tuning-v2/); we sincerely thank the authors for their excellent work.
+
+ We have released our supervised and unsupervised models on Hugging Face, which achieve **Top 1** results on four standard STS tasks:
 
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=deep-continuous-prompt-for-contrastive-1)
+
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=deep-continuous-prompt-for-contrastive-1)
+
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=deep-continuous-prompt-for-contrastive-1)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=deep-continuous-prompt-for-contrastive-1)

+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-continuous-prompt-for-contrastive-1/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=deep-continuous-prompt-for-contrastive-1)

  <!-- <img src="https://github.com/YJiangcm/DCPCSE/blob/master/figure/leaderboard.png" width="700" height="380"> -->

| Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
|:-----------------------:|:-----:|:----------:|:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|
+ | sup-PromCSE-RoBERTa-large ([huggingface](https://huggingface.co/YuxinJiang/sup-promcse-roberta-large)) | 79.14 |88.64| 83.73| 87.33 |84.57| 87.84| 82.07| 84.76|
+ | unsup-PromCSE-BERT-base ([huggingface](https://huggingface.co/YuxinJiang/unsup-promcse-bert-base-uncased)) | 73.03 |85.18| 76.70| 84.19 |79.69| 80.62| 70.00| 78.49|

  If you have any questions, feel free to raise an issue.

+ [//]: <## Architecture>
+ [//]: <We add multi-layer trainable dense vectors as soft prompts to the input sequence, which means the input embeddings as well as each layer's hidden embeddings of prompts are optimized (the orange blocks). Note that all parameters of the pre-trained model are frozen (the blue blocks), thus reducing the number of tunable parameters to around **0.1\%**. The [CLS] token embedding of the last layer is selected as the sentence representation. The contrastive framework is the same as SimCSE.>
+

## Setups

pip install -r requirements.txt
```

+ ## Train PromCSE
+
+ In this section, we describe how to train a PromCSE model with our code.
+
+ ### Evaluation
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ubtve9fqljSTbFH4dYZkXOxitrUl6Az3?usp=sharing)
+
+ Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation uses the "all" setting and reports Spearman's correlation. The STS tasks include seven standard STS tasks (STS12-16, STS-B, SICK-R) and one domain-shifted STS task (CxC).
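As a toy illustration of the metric (not the repo's SentEval code), Spearman's correlation compares the *ranking* of predicted cosine similarities against the ranking of human similarity ratings; the vectors and gold scores below are made-up:

```python
# Illustration only: how an STS score relates sentence embeddings to human
# ratings. All embeddings and gold scores here are hand-made toy data.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = float(pos)
    return r

def spearman(xs, ys):
    # Pearson correlation of the ranks (no ties in this toy example)
    rx, ry = rank(xs), rank(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy sentence-pair embeddings and human similarity ratings (0-5 scale)
pairs = [
    ([1.0, 0.0], [1.0, 0.1], 4.8),  # near-paraphrases
    ([1.0, 1.0], [0.5, 1.0], 3.5),
    ([1.0, 0.0], [0.0, 1.0], 0.2),  # unrelated
]
predicted = [cosine(u, v) for u, v, _ in pairs]
gold = [g for _, _, g in pairs]
print(round(spearman(predicted, gold) * 100, 2))  # Spearman's rho x100, as in the tables
```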
+
+ Before evaluation, please download the evaluation datasets by running
+ ```bash
+ cd SentEval/data/downstream/
+ bash download_dataset.sh
+ ```
+ To evaluate the domain-shift robustness of sentence embeddings, download [CxC](https://drive.google.com/drive/folders/1ZnRlVlc4kFsKbaWj9cFbb8bQU0fxzz1c?usp=sharing) and put the data into *SentEval/data/downstream/CocoCXC*.
+
+ Then return to the root directory; you can evaluate trained models with our evaluation code. For example,
+ ```bash
+ python evaluation.py \
+     --model_name_or_path YuxinJiang/sup-promcse-roberta-large \
+     --pooler_type cls \
+     --task_set sts \
+     --mode test \
+     --pre_seq_len 10
+ ```
+ which is expected to output the results in a tabular format:
+ ```
+ ------ test ------
+ +-------+-------+-------+-------+-------+--------------+-----------------+-------+
+ | STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
+ +-------+-------+-------+-------+-------+--------------+-----------------+-------+
+ | 79.14 | 88.64 | 83.73 | 87.33 | 84.57 | 87.84 | 82.07 | 84.76 |
+ +-------+-------+-------+-------+-------+--------------+-----------------+-------+
+ ```
+ Arguments for the evaluation script are as follows:
+
+ * `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint.
+ * `--pooler_type`: Pooling method. We currently support
+   * `cls` (default): Use the representation of the `[CLS]` token. A linear+activation layer is applied after the representation (as in the standard BERT implementation). If you use **supervised PromCSE**, you should use this option.
+   * `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear+activation. If you use **unsupervised PromCSE**, you should use this option.
+   * `avg`: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa ([paper](https://arxiv.org/abs/1908.10084)), you should use this option.
+   * `avg_top2`: Average embeddings of the last two layers.
+   * `avg_first_last`: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works best.
+ * `--mode`: Evaluation mode
+   * `test` (default): The default test mode. To faithfully reproduce our results, you should use this option.
+   * `dev`: Report development set results. Note that among the STS tasks, only `STS-B` and `SICK-R` have development sets, so we only report their numbers. It also uses a fast mode for transfer tasks, so the running time is much shorter than in `test` mode (though numbers are slightly lower).
+   * `fasttest`: The same as `test`, but transfer tasks run in a fast mode, so the running time is much shorter (reported transfer-task numbers may be slightly lower).
+ * `--task_set`: Which set of tasks to evaluate on (if set, it overrides `--tasks`)
+   * `sts` (default): Evaluate on STS tasks, including `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly used set of tasks for evaluating the quality of sentence embeddings.
+   * `cococxc`: Evaluate on the domain-shifted CxC task.
+   * `transfer`: Evaluate on transfer tasks.
+   * `full`: Evaluate on both STS and transfer tasks.
+   * `na`: Manually set tasks with `--tasks`.
+ * `--tasks`: Which dataset(s) to evaluate on. Overridden unless `--task_set` is `na`. See the code for a full list of tasks.
+ * `--pre_seq_len`: The length of the deep continuous prompt.
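As a rough sketch of what the pooler options compute (illustration only, on hand-made toy "hidden states"; the real poolers operate on transformer outputs, and `cls` additionally applies the linear+activation layer that is omitted here):

```python
# Toy sketch of the pooler options. hidden[layer][token] is a 2-dim
# made-up embedding per token; real hidden states come from the encoder.
hidden = [
    [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],   # first layer
    [[1.0, 1.0], [1.0, 3.0], [3.0, 1.0]],   # intermediate layer
    [[2.0, 0.0], [0.0, 2.0], [4.0, 4.0]],   # last layer
]

def mean(vectors):
    # element-wise average over a list of equally sized vectors
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

cls_before_pooler = hidden[-1][0]     # [CLS] token of the last layer, no extra MLP
avg = mean(hidden[-1])                # average over tokens of the last layer
avg_top2 = mean([t for layer in hidden[-2:] for t in layer])
avg_first_last = mean([t for layer in (hidden[0], hidden[-1]) for t in layer])

print(cls_before_pooler)  # -> [2.0, 0.0]
print(avg)                # -> [2.0, 2.0]
```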
+
+ ### Training
 
  **Data**
  | Valid steps | 125 | 125 | 125 | 125 |
  ## Usage
We provide *tool.py* to easily compute the cosine similarities between two groups of sentences, as well as build an index for a group of sentences and search among them. You can try it by running
```bash

An animal is biting a persons finger. (cosine similarity: 0.6126)
```
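The search step above can be sketched as follows (a toy brute-force version, with hand-made 2-d "embeddings" standing in for PromCSE sentence vectors; the corpus sentences and the `search` helper here are illustrative, not the actual *tool.py* API):

```python
# Toy sketch of index-and-search: score every corpus item against a query
# embedding by cosine similarity and return the top matches.
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up corpus: sentence -> toy embedding (real ones come from PromCSE)
corpus = {
    "A woman is taking a picture.": [0.9, 0.1],
    "A man is playing guitar.":     [0.1, 0.9],
    "Someone snaps a photo.":       [0.8, 0.2],
}

def search(query_vec, corpus, top_k=2):
    scored = [(text, cos_sim(query_vec, vec)) for text, vec in corpus.items()]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:top_k]

query = [1.0, 0.15]  # stand-in embedding for "A woman is making a photo."
for text, score in search(query, corpus):
    print(f"{text} (cosine similarity: {score:.4f})")
```

A real index would replace the linear scan with an approximate-nearest-neighbor structure (e.g. FAISS) for large corpora.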

+
## Citation

Please cite our paper by:
archivePrefix={arXiv},
primaryClass={cs.CL}
}
+ ```