sergioburdisso committed
Commit 4015f18 (1 parent: 879865d)

Update README.md

Files changed (1): README.md (+45 −16)

README.md CHANGED

@@ -1,16 +1,25 @@
 ---
-pipeline_tag: sentence-similarity
+language: en
+license: mit
+library_name: sentence-transformers
 tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 - transformers
-
+datasets:
+- Salesforce/dialogstudio
+pipeline_tag: sentence-similarity
+base_model:
+- google-bert/bert-base-uncased
 ---
 
-# sergioburdisso/dialog2flow-single-bert-base
+
+# Dialog2Flow single target (BERT-base)
 
-This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+This is the original **D2F$_{single}$** model introduced in the paper ["Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction"](https://publications.idiap.ch/attachments/papers/2024/Burdisso_EMNLP2024_2024.pdf), published in the EMNLP 2024 main conference.
+
+Implementation-wise, this is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or search.
 
 <!--- Describe your model here -->
 
@@ -26,7 +35,7 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
+sentences = ["your phone please", "okay may i have your telephone number please"]
 
 model = SentenceTransformer('sergioburdisso/dialog2flow-single-bert-base')
 embeddings = model.encode(sentences)
@@ -51,7 +60,7 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-sentences = ['This is an example sentence', 'Each sentence is converted']
+sentences = ['your phone please', 'okay may i have your telephone number please']
 
 # Load model from HuggingFace Hub
 tokenizer = AutoTokenizer.from_pretrained('sergioburdisso/dialog2flow-single-bert-base')
@@ -71,21 +80,23 @@ print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
 
-
-
-## Evaluation Results
-
-<!--- Describe how your model was evaluated -->
-
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sergioburdisso/dialog2flow-single-bert-base)
-
-
-## Training
-The model was trained with the parameters:
+## Training
+The model was trained with the parameters:
+
+**DataLoader**:
+
+`torch.utils.data.dataloader.DataLoader` of length 363506 with parameters:
+```
+{'batch_size': 64, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
+```
+
+**Loss**:
+
+`spretrainer.losses.LabeledContrastiveLoss.LabeledContrastiveLoss`
 
 **DataLoader**:
 
-`torch.utils.data.dataloader.DataLoader` of length 24615 with parameters:
+`torch.utils.data.dataloader.DataLoader` of length 49478 with parameters:
 ```
 {'batch_size': 64, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
 ```
@@ -98,7 +109,7 @@ Parameters of the fit()-Method:
 ```
 {
     "epochs": 15,
-    "evaluation_steps": 246,
+    "evaluation_steps": 164,
     "evaluator": [
         "spretrainer.evaluation.FewShotClassificationEvaluator.FewShotClassificationEvaluator"
     ],
@@ -124,4 +135,22 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
+
+```bibtex
+@inproceedings{burdisso-etal-2024-dialog2flow,
+    title = "Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction",
+    author = "Burdisso, Sergio and
+      Madikeri, Srikanth and
+      Motlicek, Petr",
+    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
+    month = nov,
+    year = "2024",
+    address = "Miami",
+    publisher = "Association for Computational Linguistics",
+}
+```
+
+## License
+
+Copyright (c) 2024 [Idiap Research Institute](https://www.idiap.ch/).
+MIT License.
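The transformers usage section of the card relies on a `mean_pooling` helper that averages token embeddings while ignoring padding positions via the attention mask. A minimal self-contained sketch of that pooling step, with toy numbers (the card's actual helper operates on the model output and the tokenizer's attention mask; the values below are illustrative only):

```python
import numpy as np

# Mask-aware mean pooling, in the spirit of the card's `mean_pooling` helper:
# average token embeddings, counting only non-padding tokens.
def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)          # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)          # avoid division by zero
    return summed / counts

# Toy batch: one sentence, 3 tokens (the last one is padding), dim = 2.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # padding token is ignored -> [[2. 3.]]
```

This mirrors why the masked sum is divided by the clamped mask count rather than by `seq_len`: padding embeddings must contribute nothing to the sentence vector.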