SetFit with sentence-transformers/paraphrase-mpnet-base-v2

This is a SetFit model for text classification: it assigns short utterances to one of seven intent labels, listed under Model Labels. It uses sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
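The first of these two stages can be illustrated with a simplified, dependency-free sketch. It enumerates every unique pair of labeled examples and gives same-label pairs a target of 1.0 and different-label pairs a target of 0.0. It then scores a pair of embeddings with the squared error between their cosine similarity and that target, which is what CosineSimilarityLoss (the loss listed under Training Hyperparameters) computes per pair. The function names are hypothetical, and in practice the embeddings come from the Sentence Transformer. SetFit's own pair sampler also balances positive and negative pairs, so treat this as an illustration of the idea, not the library's implementation.

```python
import math
from itertools import combinations


def make_pairs(examples):
    """Turn (text, label) examples into (text_a, text_b, target) pairs.

    Same-label pairs get target 1.0 (their embeddings should move together);
    different-label pairs get target 0.0 (their embeddings should move apart).
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(u, v, target):
    """Squared error between a pair's cosine similarity and its 0/1 target."""
    return (cosine_similarity(u, v) - target) ** 2


# Three utterances taken from the label table below.
examples = [
    ("Yes, that's right.", "Confirm"),
    ("Sure.", "Confirm"),
    ("Stop the elevator.", "Stop"),
]
pairs = make_pairs(examples)
for text_a, text_b, target in pairs:
    print(f"{target}: {text_a!r} <-> {text_b!r}")
```

With three examples there are three pairs: one positive (the two Confirm utterances) and two negative.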

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: sentence-transformers/paraphrase-mpnet-base-v2
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 7 classes
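At inference time the classification head only sees the sentence embedding. The sketch below shows how a multinomial logistic-regression head turns one embedding into a label: a linear score per class, a softmax over those scores, then the highest-probability class. The weights, the 3-dimensional toy embedding and the label order are hypothetical (the real body produces 768-dimensional embeddings). Whether the fitted scikit-learn head uses the multinomial (softmax) formulation is also an assumption here, although that is scikit-learn's default for multiclass problems with the lbfgs solver.

```python
import math

# Label order is illustrative only; the trained head's class order may differ.
LABELS = [
    "RequestMoveToFloor",
    "RequestMoveToFloorByX",
    "Confirm",
    "RequestEmployeeLocation",
    "CurrentFloor",
    "Stop",
    "OutOfCoverage",
]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, biases, labels=LABELS):
    """Score one embedding with a linear head and return (label, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Toy 3-dimensional embedding and hypothetical weights.
embedding = [1.0, 0.0, 0.0]
weights = [[0.0, 0.0, 0.0] for _ in LABELS]
weights[2] = [5.0, 0.0, 0.0]  # make the "Confirm" row respond to the first feature
biases = [0.0] * len(LABELS)

label, probs = classify(embedding, weights, biases)
print(label, round(max(probs), 3))
```

Only the head is fitted in the second training stage, which is why it can be trained quickly once the embeddings are fixed.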

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
RequestMoveToFloor	'Please go to the 3rd floor.' 'Can you take me to floor 5?' 'I need to go to the 8th floor.'
RequestMoveToFloorByX	'Go one floor up' 'Take me up two floors' 'Move me down one level'
Confirm	"Yes, that's right." 'Sure.' 'Exactly.'
RequestEmployeeLocation	'Where is Erik Velldal’s office?' 'Which floor is Andreas Austeng on?' 'Can you tell me where Birthe Soppe’s office is?'
CurrentFloor	'Which floor are we on?' 'What floor is this?' 'Are we on the 5th floor?'
Stop	'Stop the elevator.' "Wait, don't go to that floor." 'No, not that floor.'
OutOfCoverage	"What's the capital of France?" 'How many floors does this building have?' 'Can you make a phone call for me?'

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("victomoe/setfit-intent-classifier-2")
# Run inference
preds = model("Absolutely.")

Training Details

Training Set Metrics

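Training set	Min	Mean	Max
Word count	1	5.1533	9

Word counts are per training example. The middle value is a mean (5.1533 ≈ 773 words / 150 examples); a median of whole-number counts would be a whole or half number.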
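The word-count statistics can be computed as in the sketch below. It splits on whitespace, which is an assumption since the exact counting rule behind the table is not stated. It reports mean and median separately because they can differ: for whole-number counts the median is always a whole or half number, while the mean need not be.

```python
from statistics import mean, median


def word_count_stats(texts):
    """Min, mean, median and max word count, splitting on whitespace."""
    counts = [len(text.split()) for text in texts]
    return {
        "min": min(counts),
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
    }


# Example utterances from the label table above.
texts = [
    "Sure.",
    "Exactly.",
    "Go one floor up",
    "What floor is this?",
    "I need to go to the 8th floor.",
]
print(word_count_stats(texts))
```

For these five utterances the counts are 1, 1, 4, 4 and 8, giving a mean of 3.6 and a median of 4.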

Label	Training Sample Count
Confirm	22
CurrentFloor	21
OutOfCoverage	22
RequestEmployeeLocation	22
RequestMoveToFloor	23
RequestMoveToFloorByX	20
Stop	20

Training Hyperparameters

batch_size: (32, 32)
num_epochs: (10, 10)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
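The tuple-valued settings follow SetFit's (embedding phase, classifier phase) convention, so the contrastive stage runs with batch size 32 for 10 epochs. The sketch below is a back-of-the-envelope check of how many optimizer steps that stage takes, not SetFit's sampler code. It assumes that oversampling draws every unique pair of distinct training sentences and repeats the smaller of the positive (same-label) and negative (different-label) groups until both are the same size. Whether identical-sentence pairs are also counted as positives does not change the result here, because the negative group is the larger one.

```python
from math import ceil, comb

# Per-class training sample counts from the table above.
class_counts = {
    "Confirm": 22,
    "CurrentFloor": 21,
    "OutOfCoverage": 22,
    "RequestEmployeeLocation": 22,
    "RequestMoveToFloor": 23,
    "RequestMoveToFloorByX": 20,
    "Stop": 20,
}
batch_size = 32  # first element of batch_size: (32, 32)
num_epochs = 10  # first element of num_epochs: (10, 10)

n = sum(class_counts.values())
positive_pairs = sum(comb(c, 2) for c in class_counts.values())  # same-label pairs
negative_pairs = comb(n, 2) - positive_pairs  # cross-label pairs

# Oversampling (as assumed here): repeat the smaller group until it matches the larger one.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs

print(n, positive_pairs, negative_pairs, pairs_per_epoch, steps_per_epoch, total_steps)
```

This gives 150 examples, 1,536 positive and 9,639 negative pairs, 19,278 pairs per epoch and 603 steps per epoch (6,030 in total). Dividing a logged step by 603 reproduces the Epoch column of the training results, for example 600 / 603 ≈ 0.9950 and 6000 / 603 ≈ 9.9502.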

Training Results

Validation loss was not logged during this run, so that column shows "-" throughout.

Epoch	Step	Training Loss	Validation Loss
0.0017	1	0.1415	-
0.0829	50	0.1863	-
0.1658	100	0.1559	-
0.2488	150	0.0966	-
0.3317	200	0.0363	-
0.4146	250	0.009	-
0.4975	300	0.0035	-
0.5804	350	0.0024	-
0.6633	400	0.0017	-
0.7463	450	0.0015	-
0.8292	500	0.0011	-
0.9121	550	0.0009	-
0.9950	600	0.0008	-
1.0779	650	0.0007	-
1.1609	700	0.0006	-
1.2438	750	0.0005	-
1.3267	800	0.0005	-
1.4096	850	0.0005	-
1.4925	900	0.0007	-
1.5755	950	0.0004	-
1.6584	1000	0.0004	-
1.7413	1050	0.0004	-
1.8242	1100	0.0004	-
1.9071	1150	0.0003	-
1.9900	1200	0.0003	-
2.0730	1250	0.0003	-
2.1559	1300	0.0003	-
2.2388	1350	0.0003	-
2.3217	1400	0.0003	-
2.4046	1450	0.0003	-
2.4876	1500	0.0003	-
2.5705	1550	0.0002	-
2.6534	1600	0.0002	-
2.7363	1650	0.0004	-
2.8192	1700	0.0002	-
2.9022	1750	0.0002	-
2.9851	1800	0.0002	-
3.0680	1850	0.0002	-
3.1509	1900	0.0002	-
3.2338	1950	0.0002	-
3.3167	2000	0.0002	-
3.3997	2050	0.0002	-
3.4826	2100	0.0002	-
3.5655	2150	0.0002	-
3.6484	2200	0.0002	-
3.7313	2250	0.0002	-
3.8143	2300	0.0002	-
3.8972	2350	0.0002	-
3.9801	2400	0.0002	-
4.0630	2450	0.0002	-
4.1459	2500	0.0002	-
4.2289	2550	0.0002	-
4.3118	2600	0.0002	-
4.3947	2650	0.0002	-
4.4776	2700	0.0002	-
4.5605	2750	0.0002	-
4.6434	2800	0.0001	-
4.7264	2850	0.0001	-
4.8093	2900	0.0001	-
4.8922	2950	0.0001	-
4.9751	3000	0.0001	-
5.0580	3050	0.0001	-
5.1410	3100	0.0001	-
5.2239	3150	0.0001	-
5.3068	3200	0.0001	-
5.3897	3250	0.0001	-
5.4726	3300	0.0001	-
5.5556	3350	0.0003	-
5.6385	3400	0.0004	-
5.7214	3450	0.0001	-
5.8043	3500	0.0001	-
5.8872	3550	0.0001	-
5.9701	3600	0.0001	-
6.0531	3650	0.0001	-
6.1360	3700	0.0001	-
6.2189	3750	0.0001	-
6.3018	3800	0.0001	-
6.3847	3850	0.0001	-
6.4677	3900	0.0001	-
6.5506	3950	0.0001	-
6.6335	4000	0.0001	-
6.7164	4050	0.0001	-
6.7993	4100	0.0001	-
6.8823	4150	0.0001	-
6.9652	4200	0.0001	-
7.0481	4250	0.0001	-
7.1310	4300	0.0001	-
7.2139	4350	0.0001	-
7.2968	4400	0.0001	-
7.3798	4450	0.0001	-
7.4627	4500	0.0001	-
7.5456	4550	0.0001	-
7.6285	4600	0.0001	-
7.7114	4650	0.0001	-
7.7944	4700	0.0001	-
7.8773	4750	0.0001	-
7.9602	4800	0.0001	-
8.0431	4850	0.0001	-
8.1260	4900	0.0001	-
8.2090	4950	0.0001	-
8.2919	5000	0.0001	-
8.3748	5050	0.0001	-
8.4577	5100	0.0001	-
8.5406	5150	0.0001	-
8.6235	5200	0.0001	-
8.7065	5250	0.0001	-
8.7894	5300	0.0001	-
8.8723	5350	0.0001	-
8.9552	5400	0.0001	-
9.0381	5450	0.0001	-
9.1211	5500	0.0001	-
9.2040	5550	0.0001	-
9.2869	5600	0.0001	-
9.3698	5650	0.0001	-
9.4527	5700	0.0001	-
9.5357	5750	0.0001	-
9.6186	5800	0.0001	-
9.7015	5850	0.0001	-
9.7844	5900	0.0001	-
9.8673	5950	0.0001	-
9.9502	6000	0.0001	-

Framework Versions

Python: 3.10.8
SetFit: 1.1.0
Sentence Transformers: 3.1.1
Transformers: 4.38.2
PyTorch: 2.1.2
Datasets: 2.17.1
Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
