SetFit with google-bert/bert-large-uncased

This is a SetFit model trained on the bhujith10/multi_class_classification_dataset dataset that can be used for Text Classification. This SetFit model uses google-bert/bert-large-uncased as the Sentence Transformer embedding model. A SetFitHead instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
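
The contrastive step can be illustrated with a small, library-free sketch. It assumes the usual SetFit-style pairing scheme, in which two texts with the same label form a positive pair (target similarity 1.0) and two texts with different labels form a negative pair (target 0.0). The function name `make_pairs` and the toy examples are hypothetical, and the sampling in the SetFit library itself (for example the `oversampling` strategy) is more involved than this.

```python
from itertools import combinations

def make_pairs(examples):
    """Build (text_a, text_b, target) triples from labeled examples.

    Same-label pairs get target 1.0, different-label pairs get 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs

# Toy few-shot set (hypothetical texts and labels, not from the training data)
examples = [
    ("A bound on the isoperimetric quotient", "math"),
    ("Curvature estimates on manifolds", "math"),
    ("A faster sorting algorithm", "cs"),
    ("Cache-aware graph traversal", "cs"),
]

pairs = make_pairs(examples)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))
```

Even four labeled texts yield six training pairs, which is why this kind of pairing suits few-shot settings: the number of pairs grows roughly quadratically with the number of examples.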

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: google-bert/bert-large-uncased
Classification head: a SetFitHead instance
Maximum Sequence Length: 512 tokens
Number of Classes: 6 classes
Training Dataset: bhujith10/multi_class_classification_dataset

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("bhujith10/bert-large-uncased-setfit_finetuned")
# Run inference
# The input spans several lines and contains LaTeX backslashes,
# so it is passed as a raw triple-quoted string.
text = r"""Title: On the isoperimetric quotient over scalar-flat conformal classes,
Abstract: Let $(M,g)$ be a smooth compact Riemannian manifold of dimension $n$ with
smooth boundary $\partial M$. Suppose that $(M,g)$ admits a scalar-flat
conformal metric. We prove that the supremum of the isoperimetric quotient over
the scalar-flat conformal class is strictly larger than the best constant of
the isoperimetric inequality in the Euclidean space, and consequently is
achieved, if either (i) $n\ge 12$ and $\partial M$ has a nonumbilic point; or
(ii) $n\ge 10$, $\partial M$ is umbilic and the Weyl tensor does not vanish at
some boundary point."""
preds = model(text)
print(preds)

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	23	145.8467	280

Training Hyperparameters

batch_size: (4, 4)
num_epochs: (2, 2)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: True
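
To make the `loss: CosineSimilarityLoss` entry concrete, here is a minimal pure-Python sketch of what that loss computes for a batch of pairs: the mean squared error between the cosine similarity of two embeddings and the pair's target. The helper names and the 3-dimensional vectors are made up for illustration; the real loss in Sentence Transformers operates on model embeddings as tensors. As far as the SetFit documentation describes them, `distance_metric` and `margin` apply to triplet-style losses and are not used with this loss.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs):
    """Mean squared error between cos(u, v) and the target of each pair."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)

# Hypothetical embeddings: (vector_a, vector_b, target similarity)
batch = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0),  # identical, same class: error 0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # orthogonal, different class: error 0
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 0.0),  # identical, different class: error 1
]
loss = cosine_similarity_loss(batch)
print(round(loss, 4))
```

Only the third pair contributes to the loss, because its embeddings are identical although the texts belong to different classes. Fine-tuning pushes such embeddings apart.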

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0003	1	0.22	-
0.0138	50	0.3706	-
0.0276	100	0.2389	-
0.0414	150	0.1628	-
0.0551	200	0.1401	-
0.0689	250	0.1043	-
0.0827	300	0.1047	-
0.0965	350	0.098	-
0.1103	400	0.0931	-
0.1241	450	0.1002	-
0.1379	500	0.0837	-
0.1516	550	0.0673	-
0.1654	600	0.0709	-
0.1792	650	0.08	-
0.1930	700	0.0719	-
0.2068	750	0.0805	-
0.2206	800	0.059	-
0.2344	850	0.0957	-
0.2481	900	0.0614	-
0.2619	950	0.0887	-
0.2757	1000	0.0713	-
0.2895	1050	0.0734	-
0.3033	1100	0.0519	-
0.3171	1150	0.0802	-
0.3309	1200	0.0817	-
0.3446	1250	0.0665	-
0.3584	1300	0.0515	-
0.3722	1350	0.0764	-
0.3860	1400	0.0564	-
0.3998	1450	0.0512	-
0.4136	1500	0.052	-
0.4274	1550	0.0398	-
0.4411	1600	0.0473	-
0.4549	1650	0.0433	-
0.4687	1700	0.0621	-
0.4825	1750	0.0506	-
0.4963	1800	0.0395	-
0.5101	1850	0.0516	-
0.5238	1900	0.0431	-
0.5376	1950	0.037	-
0.5514	2000	0.0299	-
0.5652	2050	0.0398	-
0.5790	2100	0.0335	-
0.5928	2150	0.0438	-
0.6066	2200	0.0436	-
0.6203	2250	0.0345	-
0.6341	2300	0.0396	-
0.6479	2350	0.0381	-
0.6617	2400	0.0377	-
0.6755	2450	0.0287	-
0.6893	2500	0.0393	-
0.7031	2550	0.0309	-
0.7168	2600	0.0363	-
0.7306	2650	0.0347	-
0.7444	2700	0.0299	-
0.7582	2750	0.0305	-
0.7720	2800	0.0349	-
0.7858	2850	0.0385	-
0.7996	2900	0.0412	-
0.8133	2950	0.0336	-
0.8271	3000	0.0422	-
0.8409	3050	0.0249	-
0.8547	3100	0.0285	-
0.8685	3150	0.0258	-
0.8823	3200	0.0309	-
0.8961	3250	0.0246	-
0.9098	3300	0.0271	-
0.9236	3350	0.0285	-
0.9374	3400	0.0318	-
0.9512	3450	0.0287	-
0.9650	3500	0.0298	-
0.9788	3550	0.021	-
0.9926	3600	0.036	-
1.0	3627	-	0.1036
1.0063	3650	0.0257	-
1.0201	3700	0.02	-
1.0339	3750	0.0333	-
1.0477	3800	0.0339	-
1.0615	3850	0.0283	-
1.0753	3900	0.0233	-
1.0891	3950	0.0311	-
1.1028	4000	0.0296	-
1.1166	4050	0.0271	-
1.1304	4100	0.0321	-
1.1442	4150	0.0221	-
1.1580	4200	0.026	-
1.1718	4250	0.0283	-
1.1856	4300	0.0378	-
1.1993	4350	0.0225	-
1.2131	4400	0.0237	-
1.2269	4450	0.0254	-
1.2407	4500	0.0253	-
1.2545	4550	0.023	-
1.2683	4600	0.0265	-
1.2821	4650	0.0255	-
1.2958	4700	0.0278	-
1.3096	4750	0.0285	-
1.3234	4800	0.0234	-
1.3372	4850	0.0282	-
1.3510	4900	0.0197	-
1.3648	4950	0.0284	-
1.3785	5000	0.0326	-
1.3923	5050	0.0233	-
1.4061	5100	0.0386	-
1.4199	5150	0.0308	-
1.4337	5200	0.0218	-
1.4475	5250	0.0288	-
1.4613	5300	0.0251	-
1.4750	5350	0.0255	-
1.4888	5400	0.0261	-
1.5026	5450	0.0253	-
1.5164	5500	0.0313	-
1.5302	5550	0.0277	-
1.5440	5600	0.0252	-
1.5578	5650	0.0293	-
1.5715	5700	0.0334	-
1.5853	5750	0.0285	-
1.5991	5800	0.0269	-
1.6129	5850	0.0267	-
1.6267	5900	0.0313	-
1.6405	5950	0.0243	-
1.6543	6000	0.0301	-
1.6680	6050	0.0266	-
1.6818	6100	0.0276	-
1.6956	6150	0.0293	-
1.7094	6200	0.0291	-
1.7232	6250	0.031	-
1.7370	6300	0.0283	-
1.7508	6350	0.0238	-
1.7645	6400	0.0261	-
1.7783	6450	0.0196	-
1.7921	6500	0.034	-
1.8059	6550	0.0255	-
1.8197	6600	0.0231	-
1.8335	6650	0.0256	-
1.8473	6700	0.0207	-
1.8610	6750	0.0325	-
1.8748	6800	0.0238	-
1.8886	6850	0.0277	-
1.9024	6900	0.0239	-
1.9162	6950	0.0239	-
1.9300	7000	0.0227	-
1.9438	7050	0.0236	-
1.9575	7100	0.0216	-
1.9713	7150	0.0248	-
1.9851	7200	0.0244	-
1.9989	7250	0.0203	-
2.0	7254	-	0.1068
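
The table can be cross-checked with a little arithmetic. The sketch below uses only numbers from this card (3627 steps per epoch, an embedding batch size of 4, validation losses 0.1036 and 0.1068). It assumes that one step consumes one batch of pairs and that `load_best_model_at_end` keeps the checkpoint with the lowest validation loss; both are assumptions about the training loop rather than statements taken from the SetFit source.

```python
steps_per_epoch = 3627      # step count at epoch 1.0 in the table
embedding_batch_size = 4    # first element of batch_size: (4, 4)

# Approximate number of pairs seen per epoch (the last batch may be smaller).
pairs_per_epoch = steps_per_epoch * embedding_batch_size

def epoch_at(step):
    """Fractional epoch for a given step, rounded to four decimals as in the table."""
    return round(step / steps_per_epoch, 4)

# Validation loss recorded at the end of each epoch, keyed by step.
validation_loss = {3627: 0.1036, 7254: 0.1068}
best_step = min(validation_loss, key=validation_loss.get)

print(pairs_per_epoch)
print(epoch_at(50))
print(best_step)
```

Under these assumptions the epoch-1 checkpoint (step 3627) would be the one retained, since the validation loss rises slightly in the second epoch while the training loss keeps falling.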

Framework Versions

Python: 3.10.12
SetFit: 1.1.0
Sentence Transformers: 3.3.1
Transformers: 4.45.2
PyTorch: 2.1.0+cu118
Datasets: 3.2.0
Tokenizers: 0.20.3

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
