tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: >-
      Title: Robust Contextual Bandit via the Capped-$\ell_{2}$ norm,

      Abstract: This paper considers the actor-critic contextual bandit for the
      mobile health

      (mHealth) intervention. The state-of-the-art decision-making methods in
      mHealth

      generally assume that the noise in the dynamic system follows the Gaussian

      distribution. Those methods use the least-square-based algorithm to
      estimate

      the expected reward, which is prone to the existence of outliers. To deal
      with

      the issue of outliers, we propose a novel robust actor-critic contextual
      bandit

      method for the mHealth intervention. In the critic updating, the

      capped-$\ell_{2}$ norm is used to measure the approximation error, which

      prevents outliers from dominating our objective. A set of weights could be

      achieved from the critic updating. Considering them gives a weighted
      objective

      for the actor updating. It provides the badly noised sample in the critic

      updating with zero weights for the actor updating. As a result, the
      robustness

      of both actor-critic updating is enhanced. There is a key parameter in the

      capped-$\ell_{2}$ norm. We provide a reliable method to properly set it by

      making use of one of the most fundamental definitions of outliers in

      statistics. Extensive experiment results demonstrate that our method can

      achieve almost identical results compared with the state-of-the-art
      methods on

      the dataset without outliers and dramatically outperform them on the
      datasets

      noised by outliers.
  - text: >-
      Title: Increasing the Reusability of Enforcers with Lifecycle Events,

      Abstract: Runtime enforcement can be effectively used to improve the
      reliability of

      software applications. However, it often requires the definition of ad hoc

      policies and enforcement strategies, which might be expensive to identify
      and

      implement. This paper discusses how to exploit lifecycle events to obtain

      useful enforcement strategies that can be easily reused across
      applications,

      thus reducing the cost of adoption of the runtime enforcement technology.
      The

      paper finally sketches how this idea can be used to define libraries that
      can

      automatically overcome problems related to applications misusing them.
  - text: >-
      Title: Generalized Minimum Distance Estimators in Linear Regression with
      Dependent Errors,

      Abstract: This paper discusses minimum distance estimation method in the
      linear

      regression model with dependent errors which are strongly mixing. The

      regression parameters are estimated through the minimum distance
      estimation

      method, and asymptotic distributional properties of the estimators are

      discussed. A simulation study compares the performance of the minimum
      distance

      estimator with other well celebrated estimator. This simulation study
      shows the

      superiority of the minimum distance estimator over another estimator.
      KoulMde

      (R package) which was used for the simulation study is available online.
      See

      section 4 for the detail.
  - text: >-
      Title: On the isoperimetric quotient over scalar-flat conformal classes,

      Abstract: Let $(M,g)$ be a smooth compact Riemannian manifold of dimension
      $n$ with

      smooth boundary $\partial M$. Suppose that $(M,g)$ admits a scalar-flat

      conformal metric. We prove that the supremum of the isoperimetric quotient
      over

      the scalar-flat conformal class is strictly larger than the best constant
      of

      the isoperimetric inequality in the Euclidean space, and consequently is

      achieved, if either (i) $n\ge 12$ and $\partial M$ has a nonumbilic point;
      or

      (ii) $n\ge 10$, $\partial M$ is umbilic and the Weyl tensor does not
      vanish at

      some boundary point.
  - text: >-
      Title: Monte Carlo Tree Search with Sampled Information Relaxation Dual
      Bounds,

      Abstract: Monte Carlo Tree Search (MCTS), most famously used in game-play
      artificial

      intelligence (e.g., the game of Go), is a well-known strategy for
      constructing

      approximate solutions to sequential decision problems. Its primary
      innovation

      is the use of a heuristic, known as a default policy, to obtain Monte
      Carlo

      estimates of downstream values for states in a decision tree. This
      information

      is used to iteratively expand the tree towards regions of states and
      actions

      that an optimal policy might visit. However, to guarantee convergence to
      the

      optimal action, MCTS requires the entire tree to be expanded
      asymptotically. In

      this paper, we propose a new technique called Primal-Dual MCTS that
      utilizes

      sampled information relaxation upper bounds on potential actions, creating
      the

      possibility of "ignoring" parts of the tree that stem from highly
      suboptimal

      choices. This allows us to prove that despite converging to a partial
      decision

      tree in the limit, the recommended action from Primal-Dual MCTS is
      optimal. The

      new approach shows significant promise when used to optimize the behavior
      of a

      single driver navigating a graph while operating on a ride-sharing
      platform.

      Numerical experiments on a real dataset of 7,000 trips in New Jersey
      suggest

      that Primal-Dual MCTS improves upon standard MCTS by producing deeper
      decision

      trees and exhibits a reduced sensitivity to the size of the action space.
metrics:
  - accuracy
pipeline_tag: text-classification
library_name: setfit
inference: false
datasets:
  - bhujith10/multi_class_classification_dataset
base_model: google-bert/bert-large-uncased

SetFit with google-bert/bert-large-uncased

This is a SetFit model trained on the bhujith10/multi_class_classification_dataset dataset that can be used for Text Classification. This SetFit model uses google-bert/bert-large-uncased as the Sentence Transformer embedding model. A SetFitHead instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
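The first step above relies on labelled text pairs. A minimal sketch of how such pairs can be built from a few labelled examples follows. The helper name `make_contrastive_pairs`, the toy texts, and the labels "cs" and "math" are hypothetical and are not taken from the dataset. The sketch enumerates every pair; the library's own pair sampler is not reproduced here.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labelled examples.

    Pairs that share a label get target 1.0 and pairs with different
    labels get target 0.0. A cosine-similarity loss can then pull the
    embeddings of same-label texts together and push the others apart.
    """
    pairs = []
    examples = list(zip(texts, labels))
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy data: hypothetical texts and labels, for illustration only.
texts = [
    "Title: Monte Carlo Tree Search ...",
    "Title: Robust Contextual Bandit ...",
    "Title: On the isoperimetric quotient ...",
    "Title: Generalized Minimum Distance Estimators ...",
]
labels = ["cs", "cs", "math", "math"]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} <-> {text_b!r}")
```

Four examples yield six pairs, which is why contrastive fine-tuning can extract many training signals from a small labelled set.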

Model Details

Model Description

Model Sources

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("bhujith10/bert-large-uncased-setfit_finetuned")

# A raw triple-quoted string keeps the line breaks and the LaTeX backslashes intact
text = r"""Title: On the isoperimetric quotient over scalar-flat conformal classes,
Abstract: Let $(M,g)$ be a smooth compact Riemannian manifold of dimension $n$ with
smooth boundary $\partial M$. Suppose that $(M,g)$ admits a scalar-flat
conformal metric. We prove that the supremum of the isoperimetric quotient over
the scalar-flat conformal class is strictly larger than the best constant of
the isoperimetric inequality in the Euclidean space, and consequently is
achieved, if either (i) $n\ge 12$ and $\partial M$ has a nonumbilic point; or
(ii) $n\ge 10$, $\partial M$ is umbilic and the Weyl tensor does not vanish at
some boundary point."""

# Run inference
preds = model(text)
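
The widget examples and the inference input all follow one layout: a `Title:` line ending in a comma, then an `Abstract:` section on the next line. The sketch below builds that layout from separate fields. The helper `format_input` is hypothetical, and it is an assumption that the model was trained on inputs in exactly this form.

```python
def format_input(title: str, abstract: str) -> str:
    """Join a title and an abstract in the layout used by the examples.

    Assumption: the layout is inferred from the widget examples in this
    card, not from documented preprocessing code.
    """
    return f"Title: {title.strip()},\nAbstract: {abstract.strip()}"


# Raw string so that LaTeX backslashes in the abstract are kept literally.
example = format_input(
    "On the isoperimetric quotient over scalar-flat conformal classes",
    r"Let $(M,g)$ be a smooth compact Riemannian manifold of dimension $n$ ...",
)
print(example)
```

The resulting string can be passed to the loaded model in place of a hand-written literal.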

Training Details

Training Set Metrics

Training set   Min   Median     Max
Word count     23    145.8467   280

Training Hyperparameters

  • batch_size: (4, 4)
  • num_epochs: (2, 2)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: True
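
Several values above are written as pairs. In SetFit's training arguments a pair holds one setting per training phase, the embedding fine-tuning phase first and the classifier phase second. The sketch below shows that reading with a plain dictionary. The names `PHASES` and `settings_for` are hypothetical, and only a subset of the listed hyperparameters is copied.

```python
# Values copied from the hyperparameter list above (subset).
hyperparameters = {
    "batch_size": (4, 4),
    "num_epochs": (2, 2),
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "warmup_proportion": 0.1,
    "l2_weight": 0.01,
    "seed": 42,
}

# Assumed order of the two entries in every tuple-valued setting.
PHASES = ("embedding", "classifier")


def settings_for(phase, params=hyperparameters):
    """Resolve tuple-valued settings for one phase.

    Scalar values are returned unchanged; this helper does not decide
    whether a scalar is actually used in the given phase.
    """
    index = PHASES.index(phase)
    return {
        name: value[index] if isinstance(value, tuple) else value
        for name, value in params.items()
    }


print(settings_for("embedding")["body_learning_rate"])
print(settings_for("classifier")["body_learning_rate"])
```

Read this way, the Sentence Transformer body is fine-tuned at a learning rate of 2e-05 during the contrastive phase, and 1e-05 is the body rate set for the classifier phase.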

Training Results

Epoch Step Training Loss Validation Loss
0.0003 1 0.22 -
0.0138 50 0.3706 -
0.0276 100 0.2389 -
0.0414 150 0.1628 -
0.0551 200 0.1401 -
0.0689 250 0.1043 -
0.0827 300 0.1047 -
0.0965 350 0.098 -
0.1103 400 0.0931 -
0.1241 450 0.1002 -
0.1379 500 0.0837 -
0.1516 550 0.0673 -
0.1654 600 0.0709 -
0.1792 650 0.08 -
0.1930 700 0.0719 -
0.2068 750 0.0805 -
0.2206 800 0.059 -
0.2344 850 0.0957 -
0.2481 900 0.0614 -
0.2619 950 0.0887 -
0.2757 1000 0.0713 -
0.2895 1050 0.0734 -
0.3033 1100 0.0519 -
0.3171 1150 0.0802 -
0.3309 1200 0.0817 -
0.3446 1250 0.0665 -
0.3584 1300 0.0515 -
0.3722 1350 0.0764 -
0.3860 1400 0.0564 -
0.3998 1450 0.0512 -
0.4136 1500 0.052 -
0.4274 1550 0.0398 -
0.4411 1600 0.0473 -
0.4549 1650 0.0433 -
0.4687 1700 0.0621 -
0.4825 1750 0.0506 -
0.4963 1800 0.0395 -
0.5101 1850 0.0516 -
0.5238 1900 0.0431 -
0.5376 1950 0.037 -
0.5514 2000 0.0299 -
0.5652 2050 0.0398 -
0.5790 2100 0.0335 -
0.5928 2150 0.0438 -
0.6066 2200 0.0436 -
0.6203 2250 0.0345 -
0.6341 2300 0.0396 -
0.6479 2350 0.0381 -
0.6617 2400 0.0377 -
0.6755 2450 0.0287 -
0.6893 2500 0.0393 -
0.7031 2550 0.0309 -
0.7168 2600 0.0363 -
0.7306 2650 0.0347 -
0.7444 2700 0.0299 -
0.7582 2750 0.0305 -
0.7720 2800 0.0349 -
0.7858 2850 0.0385 -
0.7996 2900 0.0412 -
0.8133 2950 0.0336 -
0.8271 3000 0.0422 -
0.8409 3050 0.0249 -
0.8547 3100 0.0285 -
0.8685 3150 0.0258 -
0.8823 3200 0.0309 -
0.8961 3250 0.0246 -
0.9098 3300 0.0271 -
0.9236 3350 0.0285 -
0.9374 3400 0.0318 -
0.9512 3450 0.0287 -
0.9650 3500 0.0298 -
0.9788 3550 0.021 -
0.9926 3600 0.036 -
1.0 3627 - 0.1036
1.0063 3650 0.0257 -
1.0201 3700 0.02 -
1.0339 3750 0.0333 -
1.0477 3800 0.0339 -
1.0615 3850 0.0283 -
1.0753 3900 0.0233 -
1.0891 3950 0.0311 -
1.1028 4000 0.0296 -
1.1166 4050 0.0271 -
1.1304 4100 0.0321 -
1.1442 4150 0.0221 -
1.1580 4200 0.026 -
1.1718 4250 0.0283 -
1.1856 4300 0.0378 -
1.1993 4350 0.0225 -
1.2131 4400 0.0237 -
1.2269 4450 0.0254 -
1.2407 4500 0.0253 -
1.2545 4550 0.023 -
1.2683 4600 0.0265 -
1.2821 4650 0.0255 -
1.2958 4700 0.0278 -
1.3096 4750 0.0285 -
1.3234 4800 0.0234 -
1.3372 4850 0.0282 -
1.3510 4900 0.0197 -
1.3648 4950 0.0284 -
1.3785 5000 0.0326 -
1.3923 5050 0.0233 -
1.4061 5100 0.0386 -
1.4199 5150 0.0308 -
1.4337 5200 0.0218 -
1.4475 5250 0.0288 -
1.4613 5300 0.0251 -
1.4750 5350 0.0255 -
1.4888 5400 0.0261 -
1.5026 5450 0.0253 -
1.5164 5500 0.0313 -
1.5302 5550 0.0277 -
1.5440 5600 0.0252 -
1.5578 5650 0.0293 -
1.5715 5700 0.0334 -
1.5853 5750 0.0285 -
1.5991 5800 0.0269 -
1.6129 5850 0.0267 -
1.6267 5900 0.0313 -
1.6405 5950 0.0243 -
1.6543 6000 0.0301 -
1.6680 6050 0.0266 -
1.6818 6100 0.0276 -
1.6956 6150 0.0293 -
1.7094 6200 0.0291 -
1.7232 6250 0.031 -
1.7370 6300 0.0283 -
1.7508 6350 0.0238 -
1.7645 6400 0.0261 -
1.7783 6450 0.0196 -
1.7921 6500 0.034 -
1.8059 6550 0.0255 -
1.8197 6600 0.0231 -
1.8335 6650 0.0256 -
1.8473 6700 0.0207 -
1.8610 6750 0.0325 -
1.8748 6800 0.0238 -
1.8886 6850 0.0277 -
1.9024 6900 0.0239 -
1.9162 6950 0.0239 -
1.9300 7000 0.0227 -
1.9438 7050 0.0236 -
1.9575 7100 0.0216 -
1.9713 7150 0.0248 -
1.9851 7200 0.0244 -
1.9989 7250 0.0203 -
2.0 7254 - 0.1068
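
The table can be cross-checked with a little arithmetic. The sketch below derives the Epoch column from the Step column, estimates the warmup length from `warmup_proportion`, and picks the evaluation step with the lower validation loss. The constant and function names are hypothetical. The warmup and pair counts are estimates: the library's exact rounding is not reproduced, and the pair count assumes one batch of 4 pairs per step.

```python
# Figures copied from the table and the hyperparameter list above.
STEPS_PER_EPOCH = 3627
TOTAL_STEPS = 7254
WARMUP_PROPORTION = 0.1
BATCH_SIZE = 4
validation_loss = {3627: 0.1036, 7254: 0.1068}  # step -> validation loss


def epoch_at(step):
    """Fractional epoch reached at a step, rounded as in the table."""
    return round(step / STEPS_PER_EPOCH, 4)


# Estimated number of warmup steps (exact rounding is an assumption).
approx_warmup_steps = round(TOTAL_STEPS * WARMUP_PROPORTION)

# Upper bound on pairs seen per epoch, assuming one batch per step.
max_pairs_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE

# Evaluation step with the lowest validation loss.
best_step = min(validation_loss, key=validation_loss.get)

print(epoch_at(50), epoch_at(3650))
print(approx_warmup_steps, max_pairs_per_epoch, best_step)
```

Validation loss rises slightly from 0.1036 to 0.1068 between the two evaluations while training loss keeps falling. If checkpoints were saved at both evaluation steps, `load_best_model_at_end: True` would be expected to favor the end of the first epoch.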

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.1.0
  • Sentence Transformers: 3.3.1
  • Transformers: 4.45.2
  • PyTorch: 2.1.0+cu118
  • Datasets: 3.2.0
  • Tokenizers: 0.20.3
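
To compare a local environment against the versions listed above, a check along the following lines may help. It is a sketch using only the standard library. The distribution names are the usual PyPI names and are an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by assumed PyPI distribution name.
LISTED = {
    "setfit": "1.1.0",
    "sentence-transformers": "3.3.1",
    "transformers": "4.45.2",
    "torch": "2.1.0+cu118",
    "datasets": "3.2.0",
    "tokenizers": "0.20.3",
}


def compare_versions(listed=LISTED):
    """Return {package: (listed, installed, matches)}.

    `installed` is None when the package is not installed.
    """
    report = {}
    for package, expected in listed.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (expected, installed, installed == expected)
    return report


for package, (expected, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: listed {expected}, installed {installed} ({status})")
```

A mismatch does not necessarily break inference; the listed versions only record the environment the model was trained in.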

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}