SetFit with BAAI/bge-base-en-v1.5

This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-base-en-v1.5 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
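
In code, that two-stage procedure looks roughly like the sketch below, using the setfit 1.x Trainer API; the toy dataset is an illustrative placeholder, not the data this model was trained on.

from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# A handful of labeled examples per class is enough for SetFit's few-shot setup.
train_dataset = Dataset.from_dict({
    "text": [
        "Reasoning: the answer is grounded in the document. Final result: Good",
        "Reasoning: the answer contradicts the document. Final result: Bad",
    ],
    "label": [1, 0],
})

# The Sentence Transformer body; the default classification head is a
# scikit-learn LogisticRegression, as used by this model.
model = SetFitModel.from_pretrained("BAAI/bge-base-en-v1.5")

args = TrainingArguments(batch_size=16, num_epochs=5)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # step 1: contrastive fine-tuning; step 2: fit the head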

Model Details

Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-base-en-v1.5
  • Classification head: LogisticRegression
  • Model size: 109M parameters (F32)

Model Sources

  • Repository: https://github.com/huggingface/setfit
  • Paper: https://arxiv.org/abs/2209.11055

Model Labels

Label: 1
  • 'Reasoning why the answer may be good:\n1. Context Grounding: The answer is directly supported by the document, which explicitly states that "With almost every line of his epic Punica Silius references Virgil."\n2. Relevance: The answer specifically addresses the question asked by identifying the title of Silius Italicus' epic where Virgil is frequently referenced.\n3. Conciseness: The answer is short, clear, and to the point, providing just the necessary information without any extraneous details.\n\nReasoning why the answer may be bad:\n- There is no evidence of deviation or lack of support from the provided document, the relevance is clearly maintained, and the answer concisely addresses the question.\n\nFinal Result: Good'
  • 'Good'
  • 'Reasoning:\n\nWhy the answer may be good:\n1. Context Grounding: The answer mentions "3,000 police," which correlates with the figure provided in the document regarding the number of French police that protected the Olympic torch relay.\n2. Relevance: The answer directly addresses the question, which asks about the number of police protecting the torch in France.\n3. Conciseness: The answer is brief and to the point without adding any unnecessary information.\n\nWhy the answer may be bad:\nThere is no evident issue with context grounding, relevance, or conciseness in the answer provided.\n\nFinal result: Good'
Label: 0
  • "**Reasoning Why the Answer May Be Good:\n- The answer correctly identifies a person associated with vice-presidential and presidential roles at Notre Dame, although it attributes the wrong timeframe for the vice-presidency.\n\nReasoning Why the Answer May Be Bad:\n- The document specifically mentions that John Francis O'Hara became vice-president in 1933, not James Edward O'Hara, indicating the answer is not well-supported by the provided document.\n- The answer provides incorrect and irrelevant information that does not address the specific question asked.\n- The question asked for the vice-president elected in 1933, and the answer incorrectly identifies the year 1934.\n\nFinal Result:**\nBad"
  • "Reasoning:\n1. Context Grounding: The document does provide the necessary information about the gross earnings of Beyoncé's second world tour. Therefore, the answer is well-supported by the document.\n2. Relevance: The answer directly responds to the specific question asked about the gross earnings of Beyoncé during her second world tour in 2009.\n3. Conciseness: The answer is concise and sticks to the point, providing the exact figure and relevant context about the record without additional unnecessary information.\n\nFinal Result: Good"
  • "Reasoning:\n\nWhy the answer may be good:\n- The answer specifies a borough of New York, which is relevant to the question.\n- It provides a specific claim about the population distribution of Asian-Americans within New York City boroughs.\n\nWhy the answer may be bad:\n- The provided document explicitly states that Queens is home to the state's largest Asian-American population, not Manhattan.\n- The answer does not align with the key information from the document, thus failing the test of context grounding.\n\nFinal Result: Bad"

Evaluation

Metrics

Label: all
Accuracy: 0.8361
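
As a rough illustration of how a label-accuracy number like this could be recomputed on a held-out set (the evaluation texts and gold labels below are hypothetical placeholders):

from setfit import SetFitModel

model = SetFitModel.from_pretrained("Netta1994/setfit_baai_squad_gpt-4o_improved-cot-instructions_two_reasoning_only_reasoning_17267")

# Hypothetical held-out examples: reasoning traces plus their gold labels (0 or 1)
eval_texts = ["Reasoning: ... Final result: Good", "Reasoning: ... Final result: Bad"]
eval_labels = [1, 0]

preds = model.predict(eval_texts)
accuracy = sum(int(p) == g for p, g in zip(preds, eval_labels)) / len(eval_labels)
print(f"accuracy: {accuracy:.4f}")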

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Netta1994/setfit_baai_squad_gpt-4o_improved-cot-instructions_two_reasoning_only_reasoning_17267")
# Run inference (triple quotes let the example span multiple lines)
preds = model("""The answer may be good:
- The information provided in the answer is supported by the document.

The answer may be bad:
- The answer does not address the specific question asked which pertains to the year that Doctorate degrees were first granted at Notre Dame.
- It deviates into unrelated information about the opening of a theology library, which is irrelevant to the question.

Final result: Bad""")
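
The call returns one predicted label per input, using the same 0/1 labels as the training data. To classify several reasoning traces at once, pass a list of strings, e.g. preds = model([text_a, text_b]).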

Training Details

Training Set Metrics

Training set   Min   Median    Max
Word count     1     91.8596   275

Label   Training Sample Count
0       27
1       30

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (5, 5)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 20
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
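
Read back into code, these settings correspond to a TrainingArguments call along the following lines; this is a sketch for orientation, not the exact training script (distance_metric and margin are left at their defaults here, as they only affect triplet-style losses):

from sentence_transformers.losses import CosineSimilarityLoss
from setfit import TrainingArguments

args = TrainingArguments(
    batch_size=(16, 16),               # (embedding phase, classifier phase)
    num_epochs=(5, 5),
    max_steps=-1,                      # no hard cap on optimizer steps
    sampling_strategy="oversampling",
    num_iterations=20,                 # contrastive pairs generated per example
    body_learning_rate=(2e-05, 2e-05),
    head_learning_rate=2e-05,
    loss=CosineSimilarityLoss,
    end_to_end=False,                  # keep the body frozen while fitting the head
    use_amp=False,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=False,
)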

Training Results

Epoch Step Training Loss Validation Loss
0.0070 1 0.1646 -
0.3497 50 0.2544 -
0.6993 100 0.1157 -
1.0490 150 0.0294 -
1.3986 200 0.0037 -
1.7483 250 0.0025 -
2.0979 300 0.0023 -
2.4476 350 0.002 -
2.7972 400 0.0018 -
3.1469 450 0.0017 -
3.4965 500 0.0016 -
3.8462 550 0.0017 -
4.1958 600 0.0016 -
4.5455 650 0.0015 -
4.8951 700 0.0016 -

Framework Versions

  • Python: 3.10.14
  • SetFit: 1.1.0
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.0
  • PyTorch: 2.4.1+cu121
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1
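
To approximate this environment, the listed versions can be pinned at install time (the exact PyTorch CUDA build will depend on your machine):

pip install "setfit==1.1.0" "sentence-transformers==3.1.0" "transformers==4.44.0" "torch==2.4.1" "datasets==2.19.2" "tokenizers==0.19.1"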

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}