SetFit

This is a SetFit model for text classification that labels issue reports as bug or non-bug. A LogisticRegression instance serves as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
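The first step above trains on pairs of texts rather than on single labeled texts. As a minimal sketch of that idea, the snippet below builds every pair from a tiny labeled set and assigns a target of 1.0 to same-label pairs and 0.0 to different-label pairs, which is the kind of target a cosine-similarity loss can be trained against. The function name make_pairs and the toy issue titles are hypothetical, and enumerating all pairs is a simplification: SetFit samples pairs rather than listing them exhaustively.

```python
import random
from itertools import combinations


def make_pairs(texts, labels, seed=42):
    """Build (text_a, text_b, target) triples from a labeled set.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    """
    rng = random.Random(seed)
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    rng.shuffle(pairs)  # mix positive and negative pairs
    return pairs


# Toy data: hypothetical issue titles, not taken from the training set
texts = [
    "App crashes on startup",
    "Segfault when table is empty",
    "Add option to rename files",
    "Document the build steps",
]
labels = ["bug", "bug", "non-bug", "non-bug"]

pairs = make_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))
```

Even four labeled texts yield six training pairs, which is part of why this approach can work with few labeled examples.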

Model Details

Model Description

Model Type: SetFit
Classification head: a LogisticRegression instance
Maximum Sequence Length: 384 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label Examples

non-bug

"Define file subtype value behavior/expectations\nIs your feature request related to a problem? Please describe.\r\nNot clear if the CFE_FS_InitHeader SubType needs to be one of the FS enums or if it can be user defined by apps. Note there is no longer a shell file created by ES:\r\nhttps://github.com/nasa/cFE/blob/e80aae94e0f56b868657daba965c590766a4dc57/modules/core_api/fsw/inc/cfe_fs_extern_typedefs.h#L101-L108\r\n\r\nDescribe the solution you'd like\r\nNeed to determine if FS should define all file subtypes, or treat it as an extendable field (or whatever). That will affect if the SHELL subtype gets removed or renamed (since there is still an app that would create it). Note right now apps don't even use CFE_FS_InitHeader, but they do currently set unique values.\r\n\r\nDescribe alternatives you've considered\r\nNone\r\n\r\nAdditional context\r\nCode review\r\n\r\nRequester Info\r\nJacob Hageman - NASA/GSFC\r\n"
'Disambiguate command vs message requirements \nIs your feature request related to a problem? Please describe.\r\n"Command" terminology has been used for both ground commands (that increment command counters) and inter-app commands (that typically do not increment command counters). So it's unclear in the requirement which sort of use case is intended.\r\n\r\nDescribe the solution you'd like\r\n"Command" is ground command with additional associated behavior (increments command counters), "Message" is typical sb message that does not increment command counter.\r\n\r\nDescribe alternatives you've considered\r\nNone\r\n\r\nAdditional context\r\nDiscovered during requirements scrub, helps clarify what impacts command counter.\r\n\r\nRequester Info\r\nJacob Hageman - NASA/GSFC\r\n'
"Improve table handling\nIs your feature request related to a problem? Please describe.\r\nDoesn't actually allow table management within the task loop\r\n\r\nDescribe the solution you'd like\r\nActually follow the table management pattern, allowing updates (should be a decent example)\r\n\r\nDescribe alternatives you've considered\r\nN/A\r\n\r\nAdditional context\r\nN/A\r\n\r\nRequester Info\r\nJacob Hageman - NASA/GSFC"

bug

'CFE_PLATFORM_ES_PERF_MAX_IDS not fully deprecated\nDescribe the bug\r\nCFE_PLATFORM_ES_PERF_MAX_IDS was superseded by CFE_MISSION_ES_PERF_MAX_IDS as noted in this comment: https://github.com/nasa/cFE/search?q=CFE_PLATFORM_ES_PERF_MAX_IDS. However, sample cpu1_platform_cfg.h still contains the definition for CFE_PLATFORM_ES_PERF_MAX_IDS is still referenced in es_UT.c and comments in cfe_es_events.h and sample_perfids.h\r\n\r\nTo Reproduce\r\nN/A\r\n\r\nExpected behavior\r\nEither CFE_PLATFORM_ES_PERF_MAX_IDS should be totally deprecated and all references should be replaced by CFE_MISSION_ES_PERF_MAX_IDS or (if deemed necessary) support for platform-specific max values should be re-added in the perf-log implementation.\r\n\r\nCode snips\r\ncfe/cmake/sample_defs/cpu1_platform_cfg.h:1978\r\ncfe/fsw/cfe-core/src/inc/cfe_es_events.h:1046\r\ncfe/fsw/cfe-core/unit-test/es_UT.c:3664\r\n\r\nSystem observed on:\r\nN/A\r\n\r\nAdditional context\r\nN/A\r\n\r\nReporter Info\r\nPJ Chapates Gateway Vehicle System Manager FSW Production, JSC\r\n'
'CF Purge Queue Command Opcode Not Defined\nThis issue was imported from the GSFC issue tracking system\r\n\r\n_Imported from_: [GSFCCFS-1765] CF Purge Queue Command Opcode Not Defined\r\n_Originally submitted by_: Maldonado, Sergio E. (GSFC-580.0)[Arctic Slope Technical Services, Inc.] on Fri Oct 29 11:03:57 2021\r\n\r\n_Original Description_:\r\nThe command opcode for Purge Queue is not present in the CF\_CMDS enumeration in cf\_msg.h. It should be present with a value of 21. The command dispatch table in cf\_cmd.c does have an entry for the command, as well as the implementation. Without the opcode defined, the command cannot be verified at the functional level. '
"File age check logic is wrong\nChecklist (Please check before submitting)\r\n\r\n [x] I reviewed the Contributing Guide.\r\n [x] I performed a cursory search to see if the bug report is relevant, not redundant, nor in conflict with other tickets.\r\n\r\nDescribe the bug\r\nProduces ~17 files in 10 minutes when requesting 1 file per minute\r\n\r\nTo Reproduce\r\n1. Enable a 1 file per minute config\r\n2. Watch ~17 files get produced\r\n\r\nExpected behavior\r\n1 file per minute when configured to do so\r\n\r\nCode snips\r\nThe problem is how file age is accumulated. W/ the default config, 4 seconds are added every HK message, and another second is added every 1 second SB timeout. So within the typical 4 second scheduled HK request the file age gets incremented by 7 seconds (4 from HK processing and 3 from SB timeouts).\r\n\r\nhttps://github.com/nasa/DS/blob/ce988535edffd6b36cc1083e10988c2d0a4a38db/fsw/src/ds_app.c#L124\r\nhttps://github.com/nasa/DS/blob/ce988535edffd6b36cc1083e10988c2d0a4a38db/fsw/src/ds_app.c#L520\r\n\r\nReally the time accumulation logic is broken since it's going to vary based on receiving any other command that would cause SB not to timeout.\r\n\r\nLikely needs a functional test update to catch this issue.\r\n\r\nSystem observed on:\r\nIndependent of system\r\n\r\nAdditional context\r\nNone\r\n\r\nReporter Info**\r\nJacob Hageman - NASA/GSFC"

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference
# The input spans several lines, so it is passed as a triple-quoted string
preds = model("""Consistent CFE_PSP_Main implementation
RTEMS PSP hardcodes \"/cf/cfe_es_startup.scr\", but mcp750 and pc-linux both use the CFE_PLATFORM_ES_NONVOL_STARTUP_FILE.

Inconsistent implementations.

From #102  (solved here):
cfe_psp_start.c for mcp750 VxWorks has StartupFilePath as an input parameter to CFE_PSP_Main, but calls CFE_ES_Main with CFE_PLATFORM_ES_NONVOL_STARTUP_FILE.

Confusing implementation... looks like at least the pc-linux PSP only uses CFE_PLATFORM_ES_NONVOL_STARTUP_FILE (but a different prototype).""")
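The example inputs in this card all consist of an issue title followed by a newline and the issue body. The sketch below shows one way to build inputs in that shape and classify several issues at once. The helper names build_issue_text and classify_issues are hypothetical, and stub_model is only a keyword rule standing in for a loaded SetFitModel so the snippet runs without downloading anything; a loaded model can be passed in its place, since it can also be called with a list of texts.

```python
def build_issue_text(title, body):
    """Join title and body with a newline, mirroring the example inputs in this card."""
    return f"{title}\n{body}"


def classify_issues(model, issues):
    """Classify (title, body) pairs with any callable mapping a list of texts to labels."""
    texts = [build_issue_text(title, body) for title, body in issues]
    return list(model(texts))


def stub_model(texts):
    # Stand-in for a loaded SetFitModel: a simple keyword rule, NOT the real classifier.
    return ["bug" if "Describe the bug" in text else "non-bug" for text in texts]


# Hypothetical issues used only to exercise the helpers
issues = [
    ("File age check logic is wrong", "Describe the bug\nProduces too many files"),
    ("Improve table handling", "Describe the solution you'd like\nAllow updates"),
]
predictions = classify_issues(stub_model, issues)
print(predictions)
```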

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	1	110.5796	2778

Label	Training Sample Count
bug	662
non-bug	1517
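The two label counts above are unbalanced, which matters when interpreting accuracy. As a small worked example using only those counts, the snippet below computes each label's share of the training set and the accuracy a classifier would reach on this distribution by always predicting the majority label.

```python
# Training sample counts from the table above
counts = {"bug": 662, "non-bug": 1517}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of always predicting the most frequent label on this distribution
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority baseline: {majority_baseline:.1%}")
```

Roughly 30% of the training samples are labeled bug, so an accuracy figure near 70% on similarly distributed data would be no better than always answering non-bug.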

Training Hyperparameters

batch_size: (16, 2)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 20
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
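Several of the values above are tuples. In SetFit's training arguments, a tuple holds one value for the embedding fine-tuning phase and one for the classifier training phase, while a single value applies to both. The sketch below separates three of the listed settings by phase; split_phases is a hypothetical helper written for illustration, not part of the SetFit API.

```python
# Tuple-valued settings copied from the list above
hyperparameters = {
    "batch_size": (16, 2),
    "num_epochs": (1, 1),
    "body_learning_rate": (2e-05, 1e-05),
}


def split_phases(params):
    """Split settings into (embedding phase, classifier phase) dictionaries."""
    embedding, classifier = {}, {}
    for name, value in params.items():
        # A single value is assumed to apply to both phases
        first, second = value if isinstance(value, tuple) else (value, value)
        embedding[name] = first
        classifier[name] = second
    return embedding, classifier


embedding_phase, classifier_phase = split_phases(hyperparameters)
print("embedding phase: ", embedding_phase)
print("classifier phase:", classifier_phase)
```

Read this way, the contrastive fine-tuning of the Sentence Transformer ran with a batch size of 16 and a body learning rate of 2e-05 for one epoch.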

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0002	1	0.4726	-
0.0092	50	0.2725	-
0.0184	100	0.2269	-
0.0275	150	0.2061	-
0.0367	200	0.2113	-
0.0459	250	0.1806	-
0.0551	300	0.1833	-
0.0642	350	0.1578	-
0.0734	400	0.1478	-
0.0826	450	0.1376	-
0.0918	500	0.1135	-
0.1010	550	0.1145	-
0.1101	600	0.1099	-
0.1193	650	0.0859	-
0.1285	700	0.0837	-
0.1377	750	0.0826	-
0.1468	800	0.0809	-
0.1560	850	0.0559	-
0.1652	900	0.0539	-
0.1744	950	0.0444	-
0.1836	1000	0.0376	-
0.1927	1050	0.0387	-
0.2019	1100	0.035	-
0.2111	1150	0.0317	-
0.2203	1200	0.029	-
0.2294	1250	0.0277	-
0.2386	1300	0.0108	-
0.2478	1350	0.0226	-
0.2570	1400	0.0105	-
0.2662	1450	0.02	-
0.2753	1500	0.016	-
0.2845	1550	0.0181	-
0.2937	1600	0.0184	-
0.3029	1650	0.0113	-
0.3120	1700	0.014	-
0.3212	1750	0.0101	-
0.3304	1800	0.0106	-
0.3396	1850	0.0101	-
0.3488	1900	0.0117	-
0.3579	1950	0.0115	-
0.3671	2000	0.0113	-
0.3763	2050	0.005	-
0.3855	2100	0.0062	-
0.3946	2150	0.0141	-
0.4038	2200	0.0096	-
0.4130	2250	0.0117	-
0.4222	2300	0.0051	-
0.4314	2350	0.0054	-
0.4405	2400	0.0049	-
0.4497	2450	0.0054	-
0.4589	2500	0.0027	-
0.4681	2550	0.0009	-
0.4772	2600	0.0021	-
0.4864	2650	0.005	-
0.4956	2700	0.0026	-
0.5048	2750	0.0025	-
0.5140	2800	0.0014	-
0.5231	2850	0.0005	-
0.5323	2900	0.0012	-
0.5415	2950	0.0027	-
0.5507	3000	0.0002	-
0.5598	3050	0.0012	-
0.5690	3100	0.0015	-
0.5782	3150	0.0001	-
0.5874	3200	0.0	-
0.5965	3250	0.0001	-
0.6057	3300	0.0011	-
0.6149	3350	0.0012	-
0.6241	3400	0.0043	-
0.6333	3450	0.0027	-
0.6424	3500	0.0007	-
0.6516	3550	0.0033	-
0.6608	3600	0.0005	-
0.6700	3650	0.0011	-
0.6791	3700	0.0023	-
0.6883	3750	0.0009	-
0.6975	3800	0.0012	-
0.7067	3850	0.0021	-
0.7159	3900	0.0003	-
0.7250	3950	0.0001	-
0.7342	4000	0.0001	-
0.7434	4050	0.0001	-
0.7526	4100	0.0023	-
0.7617	4150	0.0025	-
0.7709	4200	0.0001	-
0.7801	4250	0.0	-
0.7893	4300	0.0	-
0.7985	4350	0.001	-
0.8076	4400	0.0013	-
0.8168	4450	0.0002	-
0.8260	4500	0.0026	-
0.8352	4550	0.0002	-
0.8443	4600	0.0002	-
0.8535	4650	0.0	-
0.8627	4700	0.0001	-
0.8719	4750	0.0012	-
0.8811	4800	0.001	-
0.8902	4850	0.0001	-
0.8994	4900	0.001	-
0.9086	4950	0.0002	-
0.9178	5000	0.0002	-
0.9269	5050	0.001	-
0.9361	5100	0.0001	-
0.9453	5150	0.0021	-
0.9545	5200	0.0001	-
0.9637	5250	0.0001	-
0.9728	5300	0.0	-
0.9820	5350	0.0001	-
0.9912	5400	0.0002	-
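The Epoch and Step columns in the table above can be cross-checked against the training set size and hyperparameters listed earlier. The sketch below assumes that each training sample contributes num_iterations positive and num_iterations negative pairs, and that the embedding phase uses a batch size of 16. That assumption is an interpretation rather than something this card states, but it reproduces the logged epoch fractions.

```python
import math

n_samples = 662 + 1517  # bug + non-bug training samples
num_iterations = 20
batch_size = 16         # embedding-phase batch size

# Assumption: num_iterations positive and num_iterations negative pairs per sample
n_pairs = 2 * num_iterations * n_samples
steps_per_epoch = math.ceil(n_pairs / batch_size)


def epoch_at(step):
    """Fraction of the epoch completed at a given step, rounded as in the table."""
    return round(step / steps_per_epoch, 4)


print(n_pairs, steps_per_epoch)
for step in (1, 50, 5400):
    print(step, epoch_at(step))
```

Under this assumption one epoch is 5448 steps, so step 5400, the last logged row, falls at about 0.9912 of the epoch, matching the table.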

Framework Versions

Python: 3.11.6
SetFit: 1.1.0
Sentence Transformers: 3.0.1
Transformers: 4.44.2
PyTorch: 2.4.1+cu121
Datasets: 2.21.0
Tokenizers: 0.19.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}