---
library_name: sklearn
license: mit
tags:
- sklearn
- skops
- text-classification
model_format: pickle
model_file: skops-zquiq5g5.pkl
---

# Model description

This is a `Support Vector Classifier` model trained on the SIRIUS dataset. As input, the model takes text embeddings encoded with camembert-base (768 dimensions per embedding). The pipeline reads 2304 feature columns derived from these embeddings, with names such as `avg_1` to `avg_768` and `max_1` to `max_768`.

## Intended uses & limitations

This model is not ready to be used in production.

## Training Procedure

[More Information Needed]

### Hyperparameters


| Hyperparameter | Value |
|----------------|-------|
| memory | |
| steps | [('columntransformer', ColumnTransformer(transformers=[('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=84))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))], verbose_feature_names_out=False)), ('svc', SVC(probability=True, random_state=42))] |
| verbose | False |
| columntransformer | ColumnTransformer(transformers=[('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=84))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))], verbose_feature_names_out=False) |
| svc | SVC(probability=True, random_state=42) |
| columntransformer__force_int_remainder_cols | True |
| columntransformer__n_jobs | |
| columntransformer__remainder | drop |
| columntransformer__sparse_threshold | 0.3 |
| columntransformer__transformer_weights | |
| columntransformer__transformers | [('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=84))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))] |
| columntransformer__verbose | False |
| columntransformer__verbose_feature_names_out | False |
| columntransformer__num | Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=84))]) |
| columntransformer__num__memory | |
| columntransformer__num__steps | [('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=84))] |
| columntransformer__num__verbose | False |
| columntransformer__num__imputer | SimpleImputer(strategy='median') |
| columntransformer__num__scaler | StandardScaler() |
| columntransformer__num__pca | PCA(n_components=84) |
| columntransformer__num__imputer__add_indicator | False |
| columntransformer__num__imputer__copy | True |
| columntransformer__num__imputer__fill_value | |
| columntransformer__num__imputer__keep_empty_features | False |
| columntransformer__num__imputer__missing_values | nan |
| columntransformer__num__imputer__strategy | median |
| columntransformer__num__scaler__copy | True |
| columntransformer__num__scaler__with_mean | True |
| columntransformer__num__scaler__with_std | True |
| columntransformer__num__pca__copy | True |
| columntransformer__num__pca__iterated_power | auto |
| columntransformer__num__pca__n_components | 84 |
| columntransformer__num__pca__n_oversamples | 10 |
| columntransformer__num__pca__power_iteration_normalizer | auto |
| columntransformer__num__pca__random_state | |
| columntransformer__num__pca__svd_solver | auto |
| columntransformer__num__pca__tol | 0.0 |
| columntransformer__num__pca__whiten | False |
| svc__C | 1.0 |
| svc__break_ties | False |
| svc__cache_size | 200 |
| svc__class_weight | |
| svc__coef0 | 0.0 |
| svc__decision_function_shape | ovr |
| svc__degree | 3 |
| svc__gamma | scale |
| svc__kernel | rbf |
| svc__max_iter | -1 |
| svc__probability | True |
| svc__random_state | 42 |
| svc__shrinking | True |
| svc__tol | 0.001 |
| svc__verbose | False |

### Model Plot

`Pipeline(steps=[('columntransformer', ColumnTransformer(transformers=[('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=84))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))], verbose_feature_names_out=False)), ('svc', SVC(probability=True, random_state=42))])`


## Evaluation Results

| Metric   | Value    |
|----------|----------|
| accuracy | 0.935065 |
| f1 score | 0.935709 |

### Confusion Matrix

![Confusion Matrix](confusion_matrix.png)

# How to Get Started with the Model

[More Information Needed]

# Model Card Authors

huynhdoo

# Model Card Contact

You can contact the model card authors through the following channels:
[More Information Needed]

# Citation

**BibTeX**

```
@inproceedings{...,year={2024}}
```

# get_started_code

    import pickle

    # File name taken from the `model_file` entry in the card metadata.
    pkl_filename = 'skops-zquiq5g5.pkl'

    with open(pkl_filename, 'rb') as file:
        pipe = pickle.load(file)
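
The loaded pipeline expects a table whose column names match those it was trained on (`avg_1`, ..., `max_768`, 2304 columns in total). This card does not say how those features were produced. The column prefixes suggest per-dimension average and max pooling over token embeddings, so the sketch below assumes that. `pool_token_embeddings` is a hypothetical helper, the token embeddings are random stand-ins for camembert-base outputs, and the remaining 768 columns, whose names are elided in the card, are not reproduced.

```python
import numpy as np
import pandas as pd

EMBEDDING_DIM = 768  # hidden size of camembert-base


def pool_token_embeddings(token_embeddings: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper: pool a (n_tokens, 768) array into one feature row.

    Assumes avg_* is the mean and max_* the maximum over tokens; the card
    does not confirm this.
    """
    if token_embeddings.ndim != 2 or token_embeddings.shape[1] != EMBEDDING_DIM:
        raise ValueError(f"expected shape (n_tokens, {EMBEDDING_DIM})")
    avg = token_embeddings.mean(axis=0)
    mx = token_embeddings.max(axis=0)
    names = [f"avg_{i}" for i in range(1, EMBEDDING_DIM + 1)]
    names += [f"max_{i}" for i in range(1, EMBEDDING_DIM + 1)]
    return pd.DataFrame([np.concatenate([avg, mx])], columns=names)


# Random stand-in for the embeddings of a 12-token text.
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(12, EMBEDDING_DIM))
features = pool_token_embeddings(fake_tokens)
```

Once a frame contains all 2304 expected columns, it could be passed to `pipe.predict(features)` or `pipe.predict_proba(features)`.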