metadata

license: apache-2.0
base_model: bert-base-cased
tags:
  - generated_from_trainer
datasets:
  - linnaeus
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: bert-linnaeus-ner
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: linnaeus
          type: linnaeus
          config: linnaeus
          split: validation
          args: linnaeus
        metrics:
          - name: Precision
            type: precision
            value: 0.9223433242506812
          - name: Recall
            type: recall
            value: 0.9521800281293952
          - name: F1
            type: f1
            value: 0.9370242214532872
          - name: Accuracy
            type: accuracy
            value: 0.9985110458648063
widget:
  - text: >-
      Streptococcus suis (S. suis) is an important zoonosis and pathogen that
      can carry prophages.
  - text: >-
      Lactobacillus plantarum is an important probiotic and is mostly isolated
      from fermented foods.
inference:
  parameters:
    aggregation_strategy: first

bert-linnaeus-ner

This model is a fine-tuned version of bert-base-cased on the linnaeus dataset. It achieves the following results on the evaluation set:

Loss: 0.0073
Precision: 0.9223
Recall: 0.9522
F1: 0.9370
Accuracy: 0.9985

Model description

This model performs token classification (named entity recognition): it tags mentions of organism and species names in text.

Note: this model is a work in progress and is subject to change.

Intended uses & limitations

The model is intended for use in my Master's thesis, where it masks the names of bacteria (and phages) for further analysis.
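The masking step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: `mask_entities`, `detect_entities`, and the `[ORGANISM]` mask string are hypothetical names, and `model_id` is left as a parameter because the full Hub repository id is not stated in this card. The sketch relies on the `start` and `end` character offsets that a `transformers` token-classification pipeline returns when an aggregation strategy is set; `first` is used here to match the inference setting in the metadata.

```python
from typing import Iterable, Mapping


def mask_entities(text: str, entities: Iterable[Mapping], mask: str = "[ORGANISM]") -> str:
    """Replace every detected span in `text` with `mask`.

    Each entity is a dict with character offsets under "start" and "end",
    the shape produced by an aggregated token-classification pipeline.
    """
    # Work from the rightmost span to the leftmost so that earlier
    # offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + mask + text[ent["end"]:]
    return text


def detect_entities(text: str, model_id: str):
    """Run the NER model on `text` (needs `transformers` and the model weights).

    `model_id` is a placeholder for the Hub repository id of this model.
    """
    # Imported lazily so the masking helper can be used without transformers.
    from transformers import pipeline

    ner = pipeline("token-classification", model=model_id, aggregation_strategy="first")
    return ner(text)


# Hand-written span standing in for model output (not a real prediction).
sample = "Lactobacillus plantarum is an important probiotic."
name = "Lactobacillus plantarum"
start = sample.index(name)
spans = [{"start": start, "end": start + len(name)}]

masked = mask_entities(sample, spans)
print(masked)
```

In practice the hand-written `spans` would be replaced by `detect_entities(sample, model_id)`, with `model_id` set to wherever this model is hosted.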

Training and evaluation data

The linnaeus dataset was used for both training and evaluation. The reported metrics are computed on its validation split.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
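To make the `linear` scheduler setting concrete, here is a small sketch of how the learning rate would decay over the run. It assumes zero warmup steps (no warmup is listed in this card) and takes the total step count of 4476 from the last row of the training results table. The function `linear_lr` is an illustrative helper, not part of any library.

```python
BASE_LR = 2e-05      # learning_rate from the hyperparameter list
TOTAL_STEPS = 4476   # final step in the training results table (3 epochs x 1492 steps)
WARMUP_STEPS = 0     # assumption: the card does not mention warmup


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each epoch.
for step in (0, 1492, 2984, 4476):
    print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the rate falls by one third of its initial value per epoch and reaches zero at the final step.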

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
0.0076	1.0	1492	0.0128	0.8566	0.9578	0.9044	0.9967
0.0024	2.0	2984	0.0082	0.9092	0.9578	0.9329	0.9980
0.0007	3.0	4476	0.0073	0.9223	0.9522	0.9370	0.9985
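As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so each row's F1 can be recomputed from its other two columns. The sketch below does this for the three epochs; `f1_from_pr` is an illustrative helper name, and small differences are expected because the table values are rounded to four decimals.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (epoch, precision, recall, reported F1) copied from the table above.
rows = [
    (1, 0.8566, 0.9578, 0.9044),
    (2, 0.9092, 0.9578, 0.9329),
    (3, 0.9223, 0.9522, 0.9370),
]

for epoch, precision, recall, reported in rows:
    computed = f1_from_pr(precision, recall)
    print(f"epoch {epoch}: computed F1 = {computed:.4f}, reported F1 = {reported:.4f}")
```

The gain in F1 between epoch 1 and epoch 3 comes almost entirely from precision, which rises from 0.8566 to 0.9223 while recall stays near 0.95.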

Framework versions

Transformers 4.34.0
PyTorch 2.1.0+cu121
Datasets 2.14.5
Tokenizers 0.14.0