---
license: apache-2.0
base_model: bert-base-cased
tags:
- generated_from_trainer
datasets:
- linnaeus
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: bert-linnaeus-ner
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: linnaeus
      type: linnaeus
      config: linnaeus
      split: validation
      args: linnaeus
    metrics:
    - name: Precision
      type: precision
      value: 0.9223433242506812
    - name: Recall
      type: recall
      value: 0.9521800281293952
    - name: F1
      type: f1
      value: 0.9370242214532872
    - name: Accuracy
      type: accuracy
      value: 0.9985110458648063
widget:
- text: >-
    Streptococcus suis (S. suis) is an important zoonosis and pathogen that
    can carry prophages.
- text: >-
    Lactobacillus plantarum is an important probiotic and is mostly isolated
    from fermented foods.
inference:
  parameters:
    aggregation_strategy: first
---
# bert-linnaeus-ner
This model is a fine-tuned version of bert-base-cased on the linnaeus dataset. It achieves the following results on the evaluation set:
- Loss: 0.0073
- Precision: 0.9223
- Recall: 0.9522
- F1: 0.9370
- Accuracy: 0.9985
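The four scores above mix two granularities. Assuming they follow the usual convention for Trainer-generated NER cards (seqeval-style scoring), precision, recall and F1 are computed over whole entity spans that must match exactly, while accuracy is computed per token. Because most tokens lie outside any species name, accuracy sits far above F1. The sketch below is a simplified re-implementation for illustration only, not the evaluation code that produced these numbers: it treats every `B`/`I` tag as a single entity type, and the toy tag sequences are made up. It also checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
from typing import List, Set, Tuple


def bio_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Convert a B/I/O tag sequence into (start, end) token spans, end exclusive."""
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a span that reaches the end
        continues = tag.startswith("I") and start is not None
        if not continues:
            if start is not None:
                spans.add((start, i))
            # "B" (or a stray "I" after "O") opens a new span; "O" opens nothing
            start = None if tag == "O" else i
    return spans


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def entity_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1: a predicted span counts only on an exact match."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    return precision, recall, harmonic_f1(precision, recall)


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose tag is predicted correctly, "O" tokens included."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy example (made-up tags): the first entity is predicted one token short.
gold = ["B", "I", "O", "O", "B", "O", "O", "O", "O", "O"]
pred = ["B", "O", "O", "O", "B", "O", "O", "O", "O", "O"]
print(entity_scores(gold, pred), token_accuracy(gold, pred))

# The reported F1 should equal the harmonic mean of the reported precision and recall.
print(harmonic_f1(0.9223433242506812, 0.9521800281293952))
```

In the toy example one of the two entities is clipped by a single token, so entity-level F1 falls to 0.5 while token accuracy stays at 0.9.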
## Model description

This model performs named-entity recognition: it tags mentions of organisms and species in free text.

**NB:** This model is a work in progress and is subject to change.
## Intended uses & limitations

This model is intended for use in my Master's thesis, where it masks the names of bacteria (and phages) in text for further analysis.
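A minimal sketch of that masking step, assuming the model is served through the `transformers` token-classification pipeline with `aggregation_strategy="first"` (the setting in this card's `inference` metadata). With aggregation enabled, the pipeline returns one dict per detected span, with character offsets under `start` and `end`; `mask_entities` replaces those spans with a placeholder. The helper names and the `[ORG]` placeholder are hypothetical. Because the card does not document the model's label names, the sketch ignores `entity_group` and masks every returned span, merging spans that overlap or are separated only by whitespace in case one name comes back in pieces. The demo uses hand-written entities in the pipeline's output shape, so it runs without downloading the model.

```python
from typing import Any, Dict, List, Tuple


def load_tagger(model_id: str):
    """Build a token-classification pipeline; needs `transformers` and downloads the model."""
    from transformers import pipeline  # lazy import: the masking helpers below run without it

    # aggregation_strategy="first" matches the `inference.parameters` block in the card metadata
    return pipeline("token-classification", model=model_id, aggregation_strategy="first")


def merge_spans(text: str, entities: List[Dict[str, Any]]) -> List[Tuple[int, int]]:
    """Sort (start, end) character offsets and merge spans that overlap or are split by whitespace."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted((int(e["start"]), int(e["end"])) for e in entities):
        if merged and (start <= merged[-1][1] or text[merged[-1][1]:start].isspace()):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_entities(text: str, entities: List[Dict[str, Any]], mask: str = "[ORG]") -> str:
    """Replace each detected span with `mask`, right to left so earlier offsets stay valid."""
    for start, end in reversed(merge_spans(text, entities)):
        text = text[:start] + mask + text[end:]
    return text


# Hand-written entities in the pipeline's output shape, so this runs without the model.
sample = "Streptococcus suis (S. suis) is an important zoonosis."
found = [{"start": 0, "end": 13}, {"start": 14, "end": 18}, {"start": 20, "end": 22}, {"start": 23, "end": 27}]
print(mask_entities(sample, found))
```

With the real model, the call would look like `mask_entities(text, load_tagger("<namespace>/bert-linnaeus-ner")(text))`, where `<namespace>` is a placeholder for the Hub account that hosts the model.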
## Training and evaluation data

The Linnaeus dataset was used both to train the model and to evaluate its performance.
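The card does not say how the word-level Linnaeus tags were mapped onto BERT's WordPiece sub-tokens. A common recipe in Hugging Face token-classification examples, assumed here rather than confirmed, gives each word's label to its first sub-token and assigns the ignore index `-100` to special tokens and continuation sub-tokens, which PyTorch's cross-entropy loss skips by default. The `word_ids` list mimics what a fast tokenizer's `BatchEncoding.word_ids()` returns; the tokenization and the label ids (`0 = O`, `1 = B`, `2 = I`) are illustrative assumptions.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Give each word's label to its first sub-token and IGNORE_INDEX to everything else."""
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # special tokens such as [CLS] / [SEP]
        elif wid != previous:
            aligned.append(word_labels[wid])  # first sub-token of a new word
        else:
            aligned.append(IGNORE_INDEX)  # continuation sub-token of the same word
        previous = wid
    return aligned


# Illustrative only: "Lactobacillus plantarum is" split into made-up sub-tokens,
# with assumed label ids 0 = O, 1 = B, 2 = I.
word_labels = [1, 2, 0]
word_ids = [None, 0, 0, 0, 1, 1, 2, None]
print(align_labels(word_ids, word_labels))
```

Labelling only the first sub-token lines up with `aggregation_strategy: first` at inference time, where each word takes the prediction of its first sub-token.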
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
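With `lr_scheduler_type: linear`, the learning rate decays from 2e-05 towards zero over the run. The sketch below assumes no warmup (none is listed above, and the `transformers` Trainer defaults to zero warmup steps) and takes the 4,476 total optimizer steps (3 epochs of 1,492 steps) from the training results table. It reproduces the shape of a linear schedule in plain Python rather than calling the library.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 4476, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer updates under linear warmup followed by linear decay."""
    if step < warmup_steps:
        return base_lr * (step / warmup_steps)  # linear ramp-up (unused when warmup_steps == 0)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Learning rate at the start of training and at the end of each epoch (1492 steps per epoch).
for step in (0, 1492, 2984, 4476):
    print(step, linear_lr(step))
```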
### Training results
Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|
0.0076 | 1.0 | 1492 | 0.0128 | 0.8566 | 0.9578 | 0.9044 | 0.9967 |
0.0024 | 2.0 | 2984 | 0.0082 | 0.9092 | 0.9578 | 0.9329 | 0.9980 |
0.0007 | 3.0 | 4476 | 0.0073 | 0.9223 | 0.9522 | 0.9370 | 0.9985 |
### Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0