cambridgeltl/linnaeus
How to use mikrz/bert-linnaeus-ner with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="mikrz/bert-linnaeus-ner")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("mikrz/bert-linnaeus-ner")
model = AutoModelForTokenClassification.from_pretrained("mikrz/bert-linnaeus-ner")

This model is a fine-tuned version of bert-base-cased on the linnaeus dataset. It achieves the following results on the evaluation set:
This model can be used to find organisms and species in text data.
NB. THIS MODEL IS WIP AND IS SUBJECT TO CHANGE!
This model's intended use is in my Master's thesis to mask names of bacteria (and phages) for further analysis.
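The masking step described above could look roughly like the sketch below. It is not the code used in the thesis: `mask_entities` is a hypothetical helper, the `[MASK]` placeholder is an arbitrary choice, and the entity list is hand-written for illustration rather than real model output. It only assumes that each entity is a dict with character offsets under `start` and `end`, which is the shape the Transformers token-classification pipeline returns, and that the spans do not overlap.

```python
from typing import Iterable, Mapping


def mask_entities(text: str, entities: Iterable[Mapping], mask_token: str = "[MASK]") -> str:
    """Replace every entity span in `text` with `mask_token`.

    Hypothetical helper: assumes each entity has integer-like "start" and
    "end" character offsets and that spans do not overlap.
    """
    # Work from the last span to the first so earlier offsets stay valid
    # after each replacement changes the string length.
    spans = sorted(((int(e["start"]), int(e["end"])) for e in entities), reverse=True)
    masked = text
    for start, end in spans:
        masked = masked[:start] + mask_token + masked[end:]
    return masked


# Hand-written example entities (illustrative only, not model predictions).
example_text = "Escherichia coli was infected with phage T4."
example_entities = [
    {"word": "Escherichia coli", "start": 0, "end": 16},
    {"word": "phage T4", "start": 35, "end": 43},
]

print(mask_entities(example_text, example_entities))
# prints: [MASK] was infected with [MASK].
```

With the real model, the entity list would come from calling the pipeline on the text; passing `aggregation_strategy="simple"` when creating the pipeline groups word pieces into whole entity spans, which is what this helper expects.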
The Linnaeus dataset was used to train the model and validate its performance.
The following hyperparameters were used during training:

Training results:
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| 0.0076 | 1.0 | 1492 | 0.0128 | 0.8566 | 0.9578 | 0.9044 | 0.9967 |
| 0.0024 | 2.0 | 2984 | 0.0082 | 0.9092 | 0.9578 | 0.9329 | 0.9980 |
| 0.0007 | 3.0 | 4476 | 0.0073 | 0.9223 | 0.9522 | 0.9370 | 0.9985 |
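
As a quick sanity check on the table, the F1 column can be recomputed from the Precision and Recall columns. The sketch below assumes F1 is the usual harmonic mean of the two; the card does not say how the metrics were computed, so treat that as an assumption. The function name `f1_score` and the `rows` list are introduced here for illustration, with values copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1), copied from the table above.
rows = [
    (1, 0.8566, 0.9578, 0.9044),
    (2, 0.9092, 0.9578, 0.9329),
    (3, 0.9223, 0.9522, 0.9370),
]

for epoch, precision, recall, reported in rows:
    recomputed = f1_score(precision, recall)
    print(f"epoch {epoch}: recomputed F1 = {recomputed:.4f}, reported F1 = {reported:.4f}")
```

The recomputed values agree with the reported ones to within rounding of the four-decimal inputs. They also show that the F1 gain across epochs comes from precision rising (0.8566 to 0.9223) while recall stays roughly flat.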
Base model
google-bert/bert-base-cased