BiodivBERT
Model description
- BiodivBERT is a domain-specific, cased BERT-based model for the biodiversity literature.
- It uses the tokenizer from the BERT base cased model.
- BiodivBERT is pre-trained on abstracts and full text from biodiversity literature.
- BiodivBERT is fine-tuned on two downstream tasks in the biodiversity domain: Named Entity Recognition and Relation Extraction.
- Please visit our GitHub Repo for more details.
How to use
- You can use BiodivBERT via the Hugging Face transformers library as follows:
- Masked Language Model
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
>>> model = AutoModelForMaskedLM.from_pretrained("NoYo25/BiodivBERT")
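- Once loaded, the model can be used for masked token prediction, e.g. via the fill-mask pipeline (a minimal sketch; the example sentence below is illustrative and not from the original model card):
>>> from transformers import pipeline
>>> fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> fill_mask("Climate change threatens species [MASK] in tropical forests.")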
- Token Classification - Named Entity Recognition
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
>>> model = AutoModelForTokenClassification.from_pretrained("NoYo25/BiodivBERT")
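- Note that this loads the pre-trained encoder with a token-classification head that still needs to be fine-tuned on your NER labels. A minimal sketch of a forward pass (the example sentence is illustrative):
>>> inputs = tokenizer("The Eurasian otter inhabits freshwater habitats.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> outputs.logits.shape  # (batch_size, sequence_length, num_labels)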
- Sequence Classification - Relation Extraction
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
>>> model = AutoModelForSequenceClassification.from_pretrained("NoYo25/BiodivBERT")
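- Likewise, the sequence-classification head has to be fine-tuned on relation labels before the scores are meaningful. A minimal sketch of a forward pass over a hypothetical entity-pair sentence:
>>> inputs = tokenizer("Quercus robur hosts the gall wasp Cynips quercusfolii.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> outputs.logits  # relation scores (meaningful only after fine-tuning)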
Training data
- BiodivBERT is pre-trained on abstracts and full text from biodiversity domain-related publications.
- We used the Elsevier and Springer APIs to crawl this data.
- We covered publications published between 1990 and 2020.
Evaluation results
BiodivBERT outperformed the baselines BERT_base_cased, biobert_v1.1, and a BiLSTM on both downstream tasks.