Named Entity Recognition (NER) Model for the English Language

This model is a fine-tuned version of google/electra-base-discriminator on the tner/ontonotes5 dataset. It achieves the following results on the evaluation set:

Loss: 0.1080
Precision: 0.8564
Recall: 0.8792
F1: 0.8676
Accuracy: 0.9741
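
The F1 figure is the harmonic mean of precision and recall, so it can be sanity-checked from the two values above. The following is a minimal sketch; the helper name `f1_score` is introduced here for illustration and is not part of the model's code. Because the reported precision and recall are already rounded, the recomputed value may differ from 0.8676 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the evaluation set
f1 = f1_score(0.8564, 0.8792)
print(f"Recomputed F1: {f1:.4f}")
```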

Intended uses & limitations

Because OntoNotes carries detailed named entity annotations, a model fine-tuned on it can recognize common entities such as people, locations, and organizations, as well as more specialized categories such as geopolitical entities (GPE), dates (DATE), and cardinal numbers (CARDINAL).

OntoNotes draws primarily on newswire, broadcast news, conversational telephone speech, and web data. Models fine-tuned on it may therefore struggle with informal text such as social media posts, domain-specific jargon, or highly technical language.

Usage Example

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# Specify the model checkpoint
model_checkpoint = "ShakhzoDavronov/electra-ner-token-classification"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint)

# Initialize the pipeline with the model and tokenizer
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="average")

# Define a sample text for Named Entity Recognition
sample_text = '''Amazon announced its plans to open a new headquarters in Virginia, 
                 aiming to create over 25,000 jobs in the area by 2030.'''

# Run the pipeline and print results
outputs = nlp(sample_text)
for ent in outputs:
    print(ent)

Output:

{'entity_group': 'ORG', 'score': 0.9978817, 'word': 'amazon', 'start': 0, 'end': 6}
{'entity_group': 'GPE', 'score': 0.996828, 'word': 'virginia', 'start': 57, 'end': 65}
{'entity_group': 'CARDINAL', 'score': 0.638206, 'word': 'over 25, 000', 'start': 100, 'end': 111}
{'entity_group': 'DATE', 'score': 0.9956894, 'word': '2030', 'start': 132, 'end': 136}
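
The `word` fields above are lowercased and re-spaced by the tokenizer ("amazon", "over 25, 000"), and the CARDINAL span has a noticeably lower score than the others. A possible post-processing step, sketched below, slices the original text with `start` and `end` to recover the exact surface form and drops spans under a score threshold. The helper name `confident_spans` and the 0.9 threshold are illustrative assumptions, and the whitespace in `text` is reconstructed so that the character offsets match the output above; it may not match the original string exactly.

```python
# Entities copied from the output above
entities = [
    {"entity_group": "ORG", "score": 0.9978817, "word": "amazon", "start": 0, "end": 6},
    {"entity_group": "GPE", "score": 0.996828, "word": "virginia", "start": 57, "end": 65},
    {"entity_group": "CARDINAL", "score": 0.638206, "word": "over 25, 000", "start": 100, "end": 111},
    {"entity_group": "DATE", "score": 0.9956894, "word": "2030", "start": 132, "end": 136},
]

# Assumption: line break and indentation chosen so the offsets above line up
text = (
    "Amazon announced its plans to open a new headquarters in Virginia, \n"
    + " " * 15
    + "aiming to create over 25,000 jobs in the area by 2030."
)


def confident_spans(text, entities, min_score=0.9):
    """Return (label, original-case span) pairs for entities scoring at least min_score."""
    return [
        (ent["entity_group"], text[ent["start"]:ent["end"]])
        for ent in entities
        if ent["score"] >= min_score
    ]


for label, span in confident_spans(text, entities):
    print(label, span)
```

The threshold is a precision/recall trade-off: raising it removes uncertain spans such as the CARDINAL one here, at the cost of missing some correct entities.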

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
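
To make the `linear` scheduler concrete, here is a rough sketch of the learning rate it would produce at a given optimizer step. It assumes zero warmup steps (none are listed above) and takes the total step count, 22473, from the final row of the training results; the function name `linear_lr` is hypothetical and this is not the training code itself.

```python
BASE_LR = 2e-5        # learning_rate listed above
TOTAL_STEPS = 22473   # last step reported in the training results
WARMUP_STEPS = 0      # assumption: no warmup is listed in the hyperparameters


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each epoch (7491 steps per epoch)
for step in (0, 7491, 14982, 22473):
    print(step, linear_lr(step))
```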

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
0.1019	1.0	7491	0.1099	0.8269	0.8583	0.8423	0.9705
0.0717	2.0	14982	0.1030	0.8569	0.8762	0.8664	0.9736
0.0397	3.0	22473	0.1080	0.8564	0.8792	0.8676	0.9741
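
The step counts in the table also give a rough bound on the training set size. The sketch below assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these are stated in the card, and the helper name `training_set_bounds` is illustrative.

```python
def training_set_bounds(steps_per_epoch: int, batch_size: int) -> tuple:
    """Return (min, max) number of examples consistent with the step count."""
    # The last batch holds between 1 and batch_size examples.
    min_examples = (steps_per_epoch - 1) * batch_size + 1
    max_examples = steps_per_epoch * batch_size
    return min_examples, max_examples


steps_per_epoch = 7491   # step count at epoch 1.0
batch_size = 8           # train_batch_size

low, high = training_set_bounds(steps_per_epoch, batch_size)
print(f"Training examples: between {low} and {high}")

# Three epochs of 7491 steps give the final step count in the table
print(steps_per_epoch * 3)
```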

Framework versions

Transformers 4.44.2
PyTorch 2.5.0+cu121
Datasets 3.1.0
Tokenizers 0.19.1

Contacts

If you have any questions or need more information, please contact me. LinkedIn: Shakhzod Davronov
