NuNerZero - Zero Shot NER
Collection
The best compact Zero-Shot NER models with MIT license
•
4 items
•
Updated
•
19
NuNER Zero 4k is the long-context (4k tokens) version of NuNER Zero.
NuNER Zero 4k is generally less performant than NuNER Zero, but can outperform NuNER Zero on applications where context size matters.
!pip install gliner
NuZero requires labels to be lower-cased
from gliner import GLiNER
def merge_entities(entities):
if not entities:
return []
merged = []
current = entities[0]
for next_entity in entities[1:]:
if next_entity['label'] == current['label'] and (next_entity['start'] == current['end'] + 1 or next_entity['start'] == current['end']):
current['text'] = text[current['start']: next_entity['end']].strip()
current['end'] = next_entity['end']
else:
merged.append(current)
current = next_entity
# Append the last entity
merged.append(current)
return merged
model = GLiNER.from_pretrained("numind/NuNerZero_long_context")
# NuZero requires labels to be lower-cased!
labels = ["organization", "initiative", "project"]
labels = [l.lower() for l in labels]
text = "At the annual technology summit, the keynote address was delivered by a senior member of the Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory, which recently launched an expansive initiative titled 'Quantum Computing and Algorithmic Innovations: Shaping the Future of Technology'. This initiative explores the implications of quantum mechanics on next-generation computing and algorithm design and is part of a broader effort that includes the 'Global Computational Science Advancement Project'. The latter focuses on enhancing computational methodologies across scientific disciplines, aiming to set new benchmarks in computational efficiency and accuracy."
entities = model.predict_entities(text, labels)
entities = merge_entities(entities)
for entity in entities:
print(entity["text"], "=>", entity["label"])
Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory => organization
Quantum Computing and Algorithmic Innovations: Shaping the Future of Technology => initiative
Global Computational Science Advancement Project => project
A fine-tuning script can be found here.
@misc{bogdanov2024nuner,
title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data},
author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
year={2024},
eprint={2402.15343},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}