---
language:
- en
license: apache-2.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
datasets:
- conll2003
metrics:
- f1
- recall
- precision
pipeline_tag: token-classification
widget:
- text: Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
    to Paris.
  example_title: Amelia Earhart
base_model: xlm-roberta-large
model-index:
- name: SpanMarker w. xlm-roberta-large on CoNLL03 by Tom Aarsen
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: CoNLL03
      type: conll2003
      split: test
      revision: 01ad4ad271976c5258b9ed9b910469a806ff3288
    metrics:
    - type: f1
      value: 0.9307
      name: F1
    - type: precision
      value: 0.9264
      name: Precision
    - type: recall
      value: 0.935
      name: Recall
---
# SpanMarker for Named Entity Recognition

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition. In particular, this SpanMarker model uses [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) as the underlying encoder. See [train.py](train.py) for the training script.

## Usage

To use this model for inference, first install the `span_marker` library:

```bash
pip install span_marker
```

You can then run inference with this model like so:
```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-xlm-roberta-large-conll03")
# Run inference
entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
```
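To work with the predictions, it can help to group them by entity type. The following is a minimal sketch, assuming that `predict` returns a list of dictionaries with `span`, `label`, and `score` keys. The helper `group_by_label` and its `min_score` threshold are hypothetical and not part of `span_marker`; the commented call at the end shows how it might be applied to the `entities` list from the snippet above.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group predicted spans by label, dropping low-confidence predictions.

    Assumes each entity is a dict with "span", "label" and "score" keys
    (an assumption of this sketch, not a documented guarantee).
    """
    grouped = defaultdict(list)
    for entity in entities:
        # Keep only predictions at or above the chosen confidence threshold.
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hypothetical usage with the `entities` list returned by `model.predict`:
# print(group_by_label(entities, min_score=0.5))
```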
### Limitations

**Warning**: This model works best when punctuation is separated from the preceding words, for example:

```python
# ✅
model.predict("He plays J. Robert Oppenheimer , an American theoretical physicist .")
# ❌
model.predict("He plays J. Robert Oppenheimer, an American theoretical physicist.")

# You can also supply a list of words directly: ✅
model.predict(["He", "plays", "J.", "Robert", "Oppenheimer", ",", "an", "American", "theoretical", "physicist", "."])
```
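Text that is not already tokenized can be split before calling `predict`. The following is a minimal sketch of such a pre-processing step. `separate_punctuation` is a hypothetical helper, not part of `span_marker`, and its rules (split trailing `,.;:!?` off each word, keep single-letter initials such as `J.` intact) are assumptions chosen to reproduce the word list shown above. Abbreviations like `U.S.` would still be split, so a proper tokenizer may be preferable for real data. The resulting list can be passed to `model.predict` directly.

```python
import re

# A word followed by one or more trailing punctuation marks, e.g. "physicist.".
_TRAILING = re.compile(r"^(.*?)([,.;:!?]+)$")
# Single-letter initials such as "J." are left intact (an assumption of this sketch).
_INITIAL = re.compile(r"^[A-Z]\.$")


def separate_punctuation(text: str) -> list[str]:
    """Split text on whitespace and detach trailing punctuation from each word."""
    words = []
    for token in text.split():
        match = _TRAILING.match(token)
        # Keep the token whole if it has no trailing punctuation, is an
        # initial, or consists of punctuation only (e.g. "...").
        if match is None or _INITIAL.match(token) or not match.group(1):
            words.append(token)
            continue
        words.append(match.group(1))
        # Each trailing punctuation character becomes its own token.
        words.extend(match.group(2))
    return words


# Hypothetical usage with the model loaded earlier:
# model.predict(separate_punctuation("He plays J. Robert Oppenheimer, an American theoretical physicist."))
```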
Separating tokens in a similar way may also help in some languages, such as splitting `"l'ocean Atlantique"` into `"l' ocean Atlantique"`.

See the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) repository for documentation and additional information on this library.