julian-schelb/roberta-ner-multilingual


	---
	language:
	- de
	- en
	- multilingual
	widget:
	- text: "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."
	- text: "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
	- text: "My name is Julian and I live in montreal"
	- text: "My name is clara and I live in berkeley, california."
	- text: "My name is wolfgang and I live in berlin"
	tags:
	- roberta
	license: mit
	datasets:
	- wikiann
	---

	# RoBERTa for Multilingual Named Entity Recognition

	## Model description

	#### Limitations and bias
	This model is limited by its training dataset of entity-annotated news articles from a specific span of time, so it may not generalize well to use cases in other domains.

	## Training data


	## Usage

	```python

	import torch
	from transformers import AutoTokenizer, RobertaForTokenClassification

	# Local directory holding the fine-tuned checkpoint.
	checkpoint = "./results/checkpoint-final/"

	# Assumption: the tokenizer was saved in the same directory as the model.
	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	model_tuned = RobertaForTokenClassification.from_pretrained(checkpoint)

	text = "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."

	inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")

	# Inference only, so gradient tracking is disabled.
	with torch.no_grad():
	    logits = model_tuned(**inputs).logits

	# Highest-scoring class id for each token.
	predicted_token_class_ids = logits.argmax(-1)

	# Note that tokens are classified rather than input words, so there
	# may be more predicted classes than words: several token-level
	# predictions can belong to the same word.
	predicted_tokens_classes = [model_tuned.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
	predicted_tokens_classes
	```
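
Because the predictions are made per token, it is often useful to merge them back into whole entities. The following is a minimal sketch of one way to do that, not part of the model itself. It assumes the labels follow a BIO scheme (`B-PER`, `I-PER`, `O`, ...) and that the tokenizer marks word starts with the SentencePiece character `▁`; both are assumptions about this checkpoint, and the helper name `group_entities` and its `word_start_marker` parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def group_entities(
    tokens: List[str],
    labels: List[str],
    word_start_marker: str = "▁",  # assumption: SentencePiece-style marker
) -> List[Tuple[str, str]]:
    """Merge token-level BIO labels into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity collected so far, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            text = "".join(current_tokens).replace(word_start_marker, " ").strip()
            entities.append((text, current_type))
        current_tokens, current_type = [], None

    for token, label in zip(tokens, labels):
        if label == "O":
            flush()
            continue
        prefix, sep, entity_type = label.partition("-")
        if not sep:
            # Label without a B-/I- prefix: treat it as a continuation tag.
            prefix, entity_type = "I", label
        # A "B-" tag or a change of entity type starts a new entity.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return entities


# Hand-written example input; real tokens and labels will differ.
example_tokens = ["▁Richard", "▁Fe", "yn", "man", "▁war", "▁in", "▁New", "▁York"]
example_labels = ["B-PER", "I-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
example_entities = group_entities(example_tokens, example_labels)
```

With the usage snippet above, the token strings could be obtained via `tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])` and passed in together with `predicted_tokens_classes`. If the tokenizer uses a different word-start marker, pass it as `word_start_marker`.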