	---
	library_name: transformers
	tags:
	- Persian
	- Named Entity Recognition
	- NER
	- Albert
	---

	# Model Card for Behpoyan-NER

	Behpoyan-NER is a fine-tuned ALBERT model for Named Entity Recognition (NER) in Persian. It is based on the `HooshvareLab/albert-fa-zwnj-base-v2-ner` model and identifies ten entity types: Date (DAT), Event (EVE), Facility (FAC), Location (LOC), Money (MON), Organization (ORG), Percent (PCT), Person (PER), Product (PRO), and Time (TIM).
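
The usage example later in this card handles tags with `B-` and `I-` prefixes, which suggests a BIO tagging scheme over the ten entity types. The sketch below shows what such a label set could look like. The ordering and the helper `build_bio_labels` are assumptions made for illustration; the authoritative mapping is the one stored in the model's own configuration (`model.config.id2label`).

```python
# Entity type codes as listed in this model card.
ENTITY_TYPES = ["DAT", "EVE", "FAC", "LOC", "MON", "ORG", "PCT", "PER", "PRO", "TIM"]

def build_bio_labels(entity_types):
    """Build a BIO label list: "O" plus a B-/I- pair for every entity type.

    Hypothetical helper; the real model may order its labels differently.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")  # first token of an entity
        labels.append(f"I-{entity_type}")  # continuation token of an entity
    return labels

labels = build_bio_labels(ENTITY_TYPES)
label2id = {label: index for index, label in enumerate(labels)}
id2label = {index: label for label, index in label2id.items()}

print(len(labels))  # 1 "O" label + 2 labels for each of the 10 types = 21
```

Comparing a list like this against `model.config.id2label` is a quick way to check which tags the model can actually emit.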

	## Model Details

	### Model Description

	Behpoyan-NER is designed to recognize named entities in Persian text, improving upon the capabilities of its base model, `HooshvareLab/albert-fa-zwnj-base-v2-ner`. It was fine-tuned on a combination of the ARMAN, PEYMA, and WikiANN datasets, which are widely used for Persian NER.

	- Developed by: Behpoyan
	- Model type: ALBERT for token classification
	- Language(s) (NLP): Persian (fa)
	- License: MIT

	### Model Sources

	- Repository: [Behpoyan/Behpoyan-NER](https://huggingface.co/Behpoyan/Behpoyan-NER)
	- Base Model Repository: [HooshvareLab/albert-fa-zwnj-base-v2-ner](https://huggingface.co/HooshvareLab/albert-fa-zwnj-base-v2-ner)


	## Uses

	### Direct Use

	This model can be directly used for Named Entity Recognition tasks in Persian text. Example applications include text analysis, information extraction, and Persian-language NLP applications.
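
For information extraction, recognized entities are often grouped by type after filtering out low-confidence predictions. The following is a minimal sketch of that step. The function `group_by_type`, the `min_score` threshold, and the sample records are hypothetical: the records are hand-written for illustration and are not actual model output. They mimic the dictionary shape produced by the `merge_entities` helper in the usage example of this card.

```python
from collections import defaultdict

def group_by_type(entities, min_score=0.5):
    """Group entity surface forms by entity type, dropping low-confidence ones.

    `min_score` is an illustrative threshold, not a recommended value.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity"]].append(entity["word"])
    return dict(grouped)

# Hand-written sample records (NOT real model output).
sample_entities = [
    {"word": "تهران", "entity": "LOC", "score": 0.99},
    {"word": "اصفهان", "entity": "LOC", "score": 0.98},
    {"word": "بانک ملت", "entity": "ORG", "score": 0.41},
]

grouped = group_by_type(sample_entities)
print(grouped)
```

With the default threshold, the low-scoring `ORG` record is dropped and only the two `LOC` entries remain.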

	### Downstream Use

	The model can be fine-tuned further for domain-specific NER tasks or combined with other models for complex NLP pipelines.
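
Further fine-tuning on word-level annotations usually requires aligning each word's label with the subword tokens the tokenizer produces. The sketch below shows one common approach, which this card does not confirm was used for Behpoyan-NER: only the first subword of each word keeps its label, and every other position is set to `-100`, the default ignore index of PyTorch's cross-entropy loss. The `word_ids` list here is hand-written; in practice it would come from a fast tokenizer's `word_ids()` method. The function name `align_labels` is hypothetical.

```python
def align_labels(word_labels, word_ids, ignore_index=-100):
    """Map word-level label ids onto subword tokens.

    word_labels: one label id per word.
    word_ids: for each token, the index of its word, or None for special tokens.
    Only the first subword of each word is labeled; the rest are ignored.
    """
    aligned = []
    previous_word_id = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)            # special token such as [CLS] or [SEP]
        elif word_id != previous_word_id:
            aligned.append(word_labels[word_id])    # first subword of a new word
        else:
            aligned.append(ignore_index)            # later subword of the same word
        previous_word_id = word_id
    return aligned

# Three words with label ids 3, 4, 0; the first and third words split into two subwords.
word_labels = [3, 4, 0]
word_ids = [None, 0, 0, 1, 2, 2, None]

aligned = align_labels(word_labels, word_ids)
print(aligned)  # [-100, 3, -100, 4, 0, -100, -100]
```

Labeling only the first subword keeps the loss from counting long words several times; labeling every subword is an equally valid alternative.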

	### Out-of-Scope Use

	The model is not designed for languages other than Persian or for tasks other than token classification. Misuse for generating biased or harmful content is discouraged.

	### Recommendations

	While the model performs well for general-purpose NER in Persian, users should validate its performance on their own datasets. Be cautious of biases in the training data, especially for under-represented entity types.
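
One way to validate the model on your own data is to compare predicted entity spans against gold annotations and compute entity-level precision, recall, and F1. The sketch below assumes exact-match scoring on `(start, end, type)` tuples, which is a common convention but not one prescribed by this card. The function `entity_prf` and the sample spans are hypothetical and hand-written for illustration.

```python
def entity_prf(gold_spans, predicted_spans):
    """Exact-match entity-level precision, recall, and F1.

    Each span is a (start, end, entity_type) tuple; a prediction counts as
    correct only if all three fields match a gold span.
    """
    gold_set, predicted_set = set(gold_spans), set(predicted_spans)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hand-written spans (NOT real model output): the second prediction has the wrong type.
gold = [(0, 4, "PER"), (10, 16, "LOC"), (20, 28, "ORG")]
predicted = [(0, 4, "PER"), (10, 16, "ORG"), (20, 28, "ORG")]

precision, recall, f1 = entity_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same computation separately for each entity type helps reveal weak spots on rarely seen types.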

	## How to Get Started with the Model

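	The following example loads the model, runs NER on a short Persian passage, and merges the subword-level predictions into whole entities: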

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

	tokenizer = AutoTokenizer.from_pretrained("Behpouyan/Behpouyan-NER")
	model = AutoModelForTokenClassification.from_pretrained("Behpouyan/Behpouyan-NER")

	nlp = pipeline("ner", model=model, tokenizer=tokenizer)

	# Input example
	example = '''
	"در سال ۱۴۰۱، شرکت علی‌بابا اعلام کرد که با همکاری بانک ملت، یک پروژه بزرگ برای توسعه زیرساخت‌های تجارت الکترونیک در ایران آغاز خواهد کرد.
	این پروژه در تهران و اصفهان اجرا می‌شود و پیش‌بینی می‌شود تا پایان سال ۱۴۰۲ تکمیل شود."
	'''
	# Get NER results
	ner_results = nlp(example)

	# Function to merge subword entities
	def merge_entities(entities):
	    merged_results = []
	    current_entity = None

	    for entity in entities:
	        # SentencePiece marks word starts with "▁"; turn it into a space if present
	        piece = entity['word'].replace("▁", " ")
	        if entity['entity'].startswith("B-") or current_entity is None:
	            # Start a new entity
	            if current_entity:
	                merged_results.append(current_entity)
	            current_entity = {
	                "word": piece.strip(),
	                "entity": entity['entity'][2:],  # Remove the "B-"/"I-" prefix
	                "score": entity['score'],
	                "start": entity['start'],
	                "end": entity['end'],
	            }
	        elif entity['entity'].startswith("I-") and current_entity:
	            # Continue the current entity
	            current_entity['word'] += piece
	            current_entity['score'] = min(current_entity['score'], entity['score'])  # Use the lowest score
	            current_entity['end'] = entity['end']

	    # Add the last entity if any
	    if current_entity:
	        merged_results.append(current_entity)

	    return merged_results

	# Merge the entities
	merged_results = merge_entities(ner_results)

	# Display the merged results
	print("Named Entity Recognition Results:")
	for entity in merged_results:
	    print(f"- Entity: {entity['word']}")
	    print(f"  Type: {entity['entity']}")
	    print(f"  Score: {entity['score']:.2f}")
	    print(f"  Start: {entity['start']}, End: {entity['end']}")
	    print("-" * 40)