## How to use
### Requirements
This model requires the `transformers` and `sentencepiece` packages, both of which can be installed with pip:
```bash
pip install transformers sentencepiece
```
### Pipelines 🚀
If you are not familiar with Transformers, you can use the pipelines API instead. Note that pipelines cannot return a no-answer prediction for a question.
```python
from transformers import pipeline

model_name = "SajjadAyoubi/bert-base-fa-qa"
qa_pipeline = pipeline("question-answering", model=model_name, tokenizer=model_name)

# Context (Persian): "Hi, I'm Sajjad Ayoubi, I'm 20 years old, and I'm interested in natural language processing."
text = "سلام من سجاد ایوبی هستم ۲۰ سالمه و به پردازش زبان طبیعی علاقه دارم"
# Questions: "What is my name?", "How old am I?", "What am I interested in?"
questions = ["اسمم چیه؟", "چند سالمه؟", "به چی علاقه دارم؟"]

for question in questions:
    print(qa_pipeline({"context": text, "question": question}))

>>> {'score': 0.4839823544025421, 'start': 8, 'end': 18, 'answer': 'سجاد ایوبی'}
>>> {'score': 0.3747948706150055, 'start': 24, 'end': 32, 'answer': '۲۰ سالمه'}
>>> {'score': 0.5945395827293396, 'start': 38, 'end': 55, 'answer': 'پردازش زبان طبیعی'}
```
### Manual approach 🔥
With the manual approach, no-answer predictions are possible, and performance is even better than with the pipeline.
- PyTorch
```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from src.utils import AnswerPredictor

model_name = "SajjadAyoubi/bert-base-fa-qa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

text = "سلام من سجاد ایوبی هستم ۲۰ سالمه و به پردازش زبان طبیعی علاقه دارم"
questions = ["اسمم چیه؟", "چند سالمه؟", "به چی علاقه دارم؟"]

# AnswerPredictor comes from src/utils.py in the repository; you can read more about it there
predictor = AnswerPredictor(model, tokenizer, device="cpu", n_best=10)
preds = predictor(questions, [text] * 3, batch_size=3)

for k, v in preds.items():
    print(v)
```
This produces output like the following:
```
100%|██████████| 1/1 [00:00<00:00, 3.56it/s]
{'score': 8.040637016296387, 'text': 'سجاد ایوبی'}
{'score': 9.901972770690918, 'text': '۲۰'}
{'score': 12.117212295532227, 'text': 'پردازش زبان طبیعی'}
```
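If you want a feel for what such a predictor does without depending on `src/utils.py`, the usual idea for no-answer scoring with a BERT QA head is to compare the best span score against the null ([CLS]) score. The sketch below only illustrates that idea under assumptions; it is not the repository's `AnswerPredictor`, and the `answer` helper and `null_threshold` parameter are hypothetical names:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "SajjadAyoubi/bert-base-fa-qa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

def answer(question, context, null_threshold=0.0):
    # Encode the question/context pair; BERT puts [CLS] at position 0
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    start_logits = outputs.start_logits[0]
    end_logits = outputs.end_logits[0]
    # Score of the "no answer" prediction: start and end both point at [CLS]
    null_score = start_logits[0] + end_logits[0]
    # Best non-null span (naive argmax; a real predictor also enforces start <= end
    # and restricts the span to context tokens)
    start = int(torch.argmax(start_logits[1:])) + 1
    end = int(torch.argmax(end_logits[1:])) + 1
    span_score = start_logits[start] + end_logits[end]
    if span_score - null_score < null_threshold:
        return ""  # treat the question as unanswerable
    span_ids = inputs["input_ids"][0][start : end + 1]
    return tokenizer.decode(span_ids, skip_special_tokens=True)
```

A full predictor additionally batches inputs, keeps an n-best list, and pairs start/end candidates properly, which is what `src/utils.py` handles for you.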
- TensorFlow 2.X
```python
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
from src.utils import TFAnswerPredictor

model_name = "SajjadAyoubi/bert-base-fa-qa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForQuestionAnswering.from_pretrained(model_name)

text = "سلام من سجاد ایوبی هستم ۲۰ سالمه و به پردازش زبان طبیعی علاقه دارم"
questions = ["اسمم چیه؟", "چند سالمه؟", "به چی علاقه دارم؟"]

# TFAnswerPredictor comes from src/utils.py in the repository; you can read more about it there
predictor = TFAnswerPredictor(model, tokenizer, n_best=10)
preds = predictor(questions, [text] * 3, batch_size=3)

for k, v in preds.items():
    print(v)
```
This produces output like the following:
```
100%|██████████| 1/1 [00:00<00:00, 3.56it/s]
{'score': 8.040637016296387, 'text': 'سجاد ایوبی'}
{'score': 9.901972770690918, 'text': '۲۰'}
{'score': 12.117212295532227, 'text': 'پردازش زبان طبیعی'}
```
Alternatively, you can see the whole demonstration in the HowToUse iPython notebook on Google Colab.