---
license: bigscience-bloom-rail-1.0
language:
- fa
---

Base Model:

https://huggingface.co/bigscience/bloomz-7b1

---

This model is bloomz-7b1 fine-tuned on a real-world Persian news dataset for neural news generation.

Note: Persian was not among the languages in the base model's pretraining data.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base-model tokenizer and the fine-tuned model.
# Text generation needs a causal LM head, not a sequence-classification head.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1")
model = AutoModelForCausalLM.from_pretrained("tum-nlp/neural-news-generator-bloomz-7b1-fa")

# Create the text-generation pipeline; a repetition penalty above 1.1 discourages repeated text.
generator = pipeline("text-generation",
                     model=model,
                     tokenizer=tokenizer,
                     repetition_penalty=1.2)

# Define the prompt (Persian for: Following the "armed rebellion" of Wagner's military mercenaries and the seizure of some ...)
prompt = " [EOP] به دنبال «شورش مسلحانه» مزدوران نظامی واگنر و تصرف برخی "

# Generate one sequence of at most 1000 tokens (the prompt counts toward the limit)
generator(prompt, max_length=1000, num_return_sequences=1)
```
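
By default the text-generation pipeline returns a list of dictionaries whose `generated_text` field begins with the prompt itself. As a minimal sketch, the helper below keeps only the newly generated article text. The name `extract_continuations` and the stand-in output are hypothetical and used only for illustration; they are not part of this model or of `transformers`.

```python
from typing import Dict, List


def extract_continuations(outputs: List[Dict[str, str]], prompt: str) -> List[str]:
    """Return only the newly generated text for each pipeline result.

    Hypothetical helper: `outputs` is assumed to have the shape returned by a
    text-generation pipeline, i.e. a list of {"generated_text": ...} dicts.
    """
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # By default the pipeline echoes the prompt at the start of the output.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


# Stand-in for the pipeline's return value so the sketch runs without loading the model.
example_prompt = " [EOP] headline "
example_outputs = [{"generated_text": " [EOP] headline body of the article."}]
print(extract_continuations(example_outputs, example_prompt))
```

With the real model, the same helper could be applied to the list returned by `generator(prompt, ...)`. Passing `return_full_text=False` to the pipeline call is an alternative that should make the pipeline omit the prompt itself.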

Trained on 6k datapoints (including all splits) from:

https://huggingface.co/datasets/RohanAiLab/persian_news_dataset