---
base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
language:
- bg
tags:
- propaganda
---

# Model Card for identrics/wasper_propaganda_detection_bg

## Model Details

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** Bulgarian
- **License:** apache-2.0
- **Finetuned from model:** [`INSAIT-Institute/BgGPT-7B-Instruct-v0.2`](https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.2)
- **Context window:** 8192 tokens

## Model Description

This model is a fine-tuned version of BgGPT-7B-Instruct-v0.2 for propaganda detection. It is effectively a binary classifier, determining whether propaganda is present in the input text.

The model was created by [`Identrics`](https://identrics.ai/) within the scope of the WASPer project. The detailed taxonomy of the full pipeline can be found [here](https://github.com/Identrics/wasper/).

## Uses

The model is designed as a binary classifier that determines whether a comment from traditional or social media contains propaganda.

### Example

First install the direct dependencies:

```
pip install transformers torch accelerate
```

Then the model can be downloaded and used for inference:

```py
from transformers import pipeline

# Map the raw classifier labels to human-readable names.
labels_map = {"LABEL_0": "No Propaganda", "LABEL_1": "Propaganda"}

pipe = pipeline(
    "text-classification",
    model="identrics/wasper_propaganda_detection_bg",
    tokenizer="identrics/wasper_propaganda_detection_bg",
)

# Bulgarian input, roughly: "Gas is cheap, American nuclear fuel is cheap,
# photovoltaics everywhere, yet electricity is up 30%. Why?"
text = "Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?"

prediction = pipe(text)
print(labels_map[prediction[0]["label"]])
```

A batch-inference variant of this example is sketched at the end of this card.

## Training Details

The training dataset for the model consists of a balanced collection of Bulgarian examples containing both propaganda and non-propaganda content. These examples were sourced from a variety of traditional and social media platforms and manually annotated by domain experts. The dataset is additionally enriched with AI-generated samples.

The model achieved an F1 score of **0.836** during evaluation.

## Compute Infrastructure

This model was fine-tuned on 2× NVIDIA Tesla V100 32 GB GPUs.

## Citation

[this section is to be updated soon]

If you find our work useful, please consider citing WASPer:

```
@article{...2024wasper,
  title={WASPer: Propaganda Detection in Bulgarian and English},
  author={....},
  journal={arXiv preprint arXiv:...},
  year={2024}
}
```
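## Batch Inference Example

As a small extension of the usage example above (an illustrative sketch, not part of the original WASPer tooling), the same `transformers` pipeline also accepts a list of texts, and each result carries a `score` field with the classifier's confidence for the predicted label. The comments below are hypothetical inputs chosen only for illustration; the second is a neutral control.

```py
from transformers import pipeline

labels_map = {"LABEL_0": "No Propaganda", "LABEL_1": "Propaganda"}

pipe = pipeline(
    "text-classification",
    model="identrics/wasper_propaganda_detection_bg",
    tokenizer="identrics/wasper_propaganda_detection_bg",
)

# Hypothetical comments used only to illustrate batch inference;
# the second is a neutral control ("The weather in Sofia is sunny today.").
comments = [
    "Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?",
    "Времето в София днес е слънчево.",
]

# Passing a list runs the whole batch through the classifier; each result
# is a dict containing the predicted label and its confidence score.
for comment, result in zip(comments, pipe(comments)):
    print(f"{labels_map[result['label']]} ({result['score']:.2f}): {comment}")
```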