---
base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
language:
- bg
tags:
- propaganda
---

# Model Card for identrics/wasper_propaganda_detection_bg

## Model Details

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** Bulgarian
- **License:** apache-2.0
- **Finetuned from model:** [`INSAIT-Institute/BgGPT-7B-Instruct-v0.2`](https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.2)
- **Context window:** 8192 tokens

## Model Description

This model is a fine-tuned version of BgGPT-7B-Instruct-v0.2 for propaganda detection. It is effectively a binary classifier, determining whether propaganda is present in the input text.

The model was created by [`Identrics`](https://identrics.ai/) within the scope of the WASPer project. The detailed taxonomy of the full pipeline can be found [here](https://github.com/Identrics/wasper/).

## Uses

The model is designed as a binary classifier that determines whether a comment from traditional or social media contains propaganda.

### Example

First install the direct dependencies:

```
pip install transformers torch accelerate
```

Then the model can be downloaded and used for inference:

```py
from transformers import pipeline

# Map the raw classifier labels to human-readable names.
labels_map = {"LABEL_0": "No Propaganda", "LABEL_1": "Propaganda"}

pipe = pipeline(
    "text-classification",
    model="identrics/wasper_propaganda_detection_bg",
    tokenizer="identrics/wasper_propaganda_detection_bg",
)

# Bulgarian input, roughly: "Gas is cheap, American nuclear fuel is cheap,
# photovoltaics everywhere, yet electricity is up 30%. Why?"
text = "Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?"

prediction = pipe(text)
print(labels_map[prediction[0]["label"]])
```

A batch-inference variant of this example is sketched at the end of this card.

## Training Details

The training dataset for the model consists of a balanced collection of Bulgarian examples containing both propaganda and non-propaganda content. These examples were sourced from a variety of traditional and social media platforms and manually annotated by domain experts. The dataset is additionally enriched with AI-generated samples.

The model achieved an F1 score of **0.836** during evaluation.

## Compute Infrastructure

This model was fine-tuned on 2× NVIDIA Tesla V100 32 GB GPUs.

## Citation

[this section is to be updated soon]

If you find our work useful, please consider citing WASPer:

```
@article{...2024wasper,
  title={WASPer: Propaganda Detection in Bulgarian and English},
  author={....},
  journal={arXiv preprint arXiv:...},
  year={2024}
}
```
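## Batch Inference Example

As a small extension of the usage example above (an illustrative sketch, not part of the original WASPer tooling), the same `transformers` pipeline also accepts a list of texts, and each result carries a `score` field with the classifier's confidence for the predicted label. The comments below are hypothetical inputs chosen only for illustration; the second is a neutral control.

```py
from transformers import pipeline

labels_map = {"LABEL_0": "No Propaganda", "LABEL_1": "Propaganda"}

pipe = pipeline(
    "text-classification",
    model="identrics/wasper_propaganda_detection_bg",
    tokenizer="identrics/wasper_propaganda_detection_bg",
)

# Hypothetical comments used only to illustrate batch inference;
# the second is a neutral control ("The weather in Sofia is sunny today.").
comments = [
    "Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?",
    "Времето в София днес е слънчево.",
]

# Passing a list runs the whole batch through the classifier; each result
# is a dict containing the predicted label and its confidence score.
for comment, result in zip(comments, pipe(comments)):
    print(f"{labels_map[result['label']]} ({result['score']:.2f}): {comment}")
```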