metadata

base_model: google-bert/bert-base-cased
library_name: peft
license: apache-2.0
language:
  - en
tags:
  - propaganda

Model Card for identrics/wasper_propaganda_detection_en

Model Description

Developed by: Identrics
Language: English
License: apache-2.0
Finetuned from model: google-bert/bert-base-cased
Context window: 512 tokens

Model Description

This model is a fine-tuned version of google-bert/bert-base-cased for propaganda detection. It is a binary classifier that determines whether propaganda is present in the input string. The model was created by Identrics within the scope of the WASPer project. The detailed taxonomy of the full pipeline can be found here.

Uses

The model is designed as a binary classifier that determines whether a comment from traditional or social media contains propaganda.

Example

First, install the direct dependencies:

pip install transformers torch accelerate

Then the model can be downloaded and used for inference:

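from transformers import pipeline

# Map the classifier's raw label ids to readable names.
labels_map = {"0": "No Propaganda", "1": "Propaganda"}

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
pipe = pipeline(
    "text-classification",
    model="identrics/wasper_propaganda_detection_en",
    tokenizer="identrics/wasper_propaganda_detection_en",
)

text = "Our country is the most powerful country in the world!"

# The pipeline returns a list with one {"label": ..., "score": ...} dict per input.
prediction = pipe(text)
print(labels_map[prediction[0]["label"]])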
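The lookup in the example assumes the pipeline returns the bare ids "0" and "1" as labels. Depending on a model's configuration, text-classification pipelines can instead return names such as "LABEL_0". The sketch below is a hypothetical helper (`readable_label` is not part of this model or of transformers) that accepts both forms. It works on plain dictionaries shaped like pipeline output, so it runs without downloading the model.

```python
LABELS_MAP = {"0": "No Propaganda", "1": "Propaganda"}


def readable_label(prediction, labels_map=LABELS_MAP):
    """Return a readable name for one {"label": ..., "score": ...} dict.

    Assumption: labels arrive either as "0"/"1" or as "LABEL_0"/"LABEL_1".
    Any other label is returned unchanged.
    """
    raw = str(prediction["label"])
    key = raw[len("LABEL_"):] if raw.startswith("LABEL_") else raw
    return labels_map.get(key, raw)


# Hand-written dicts that mimic pipeline output (the scores are made up).
examples = [
    {"label": "1", "score": 0.93},
    {"label": "LABEL_0", "score": 0.71},
]
for item in examples:
    print(readable_label(item), item["score"])
```

With the real pipeline, the same helper could be applied as `readable_label(pipe(text)[0])`.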

Training Details

The training dataset for the model consists of a balanced collection of English examples, including both propaganda and non-propaganda content. These examples were sourced from a variety of traditional media and social media platforms and manually annotated by domain experts. Additionally, the dataset is enriched with AI-generated samples.

The model achieved an F1 score of 0.807 during evaluation.
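For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it for the positive class from confusion-matrix counts. The counts are invented purely to illustrate the formula and are not the model's evaluation results; the card also does not state which averaging was used for the reported score.

```python
def f1_score(tp, fp, fn):
    """F1 for the positive class from true positive, false positive,
    and false negative counts."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formula.
print(round(f1_score(tp=80, fp=20, fn=20), 3))
```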

Compute Infrastructure

This model was fine-tuned on 2x NVIDIA Tesla V100 32GB GPUs.

Citation [this section is to be updated soon]

If you find our work useful, please consider citing WASPer:

@article{...2024wasper,
  title={WASPer: Propaganda Detection in Bulgarian and English}, 
  author={....},
  journal={arXiv preprint arXiv:...},
  year={2024}
}