---
base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
language:
  - en
tags:
  - propaganda
---

# Model Card for identrics/BG_propaganda_detector

## Model Description

This model is a fine-tuned version of google-bert/bert-base-cased for a propaganda detection task. It is effectively a binary classifier that determines whether propaganda is present in the input text. The model was created by Identrics within the scope of the WASPer project. The detailed taxonomy can be found here.

## Uses

The model is intended to be used as a binary classifier that identifies whether propaganda is present in a string containing a comment from a social media site.

## Example

First, install the direct dependencies:

```bash
pip install transformers torch accelerate
```

Then the model can be downloaded and used for inference:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned binary classifier and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained("identrics/EN_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector")

# Tokenize an example comment and run it through the model
tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
output = model(**tokens)
print(output.logits)
```
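The raw logits can be converted into a predicted class with a softmax followed by an argmax. A minimal sketch is shown below; note that mapping label 0 to non-propaganda and label 1 to propaganda is an assumption, as the card does not state the label order explicitly:

```python
import torch

# Turn the logits into probabilities and pick the higher-scoring class.
# Assumed mapping (not confirmed above): 0 = no propaganda, 1 = propaganda.
probs = torch.softmax(output.logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
print(f"Predicted class: {predicted_class}, probabilities: {probs.tolist()}")
```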

## Training Details

The training datasets for the model consist of a balanced set totaling 840 English examples that include both propaganda and non-propaganda content. These examples were collected from a variety of traditional media and social media sources, ensuring a diverse range of content. Additionally, the training dataset was enriched with AI-generated samples. The total distribution of the training data is shown in the table below:

The model was then tested on a smaller evaluation dataset, achieving an F1 score of 0.807.
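For reference, an F1 score over a labeled evaluation set can be computed along the following lines. This is an illustrative sketch only; the actual evaluation dataset and script are not published here, and `eval_texts` / `eval_labels` are hypothetical placeholders:

```python
import torch
from sklearn.metrics import f1_score

# Hypothetical evaluation data: comments and their 0/1 gold labels.
eval_texts = ["First example comment.", "Second example comment."]
eval_labels = [0, 1]

predictions = []
for text in eval_texts:
    tokens = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**tokens).logits
    predictions.append(int(torch.argmax(logits, dim=-1)))

print("F1:", f1_score(eval_labels, predictions))
```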

## Citation

If you find our work useful, please consider citing WASPer:

```bibtex
@article{...2024wasper,
  title={WASPer: Propaganda Detection in Bulgarian and English},
  author={....},
  journal={arXiv preprint arXiv:...},
  year={2024}
}
```