---
base_model: google-bert/bert-base-cased
library_name: peft
license: apache-2.0
language:
- en
tags:
- propaganda
---

# Model Card for identrics/EN_propaganda_detector

## Model Description

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** English
- **License:** apache-2.0
- **Finetuned from model:** [`google-bert/bert-base-cased`](https://huggingface.co/google-bert/bert-base-cased)
- **Context window:** 512 tokens

This model is a fine-tuned version of google-bert/bert-base-cased for propaganda detection. It is a binary classifier that determines whether propaganda is present in the input string.

This model was created by [`Identrics`](https://identrics.ai/) as part of the WASPer project.

## Uses

The model is intended to be used as a binary classifier that identifies whether propaganda is present in a string containing a comment from a social media site.

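
Because comments vary in length and the model accepts at most 512 tokens, a batch of comments generally needs padding and truncation before inference. The following is a minimal sketch, assuming a Hugging Face tokenizer is passed in; the helper name `encode_comments` is hypothetical and not part of the model's repository.

```python
def encode_comments(tokenizer, comments, max_length=512):
    """Tokenize a list of comments into one padded, truncated batch.

    `encode_comments` is a hypothetical helper, not part of the released model.
    `tokenizer` is expected to be a Hugging Face tokenizer, for example the one
    returned by AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector").
    """
    if not comments:
        raise ValueError("comments must contain at least one string")
    return tokenizer(
        list(comments),
        padding=True,           # pad shorter comments to the longest in the batch
        truncation=True,        # cut comments longer than max_length tokens
        max_length=max_length,  # 512 matches the context window stated above
        return_tensors="pt",    # PyTorch tensors, ready for model(**batch)
    )
```

The returned batch can be passed to the model as `model(**batch)`, which yields one row of logits per comment.
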
### Example

First install direct dependencies:

```
pip install transformers torch accelerate
```

Then the model can be downloaded and used for inference:

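```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("identrics/EN_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector")

# Tokenize one comment, truncating anything beyond the 512-token context window
tokens = tokenizer("Our country is the most powerful country in the world!", truncation=True, max_length=512, return_tensors="pt")

# Run a forward pass without tracking gradients
with torch.no_grad():
    output = model(**tokens)

# Raw scores for the two classes, shape (1, 2)
print(output.logits)
```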
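
The logits printed by the example are unnormalized scores. One common way to turn them into a label and a confidence value is a softmax followed by an argmax. The sketch below uses only the standard library; the label order in `ASSUMED_LABELS` (index 1 meaning propaganda) is an assumption that should be checked against `model.config.id2label`, and the example logits are made up.

```python
import math

# Assumed label order; verify against model.config.id2label before relying on it.
ASSUMED_LABELS = ("non-propaganda", "propaganda")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits, labels=ASSUMED_LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for a single comment, in place of real model output
label, confidence = logits_to_prediction([-1.2, 2.3])
print(label, round(confidence, 3))
```

With the model from the example, the input would be `output.logits[0].tolist()`.
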
## Training Details

The training dataset for the model is a balanced set of 840 English examples that include both propaganda and non-propaganda content. The examples were collected from a variety of traditional media and social media sources, ensuring a diverse range of content. Additionally, the training dataset is enriched with AI-generated samples. The total distribution of the training data is shown in the table below:

The model was then tested on a smaller evaluation dataset, achieving an F1 score of 0.807.

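
For reference, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation; the counts are made up for illustration and are not taken from the evaluation dataset.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return F1, the harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 propaganda comments found, 2 false alarms, 2 missed
print(f1_score(8, 2, 2))
```
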
## Citation

If you find our work useful, please consider citing WASPer:

```
@article{...2024wasper,
  title={WASPer: Propaganda Detection in Bulgarian and English},
  author={....},
  journal={arXiv preprint arXiv:...},
  year={2024}
}
```