---
base_model: google-bert/bert-base-cased
library_name: peft
license: apache-2.0
language:
- en
tags:
- propaganda
---
# Model Card for identrics/EN_propaganda_detector
## Model Details
- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** English
- **License:** apache-2.0
- **Finetuned from model:** [`google-bert/bert-base-cased`](https://huggingface.co/google-bert/bert-base-cased)
- **Context window:** 512 tokens
## Model Description
This model is a fine-tuned version of google-bert/bert-base-cased for propaganda detection. It is a binary classifier that determines whether propaganda is present in the input string.
This model was created by [`Identrics`](https://identrics.ai/) as part of the WASPer project. The detailed taxonomy of the full pipeline can be found [here](https://github.com/Identrics/wasper/).
## Uses
The model is designed as a binary classifier that determines whether a piece of traditional media or social media content contains propaganda.
### Example
First, install the direct dependencies:
```
pip install transformers torch accelerate
```
Then the model can be downloaded and used for inference:
```py
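import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("identrics/EN_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector")

# Tokenize the input, truncating it to the 512-token context window
tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**tokens)
print(output.logits)  # raw scores, one per class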
```
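The example above prints raw logits, which are not probabilities. The following is a minimal sketch of one way to turn a pair of logits into a predicted class index and a confidence value. The helper names `softmax` and `predict_class` and the sample logits are illustrative assumptions, not part of the model's API. This card does not state which index corresponds to the propaganda class, so check `model.config.id2label` before interpreting the index.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Return (index of the highest-scoring class, its probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Hypothetical logits for a single input; with the real model these
# would come from output.logits[0].tolist()
example_logits = [-1.2, 2.3]
index, confidence = predict_class(example_logits)
print(index, round(confidence, 3))
```

With the real model, `predict_class(output.logits[0].tolist())` would give the predicted index for the first input in the batch.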
## Training Details
The training dataset for the model consists of a balanced collection of English examples, including both propaganda and non-propaganda content. These examples were sourced from a variety of traditional media and social media platforms and manually annotated by domain experts. Additionally, the dataset is enriched with AI-generated samples.
The model achieved an F1 score of **0.807** during evaluation.
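For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows the computation from confusion-matrix counts. The counts are made up for illustration and do not reproduce the score reported above; this card also does not state whether the reported F1 is for the positive class or an average over classes.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, chosen only to illustrate the formula
score = f1_score(tp=80, fp=20, fn=20)
print(round(score, 3))
```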
## Compute Infrastructure
The model was fine-tuned on **2x NVIDIA Tesla V100 32GB** GPUs.
## Citation [this section is to be updated soon]
If you find our work useful, please consider citing WASPer:
```
@article{...2024wasper,
title={WASPer: Propaganda Detection in Bulgarian and English},
author={....},
journal={arXiv preprint arXiv:...},
year={2024}
}
```