---
base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
language:
- en
tags:
- propaganda
---

# Model Card for identrics/EN_propaganda_detector

## Model Details

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** English
- **License:** apache-2.0
- **Finetuned from model:** [`google-bert/bert-base-cased`](https://huggingface.co/google-bert/bert-base-cased)
- **Context window:** 512 tokens

## Model Description

This model is a fine-tuned version of google-bert/bert-base-cased for a propaganda detection task. It is effectively a binary classifier that determines whether propaganda is present in the input string.
This model was created by [`Identrics`](https://identrics.ai/) in the scope of the WASPer project. The detailed taxonomy can be found [here](https://github.com/Identrics/wasper/).

## Uses

Use the model as a binary classifier to identify whether propaganda is present in a string containing a comment from a social media site.

### Example

First install direct dependencies:

```
pip install transformers torch accelerate
```

Then the model can be downloaded and used for inference:

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned binary classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("identrics/EN_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector")

# Truncate to the model's 512-token context window
tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, so gradients are not needed
    output = model(**tokens)
print(output.logits)
```

## Training Details

The training data consists of a balanced set of 840 English examples covering both propaganda and non-propaganda content. The examples were collected from a variety of traditional media and social media sources to ensure a diverse range of content. Additionally, the training dataset is enriched with AI-generated samples.

The total distribution of the training data is shown in the table below:

The model was then tested on a smaller evaluation dataset, achieving an F1 score of 0.807.

## Citation

If you find our work useful, please consider citing WASPer:

```
@article{...2024wasper,
  title={WASPer: Propaganda Detection in Bulgarian and English},
  author={....},
  journal={arXiv preprint arXiv:...},
  year={2024}
}
```
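
## Interpreting the Output

The inference example in this card prints raw logits, which are not probabilities. As a minimal sketch, the snippet below shows one way to turn a pair of logits into a label and a confidence score using a softmax. The label order is an assumption: this card does not state which index means propaganda, so check `model.config.id2label` on the loaded model before relying on it. The helper names `softmax` and `interpret`, the `ASSUMED_LABELS` mapping, and the sample logits are all hypothetical and only serve the illustration.

```python
import math

# ASSUMPTION: index 1 means "propaganda". The model card does not state the
# label order, so verify it against model.config.id2label before use.
ASSUMED_LABELS = {0: "non-propaganda", 1: "propaganda"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, labels=ASSUMED_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits, standing in for output.logits[0].tolist()
label, prob = interpret([-1.2, 0.8])
print(f"{label}: {prob:.3f}")
```

With a real model, the list passed to `interpret` would come from `output.logits[0].tolist()`. The pure-Python softmax keeps the sketch free of extra dependencies; `torch.softmax(output.logits, dim=-1)` would be the equivalent on tensors.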