identrics
/

wasper_propaganda_detection_en

Model card Files Files and versions Community

wasper_propaganda_detection_en / README.md

Nikola299's picture

Update README.md

b398a89 verified 5 months ago

|

2.35 kB

	---
	base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
	library_name: peft
	license: apache-2.0
	language:
	- en
	tags:
	- propaganda
	---

	# Model Card for identrics/BG_propaganda_detector



	## Model Description

	- Developed by: [`Identrics`](https://identrics.ai/)
	- Language: English
	- License: apache-2.0
	- Finetuned from model: [`google-bert/bert-base-cased`](https://huggingface.co/google-bert/bert-base-cased)
	- Context window : 512 tokens

	## Model Description

	This model consists of a fine-tuned version of google-bert/bert-base-cased for a propaganda detection task. It is effectively a binary classifier, determining wether propaganda is present in the output string.
	This model was created by [`Identrics`](https://identrics.ai/), in the scope of the Wasper project.


	## Uses

	To be used as a binary classifier to identify if propaganda is present in a string containing a comment from a social media site

	### Example

	First install direct dependencies:
	```
	pip install transformers torch accelerate
	```

	Then the model can be downloaded and used for inference:
	```py
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	model = AutoModelForSequenceClassification.from_pretrained("identrics/EN_propaganda_detector", num_labels=2)
	tokenizer = AutoTokenizer.from_pretrained("identrics/EN_propaganda_detector")

	tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
	output = model(**tokens)
	print(output.logits)
	```


	## Training Details

	The training datasets for the model consist of a balanced set totaling 840 examples that include both propaganda and non-propaganda content. These examples are collected from a variety of traditional media and social media sources, ensuring a diverse range of content. Aditionally, the training dataset is enriched with AI-generated samples. The total distribution of the training data is shown in the table below:


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66741cdd8123010b8f63f965/KyUIrMGWmmpnE67WZeQaN.png)


	The model was then tested on a smaller evaluation dataset, achieving an f1 score of 0.807. The evaluation dataset is distributed as such:



	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66741cdd8123010b8f63f965/5MOK5L7Tq9Ff64t0rPo17.png)




	- PEFT 0.11.1