|
--- |
|
base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2 |
|
library_name: peft |
|
license: apache-2.0 |
|
language: |
|
- bg |
|
tags: |
|
- propaganda |
|
--- |
|
|
|
# Model Card for identrics/wasper_propaganda_classifier_bg |
|
|
|
|
|
|
|
|
|
## Model Details
|
|
|
- **Developed by:** [`Identrics`](https://identrics.ai/) |
|
- **Language:** Bulgarian |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** [`INSAIT-Institute/BgGPT-7B-Instruct-v0.2`](https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.2) |
|
- **Context window:** 8192 tokens
|
|
|
## Model Description |
|
|
|
This model is a fine-tuned version of BgGPT-7B-Instruct-v0.2 for propaganda detection. It is a multilabel classifier that determines whether a given Bulgarian text contains one or more of five predefined propaganda types.
|
This model was created by [`Identrics`](https://identrics.ai/) as part of the WASPer project. The detailed taxonomy can be found [here](https://github.com/Identrics/wasper/).
|
|
|
|
|
## Propaganda taxonomy |
|
|
|
The propaganda techniques we aim to identify are classified into five categories (a snippet for inspecting the corresponding model labels follows the list):
|
|
|
1. **Self-Identification Techniques**: |
|
These techniques exploit the audience's feelings of association (or desire to be associated) with a larger group. They suggest that the audience should feel united, motivated, or threatened by the same factors that unite, motivate, or threaten that group. |
|
|
|
|
|
2. **Defamation Techniques**: |
|
These techniques represent direct or indirect attacks against an entity's reputation and worth. |
|
|
|
3. **Legitimisation Techniques**: |
|
These techniques attempt to prove and legitimise the propagandist's statements by using arguments that cannot be falsified because they are based on moral values or personal experiences. |
|
|
|
4. **Logical Fallacies**: |
|
These techniques appeal to the audience's reason and masquerade as objective and factual arguments, but in reality, they exploit distractions and flawed logic. |
|
|
|
5. **Rhetorical Devices**: |
|
These techniques seek to influence the audience and control the conversation by using linguistic methods. |
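These five categories correspond one-to-one to the classifier's output labels. As a quick sanity check, the label mapping shipped with the model can be inspected from its config; this is a generic `transformers` pattern, and the exact `id2label` contents depend on the uploaded configuration:

```py
from transformers import AutoConfig

# Inspect the label mapping stored in the model config. The exact contents
# of id2label depend on the uploaded configuration; this is only a generic
# way to check which output index maps to which propaganda category.
config = AutoConfig.from_pretrained("identrics/wasper_propaganda_classifier_bg")
print(config.id2label)
```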
|
|
|
|
|
|
|
|
|
## Uses |
|
|
|
The model is intended to be used as a multilabel classifier that identifies whether a Bulgarian text sample contains one or more of the five propaganda techniques described above.
|
### Example |
|
|
|
First, install the required dependencies:
|
``` |
|
pip install transformers torch accelerate |
|
``` |
|
|
|
Then the model can be downloaded and used for inference: |
|
```py |
|
import torch |
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
|
|
labels = [ |
|
"Legitimisation Techniques", |
|
"Rhetorical Devices", |
|
"Logical Fallacies", |
|
"Self-Identification Techniques", |
|
"Defamation Techniques", |
|
] |
|
|
|
model = AutoModelForSequenceClassification.from_pretrained( |
|
"identrics/wasper_propaganda_classifier_bg", num_labels=5 |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained("identrics/wasper_propaganda_classifier_bg") |
|
|
|
# English gloss: "Gas is cheap, American nuclear fuel is cheap, plenty of
# photovoltaics, and yet electricity is up 30%. Why?"
text = "Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?"
|
|
|
inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt") |
|
|
|
with torch.no_grad(): |
|
outputs = model(**inputs) |
|
logits = outputs.logits |
|
|
|
# Sigmoid (rather than softmax) gives independent per-label probabilities
probabilities = torch.sigmoid(logits).cpu().numpy().flatten()
|
|
|
# Format predictions |
|
predictions = {labels[i]: probabilities[i] for i in range(len(labels))} |
|
print(predictions) |
|
``` |
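To turn these per-label probabilities into binary decisions, one option is to apply a fixed threshold. The snippet below continues from the example above; the 0.5 cut-off is an assumed default rather than a value calibrated for this model:

```py
# Hypothetical post-processing, continuing from the example above.
# The 0.5 threshold is an assumption, not a published operating point.
THRESHOLD = 0.5

detected = [label for label, prob in predictions.items() if prob >= THRESHOLD]
print(detected if detected else "No propaganda technique detected")
```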
|
|
|
|
|
## Training Details |
|
|
|
|
|
During the training stage, the objective was to develop a multilabel classifier that identifies different types of propaganda, using a dataset containing both real and artificially generated samples.
|
|
|
The data has been carefully annotated by domain experts based on a predefined taxonomy, which covers five primary categories. Some examples are assigned to a single category, while others are classified into multiple categories, reflecting the nuanced nature of propaganda where multiple techniques can be found within a single text. |
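In such a setup, each annotated example is typically encoded as a multi-hot vector over the five categories. The sketch below illustrates this encoding under the label order from the inference example; it is an assumption about the data format, not the project's actual training code:

```py
import torch

# Hypothetical sketch of multilabel target encoding; the actual WASPer
# training pipeline may differ. Label order follows the inference example.
labels = [
    "Legitimisation Techniques",
    "Rhetorical Devices",
    "Logical Fallacies",
    "Self-Identification Techniques",
    "Defamation Techniques",
]

def encode_targets(annotated_categories):
    """Map a set of annotated category names to a multi-hot float vector."""
    return torch.tensor([float(name in annotated_categories) for name in labels])

# A text annotated with two techniques yields two 1.0 entries, a suitable
# target for a multilabel loss such as torch.nn.BCEWithLogitsLoss.
print(encode_targets({"Defamation Techniques", "Rhetorical Devices"}))
```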
|
|
|
|
|
The model reached a weighted F1 score of **0.538** during training.
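For reference, a weighted F1 score for a multilabel classifier can be computed with scikit-learn as sketched below; `y_true` and `y_pred` are illustrative multi-hot placeholders, not the project's evaluation data:

```py
import numpy as np
from sklearn.metrics import f1_score

# Illustrative multi-hot ground truth and thresholded predictions for the
# five categories; values are placeholders, not WASPer evaluation data.
y_true = np.array([[1, 0, 0, 1, 0], [0, 1, 0, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 1]])

# "weighted" averages the per-label F1 scores, weighted by label support.
print(f1_score(y_true, y_pred, average="weighted"))
```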
|
|
|
## Compute Infrastructure |
|
|
|
This model was fine-tuned on **2× NVIDIA Tesla V100 32GB GPUs**.
|
|
|
## Citation [this section is to be updated soon] |
|
|
|
If you find our work useful, please consider citing WASPer: |
|
|
|
``` |
|
@article{...2024wasper, |
|
title={WASPer: Propaganda Detection in Bulgarian and English}, |
|
author={....}, |
|
journal={arXiv preprint arXiv:...}, |
|
year={2024} |
|
} |
|
``` |