mmochtak committed · verified
Commit 5f0dea5 · 1 Parent(s): 948d465

paper link; update model card

Files changed (1): README.md (+10 −6)
README.md CHANGED
@@ -69,15 +69,19 @@ print(sentence_df)
 
 ```
 
+**Known biases and issues**
+This model, like all machine learning models, exhibits biases shaped by its training data and task-specific nuances. Trained primarily on speeches from the UN General Assembly, it has learned discourse patterns unique to that context, which may influence how it classifies leaders along the authoritarian-democratic spectrum. This limitation is compounded by a slight imbalance in the training data, which skews towards authoritarian discourse (mean = 0.430). Although no systematic bias was detected in testing, the model may occasionally lean towards assigning lower values in certain cases. Additionally, the model's classification may be sensitive to cultural or ideological markers, such as religious phrases commonly used by leaders from majority-Muslim countries, or ideological language like "comrades," which is often associated with authoritarian states. These biases can influence the model's predictions and may be more apparent with shorter texts or less structured data formats, such as tweets or informal statements. While the model performs best with longer texts, evaluation on any new format, both qualitative and quantitative, is highly recommended to ensure robust performance. Fine-tuning may be necessary to mitigate specific biases and enhance reliability across different applications.
+
 **If you use the model, please cite:**
 ```
 @article{mochtak_chasing_2024,
-  title = {Chasing the {Authoritarian} {Specter}: {Detecting} {Authoritarian} {Discourse} with {Large} {Language} {Models}},
-  volume = {forthcoming},
-  abstract = {The paper introduces a deep-learning model fine-tuned for detecting authoritarian discourse in political speeches. Set up as a regression problem with weak supervision logic, the model is trained for the task of classification of segments of text for being/not being associated with authoritarian discourse. Rather than trying to define what an authoritarian discourse is, the model builds on the assumption that authoritarian leaders inherently define it. In other words, authoritarian leaders talk like authoritarians. When combined with the discourse defined by democratic leaders, the model learns the instances that are more often associated with authoritarians on the one hand and democrats on the other. The paper discusses several evaluation tests using the model and advocates for its usefulness in a broad range of research problems. It presents a new methodology for studying latent political concepts and positions it as an alternative to more traditional research strategies.},
-  language = {en},
-  journal = {European Journal of Political Resarch},
+  title = {Chasing the authoritarian spectre: {Detecting} authoritarian discourse with large language models},
+  issn = {1475-6765},
+  shorttitle = {Chasing the authoritarian spectre},
+  url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/1475-6765.12740},
+  doi = {10.1111/1475-6765.12740},
+  journal = {European Journal of Political Research},
   author = {Mochtak, Michal},
-  year = {2024},
+  keywords = {authoritarian discourse, deep learning, detecting authoritarianism, model, political discourse},
 }
 ```