---
language: de
widget:
- text: Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
---
### Welcome to ParlBERT-Topic-German!
🏷 **Model description**
This model classifies text into one of twenty topic labels (annotation codebook) from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks). It was trained on \~10k manually annotated interpellations (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)).
_Note: "Interpellation is a formal request of a parliament to the respective government." ([Wikipedia](https://en.wikipedia.org/wiki/Interpellation_(politics)))_
🗃 **Dataset**
| Party | Speeches | Tokens |
|----|----|----|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |
🏃🏼‍♂️ **Model training**
**ParlBERT-Topic-German** was fine-tuned from a domain-adapted model (GermanBERT fine-tuned on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)) for topic classification, using an interpellations dataset (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) annotated with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).
🤖 **Use**
```python
from transformers import pipeline

# Load the fine-tuned classifier; by default only the top-scoring label is returned.
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")
text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
# Returns the predicted topic label with its score (expected label: Macroeconomics).
pipeline_classification_topics(text)
```
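
If the full score distribution is of interest rather than only the top label, the pipeline can be asked for all scores with `top_k=None`. The following is a minimal sketch, not part of the original model card: the helper names `build_classifier` and `top_labels` are hypothetical, and the helper accepts both a flat and a nested list because the output shape can differ between `transformers` versions.

```python
def build_classifier():
    """Create a pipeline that returns scores for every label (downloads the model)."""
    # Imported lazily so that the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="chkla/parlbert-topic-german",
        top_k=None,  # return all label scores instead of only the best one
    )


def top_labels(output, k=3):
    """Return the k highest-scoring (label, score) pairs from one pipeline result.

    `output` is a list of {"label": ..., "score": ...} dicts, optionally wrapped
    in an outer list (hypothetical helper, written to tolerate both shapes).
    """
    if output and isinstance(output[0], list):
        output = output[0]  # unwrap the per-input nesting
    ranked = sorted(output, key=lambda item: item["score"], reverse=True)
    return [(item["label"], item["score"]) for item in ranked[:k]]
```

A call such as `top_labels(build_classifier()(text), k=3)` would then list the three most likely topics for `text`; the first call downloads the model from the Hub.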
📊 **Evaluation**
The model was evaluated on a held-out evaluation set (20% of the data). F1 scores per label:
| Label | F1 | Support |
|----|----|----|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| Civil Rights | 76.5 | 978 |
| Environment | 76.6 | 845 |
| Transportation | 86.0 | 800 |
| Law & Crime | 67.1 | 492 |
| Energy | 78.6 | 424 |
| Health | 78.2 | 418 |
| Domestic Com. | 64.4 | 382 |
| Immigration | 81.0 | 376 |
| Labor | 69.1 | 344 |
| Macroeconom. | 62.8 | 339 |
| Agriculture | 76.3 | 292 |
| Social Welfare | 49.2 | 253 |
| Technology | 63.0 | 252 |
| Education | 71.6 | 183 |
| Housing | 79.6 | 178 |
| Foreign Trade | 61.5 | 139 |
| Culture | 54.6 | 69 |
| Public Lands | 45.4 | 55 |
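
The table reports per-label scores only. As a rough aid for reading it, the sketch below derives two aggregate figures from the rows above: the unweighted (macro) mean of the F1 values and the support-weighted mean. These aggregates are not reported by the authors, and averaging rounded per-label scores only approximates what a direct evaluation would give.

```python
# (label, F1, support) rows copied from the evaluation table above.
EVAL_ROWS = [
    ("International", 80.0, 1126), ("Defense", 85.0, 1099),
    ("Government", 71.3, 989), ("Civil Rights", 76.5, 978),
    ("Environment", 76.6, 845), ("Transportation", 86.0, 800),
    ("Law & Crime", 67.1, 492), ("Energy", 78.6, 424),
    ("Health", 78.2, 418), ("Domestic Com.", 64.4, 382),
    ("Immigration", 81.0, 376), ("Labor", 69.1, 344),
    ("Macroeconom.", 62.8, 339), ("Agriculture", 76.3, 292),
    ("Social Welfare", 49.2, 253), ("Technology", 63.0, 252),
    ("Education", 71.6, 183), ("Housing", 79.6, 178),
    ("Foreign Trade", 61.5, 139), ("Culture", 54.6, 69),
    ("Public Lands", 45.4, 55),
]


def macro_f1(rows):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total_support = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total_support


print(f"macro F1: {macro_f1(EVAL_ROWS):.1f}")
print(f"support-weighted F1: {weighted_f1(EVAL_ROWS):.1f}")
```

The gap between the two numbers indicates how much the low-support labels (for example Culture and Public Lands) pull the unweighted average down.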
⚠️ **Limitations**
Models are often highly topic-dependent. This model may therefore perform worse on topics and text types that are not represented in the training set.
👥 **Cite**
```
@article{klamm2022frameast,
title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
journal={ParlaCLARIN III at LREC2022},
year={2022}
}
```
🐦 Twitter: [@chklamm](http://twitter.com/chklamm)