Model description
A sequence-to-sequence model fine-tuned to extract structured event summaries from European political party press releases. The model outputs strict JSON with four fields:
{
  "response_to_event": "Yes" | "No",
  "event_name": "string or null",
  "country": "string or null",
  "political_issue": "string or null"
}
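Downstream code should check generated records against this schema before use. A minimal sketch of such a check (the helper name is illustrative, not part of the model):

```python
import json

REQUIRED_FIELDS = ("response_to_event", "event_name", "country", "political_issue")

def validate_record(raw: str) -> dict:
    """Parse model output and verify it matches the four-field schema.

    Raises ValueError (or json.JSONDecodeError) on out-of-schema output.
    """
    record = json.loads(raw)
    if set(record) != set(REQUIRED_FIELDS):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    if record["response_to_event"] not in ("Yes", "No"):
        raise ValueError("response_to_event must be 'Yes' or 'No'")
    for field in ("event_name", "country", "political_issue"):
        if not (record[field] is None or isinstance(record[field], str)):
            raise ValueError(f"{field} must be a string or null")
    return record
```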
- Base model: facebook/mbart-large-50 (multilingual).
- Languages in input: ~15–20 EU languages; output is always English.
- Task type: information extraction / structured summarization.
Intended uses & limitations
Intended uses
- Measure party responsiveness to specific events (e.g., extreme weather, crises).
- Produce machine-readable metadata from press releases for downstream analysis.
Out of scope / limitations
- Free-text fields (event_name, political_issue) can vary in wording; they are not normalized to a fixed ontology.
- The model does not verify factual correctness; it summarizes based on the given text only.
- Lower performance is expected on domains very different from party press releases, or on languages unseen in training.
How the training data was created
- Collected party press releases from European political parties (multi-language).
- Used a local LLM (Ollama + Qwen) to generate weak labels in strict JSON format with the schema above.
- Cleaned labels with JSON validation and simple normalization (e.g., canonicalizing "Yes"/"No", mapping empty strings to null).
- De-duplicated inputs, truncated long texts, and split into train/validation.
Note: labels for event_name and political_issue are inherently noisy (free-text, long-tail), which is reflected in evaluation.
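The cleaning steps above can be sketched roughly as follows (function and logic are illustrative, not the actual pipeline):

```python
import json

def clean_label(raw: str):
    """Keep only weak labels that parse as JSON; normalize field values."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None  # drop weak labels that are not valid JSON
    # Canonicalize the binary field to "Yes"/"No".
    answer = str(record.get("response_to_event", "")).strip().lower()
    record["response_to_event"] = "Yes" if answer == "yes" else "No"
    # Map empty strings to null in the free-text fields.
    for field in ("event_name", "country", "political_issue"):
        value = record.get(field)
        record[field] = value if isinstance(value, str) and value.strip() else None
    return record
```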
Validation results (held-out set)
(Exact values reported by trainer.evaluate())
- 'eval_loss': 0.34201106429100037
- 'eval_json_valid_rate': 0.9991575400168492
- 'eval_exact_match_rate': 0.08003369839932603
- 'eval_response_to_event_f1': 0.8672566371681415
- 'eval_event_name_f1': 0.28046289993192647
- 'eval_country_f1': 0.8763845813026141
- 'eval_political_issue_f1': 0.14344449520328917
- 'epoch': 5.0
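For context on the free-text fields: a common way to score them is token-overlap F1 (SQuAD-style), which credits partial matches between predicted and gold strings. A sketch of that metric, assuming this definition is the one used here:

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 over lowercased whitespace tokens."""
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Floods in Slovenia", "Slovenia floods"))  # → 0.8
```

Under this metric, paraphrased but overlapping event names still earn partial credit, which is why the free-text F1 scores are low but non-zero despite the noisy labels.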
Example usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch, json

# Use the Hub repo id, not the full URL.
model_id = "z-dickson/BART_political_event_detection"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(
    "cuda" if torch.cuda.is_available() else "cpu"
)

text = "Following the devastating floods in Slovenia, our party calls for stronger climate resilience measures."
inputs = tok(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
response = tok.decode(outputs[0], skip_special_tokens=True)
response_json = json.loads(response)  # may raise json.JSONDecodeError on rare malformed outputs
print(response_json)

Example output:
{
"response_to_event": "Yes",
"event_name": "Floods in Slovenia",
"country": "Slovenia",
"political_issue": "Climate adaptation policy"
}
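Because eval_json_valid_rate is just under 1.0, a small fraction of generations will not parse as JSON. A defensive wrapper (illustrative, not part of the model) can fall back to a null record instead of crashing a batch job:

```python
import json

# Fallback record for generations that fail to parse.
NULL_RECORD = {"response_to_event": "No", "event_name": None,
               "country": None, "political_issue": None}

def parse_or_default(decoded: str) -> dict:
    """Return parsed JSON, or a copy of the null record when parsing fails."""
    try:
        record = json.loads(decoded)
        return record if isinstance(record, dict) else dict(NULL_RECORD)
    except json.JSONDecodeError:
        return dict(NULL_RECORD)
```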