Model description
A sequence-to-sequence model fine-tuned to extract structured event summaries from European political party press releases. The model outputs strict JSON with four fields:
{
  "response_to_event": "Yes" | "No",
  "event_name": "string or null",
  "country": "string or null",
  "political_issue": "string or null"
}
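Downstream code should check generated records against this schema before use. A minimal sketch of such a check (the helper name is illustrative, not part of the model):

```python
import json

REQUIRED_FIELDS = ("response_to_event", "event_name", "country", "political_issue")

def validate_record(raw: str) -> dict:
    """Parse model output and verify it matches the four-field schema.

    Raises ValueError (or json.JSONDecodeError) on out-of-schema output.
    """
    record = json.loads(raw)
    if set(record) != set(REQUIRED_FIELDS):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    if record["response_to_event"] not in ("Yes", "No"):
        raise ValueError("response_to_event must be 'Yes' or 'No'")
    for field in ("event_name", "country", "political_issue"):
        if not (record[field] is None or isinstance(record[field], str)):
            raise ValueError(f"{field} must be a string or null")
    return record
```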
- Base model: facebook/mbart-large-50 (multilingual).
- Languages in input: ~15–20 EU languages; output is always English.
- Task type: information extraction / structured summarization.
Intended uses & limitations
Intended uses
- Measure party responsiveness to specific events (e.g., extreme weather, crises).
- Produce machine-readable metadata from press releases for downstream analysis.
Out of scope / limitations
- Free-text fields (event_name, political_issue) can vary in wording; they are not normalized to a fixed ontology.
- The model does not verify factual correctness; it summarizes based on the given text only.
- Lower performance is expected on domains very different from party press releases, or on languages unseen in training.
How the training data was created
- Collected party press releases from European political parties (multi-language).
- Used a local LLM (Ollama + Qwen) to generate weak labels in strict JSON format with the schema above.
- Cleaned labels with JSON validation and simple normalization (e.g., canonicalizing "Yes"/"No", mapping empty strings to null).
- De-duplicated inputs, truncated long texts, and split into train/validation.
Note: labels for event_name and political_issue are inherently noisy (free-text, long-tail), which is reflected in evaluation.
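The cleaning steps above can be sketched roughly as follows (function and logic are illustrative, not the actual pipeline):

```python
import json

def clean_label(raw: str):
    """Keep only weak labels that parse as JSON; normalize field values."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None  # drop weak labels that are not valid JSON
    # Canonicalize the binary field to "Yes"/"No".
    answer = str(record.get("response_to_event", "")).strip().lower()
    record["response_to_event"] = "Yes" if answer == "yes" else "No"
    # Map empty strings to null in the free-text fields.
    for field in ("event_name", "country", "political_issue"):
        value = record.get(field)
        record[field] = value if isinstance(value, str) and value.strip() else None
    return record
```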
Validation results (held-out set)
(Exact values reported by trainer.evaluate())
- 'eval_loss': 0.34201106429100037
- 'eval_json_valid_rate': 0.9991575400168492
- 'eval_exact_match_rate': 0.08003369839932603
- 'eval_response_to_event_f1': 0.8672566371681415
- 'eval_event_name_f1': 0.28046289993192647
- 'eval_country_f1': 0.8763845813026141
- 'eval_political_issue_f1': 0.14344449520328917
- 'epoch': 5.0
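For context on the free-text fields: a common way to score them is token-overlap F1 (SQuAD-style), which credits partial matches between predicted and gold strings. A sketch of that metric, assuming this definition is the one used here:

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 over lowercased whitespace tokens."""
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Floods in Slovenia", "Slovenia floods"))  # → 0.8
```

Under this metric, paraphrased but overlapping event names still earn partial credit, which is why the free-text F1 scores are low but non-zero despite the noisy labels.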
Example usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch, json

# Use the Hub repo id, not the full URL.
model_id = "z-dickson/BART_political_event_detection"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(
    "cuda" if torch.cuda.is_available() else "cpu"
)

text = "Following the devastating floods in Slovenia, our party calls for stronger climate resilience measures."
inputs = tok(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
response = tok.decode(outputs[0], skip_special_tokens=True)
response_json = json.loads(response)  # may raise json.JSONDecodeError on rare malformed outputs
print(response_json)

Example output:
{
"response_to_event": "Yes",
"event_name": "Floods in Slovenia",
"country": "Slovenia",
"political_issue": "Climate adaptation policy"
}
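Because eval_json_valid_rate is just under 1.0, a small fraction of generations will not parse as JSON. A defensive wrapper (illustrative, not part of the model) can fall back to a null record instead of crashing a batch job:

```python
import json

# Fallback record for generations that fail to parse.
NULL_RECORD = {"response_to_event": "No", "event_name": None,
               "country": None, "political_issue": None}

def parse_or_default(decoded: str) -> dict:
    """Return parsed JSON, or a copy of the null record when parsing fails."""
    try:
        record = json.loads(decoded)
        return record if isinstance(record, dict) else dict(NULL_RECORD)
    except json.JSONDecodeError:
        return dict(NULL_RECORD)
```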