metadata
language:
  - pl
base_model:
  - allegro/plt5-base
datasets:
  - scilons/SciLaD-all-text-v1
library_name: transformers
tags:
  - t5
  - seq2seq
  - text-to-text
  - scientific-language-models
  - cross-lingual-transfer
  - wechsel
  - global-mmlu

PL-Base-CP

Polish monolingual base model (allegro/plt5-base) with continued pretraining on the target-language (Polish) split of SciLaD for 15k steps. It serves as a control baseline.

Model Details

This is the monolingual continued-pretraining control checkpoint reported in the paper's results table. It is released so that the baseline comparison can be reproduced.
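The metadata lists `transformers` as the library and a T5-style seq2seq base model, so loading could look roughly like the sketch below. This is a minimal sketch, not an official usage recipe: `load_checkpoint` is a hypothetical helper, and its default `repo_id` is the base model named in the metadata, used only as a stand-in because this card does not state the checkpoint's own Hub id.

```python
def load_checkpoint(repo_id: str = "allegro/plt5-base"):
    """Load a T5-style seq2seq checkpoint and its tokenizer.

    Assumption: `repo_id` defaults to the base model from the metadata;
    pass this checkpoint's own Hub id to load PL-Base-CP instead.
    """
    # Imported lazily so that defining the helper does not require
    # transformers to be installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_checkpoint()` downloads the weights on first use. The tokenizer may additionally need the `sentencepiece` package.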

Evaluation

Zero-shot Global-MMLU accuracy reported by the paper aggregation:

| Category        | Accuracy (%) |
|-----------------|--------------|
| Average         | 24.65        |
| STEM            | 23.88        |
| Humanities      | 24.51        |
| Social Sciences | 23.43        |
| Other           | 26.87        |
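To put these figures in context, the arithmetic below compares the reported average with the unweighted mean of the four categories and with a random-guess baseline. It is a sketch under two assumptions not stated in this card: that each question has four answer options (chance level of 25%), and that a gap between the two averages would point to weighting by question count.

```python
# Per-category zero-shot accuracies copied from the results above (percent).
category_accuracy = {
    "STEM": 23.88,
    "Humanities": 24.51,
    "Social Sciences": 23.43,
    "Other": 26.87,
}
reported_average = 24.65

# Assumption: four answer options per question, so random guessing scores 25%.
CHANCE_LEVEL = 100.0 / 4

# Unweighted (macro) mean over the four categories.
unweighted_mean = sum(category_accuracy.values()) / len(category_accuracy)

# Distance of the reported average from the assumed chance level.
gap_to_chance = reported_average - CHANCE_LEVEL

print(f"unweighted mean of categories: {unweighted_mean:.2f}")
print(f"reported average minus chance: {gap_to_chance:+.2f}")
```

The unweighted mean comes out at 24.67, slightly above the reported 24.65, which would be consistent with an average weighted by the number of questions per category. The card does not confirm how the average was aggregated.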

Limitations

The model has been evaluated primarily on zero-shot Global-MMLU. Task-specific downstream evaluation is recommended before deploying it in specialized scientific workflows.

Citation

  • Title: Transferring Scientific English Pre-Trained Language Models to Multiple Languages Using Cross-Lingual Transfer
  • Authors: Nikolas Rauscher, Fabio Barth, Georg Rehm
  • Venue: LREC-COLING 2026 (citation details to be announced after publication)