I have trained a multilingual version of ModernBERT
Overview
- ModernBertMultilingual is a multilingual model trained from scratch using the ModernBERT-base architecture.
- It supports four languages and their variants: Simplified Chinese, Traditional Chinese, English, Japanese, and Korean.
- It can effectively handle mixed-language text tasks in East Asian languages; a minimal loading sketch follows this list.
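Assuming the weights are published on the Hugging Face Hub, here is a minimal loading and fill-mask sketch. The repo id "neavo/modern_bert_multilingual" is inferred from the model table below and not confirmed; ModernBERT support requires transformers >= 4.48.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "neavo/modern_bert_multilingual"  # assumed repo id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Mixed Japanese/English input, the kind of text the model targets.
text = f"この game の主人公は {tokenizer.mask_token} です。"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the top prediction at the masked position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode([logits[0, mask_pos].argmax().item()]))
```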
Release Versions
- Three different weight versions are provided:
- base: trained on general base data, suitable for texts from a wide range of domains (default).
- nodecay: the checkpoint saved before the annealing phase begins, so you can add domain-specific data and run the annealing yourself to better adapt the model to a target domain (see the annealing sketch after this list).
- keyword_gacha_multilingual: annealed on ACGN-related texts (e.g., light novels, game scripts, comic scripts).
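The nodecay checkpoint is intended as a starting point for domain annealing. Below is a minimal continued-pretraining sketch with the Hugging Face Trainer; the repo id "neavo/modern_bert_multilingual_nodecay", the file "domain_corpus.txt", and every hyperparameter are illustrative assumptions, not the author's actual recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "neavo/modern_bert_multilingual_nodecay"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Domain-specific plain-text corpus (hypothetical local file).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-language-modeling collator; 0.3 mask rate is illustrative.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.3)

args = TrainingArguments(
    output_dir="annealed-model",
    learning_rate=5e-5,          # illustrative starting LR
    lr_scheduler_type="linear",  # decay the LR toward zero over the run
    num_train_epochs=1,
    per_device_train_batch_size=8,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```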
| Model | Release | Weight version |
| --- | --- | --- |
| modern_bert_multilingual | 20250128 | base |
| modern_bert_multilingual_nodecay | 20250128 | nodecay |
| keyword_gacha_base_multilingual | 20250128 | keyword_gacha_multilingual |
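To try a specific weight version, load it by repo id. A short fill-mask sketch, assuming the repo id "neavo/keyword_gacha_base_multilingual" (inferred from the table; verify the exact id on the Hub):

```python
from transformers import pipeline

# Repo id assumed from the table above; verify on the Hub.
fill = pipeline("fill-mask", model="neavo/keyword_gacha_base_multilingual")

# Japanese ACGN-style sentence with the model's mask token inserted.
text = f"主人公は{fill.tokenizer.mask_token}の力で世界を救う。"
for pred in fill(text)[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```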