I have trained a multilingual version of ModernBERT

by neavo

Overview

  • ModernBertMultilingual is a multilingual model trained from scratch using the ModernBERT-base architecture.
  • It supports four languages and their variants: Chinese (Simplified and Traditional), English, Japanese, and Korean.
  • It can effectively handle mixed-language text in East Asian scripts; a usage sketch follows below.
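
Here is a minimal usage sketch for a mixed-language fill-mask query. It assumes the weights are published on the Hugging Face Hub under the repo id neavo/modern_bert_multilingual (check the model card for the actual id) and that a transformers version with ModernBERT support (4.48 or later) is installed.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed repo id; substitute the actual Hub id from the model card.
repo_id = "neavo/modern_bert_multilingual"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)

# Mixed Japanese/English sentence with one masked token.
text = f"今日は新しい {tokenizer.mask_token} を download しました。"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and decode the highest-scoring token.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```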

Release Versions

  • Three weight versions are provided:
    • base: Trained on general base data; suitable for texts from a variety of domains (default).
    • nodecay: The checkpoint taken before the annealing phase begins; you can continue training it on domain-specific data so the model better adapts to a target domain (see the sketch after the table below).
    • keyword_gacha_multilingual: Annealed on ACGN-related texts (e.g., light novels, game scripts, comic scripts).
Model                              Version    Description
modern_bert_multilingual           20250128   base
modern_bert_multilingual_nodecay   20250128   nodecay
keyword_gacha_base_multilingual    20250128   keyword_gacha_multilingual
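
One way to anneal the nodecay checkpoint on your own domain is a short continued masked-language-modeling run with a decaying learning rate. The sketch below assumes the checkpoint is on the Hub as neavo/modern_bert_multilingual_nodecay (hypothetical id); the corpus and hyperparameters are illustrative placeholders, not the author's actual annealing recipe.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed repo id for the pre-annealing checkpoint.
repo_id = "neavo/modern_bert_multilingual_nodecay"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)

# Replace with your own domain corpus (one text per row).
corpus = Dataset.from_dict({"text": ["an example domain sentence", "もう一つの例文です"]})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Standard MLM objective with 15% masking (illustrative value).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# A linearly decaying learning rate plays the role of the annealing phase
# that the nodecay checkpoint has not yet gone through.
args = TrainingArguments(
    output_dir="annealed",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    report_to="none",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```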
