ku-nlp/deberta-v3-base-japanese

@@ -66,7 +66,7 @@ The tokenizer of this model is based on [huggingface/tokenizers](https://github.
 The vocabulary entries were converted from [`llm-jp-tokenizer v2.2 (100k)`](https://github.com/llm-jp/llm-jp-tokenizer/releases/tag/v2.2).
 Please refer to [README.md](https://github.com/llm-jp/llm-jp-tokenizer) of `llm-jp/llm-jp-tokenizer` for details on the vocabulary construction procedure.
-Note that unlike [ku-nlp/deberta-v2-base-japanese](https://huggingface.co/ku-nlp/deberta-v2-base-japanese), pre-segmentation by a morphological analyzer (e.g., Juman++) is no longer required for this model.
+Note that, unlike [ku-nlp/deberta-v2-base-japanese](https://huggingface.co/ku-nlp/deberta-v2-base-japanese), pre-segmentation by a morphological analyzer (e.g., Juman++) is no longer required for this model.
 ## Training data
@@ -103,6 +103,23 @@ The following hyperparameters were used during pre-training:
 - training_steps: 475,000
 - warmup_steps: 10,000
+## Fine-tuning on NLU tasks
+We fine-tuned the following models and evaluated them on the dev set of JGLUE.
+We tuned the learning rate and the number of training epochs for each model and task, following [the JGLUE paper](https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_pdf/-char/ja).
+| Model                         | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
+|-------------------------------|-------------|-----------|--------------|---------------|----------|-----------|-----------|------------|
+| Waseda RoBERTa base           | 0.965       | 0.867     | 0.913        | 0.876         | 0.905    | 0.853     | 0.916     | 0.853      |
+| Waseda RoBERTa large (seq512) | 0.969       | 0.849     | 0.925        | 0.890         | 0.928    | 0.910     | 0.955     | 0.900      |
+| LUKE Japanese base*           | 0.965       | -         | 0.916        | 0.877         | 0.912    | -         | -         | 0.842      |
+| LUKE Japanese large*          | 0.965       | -         | 0.932        | 0.902         | 0.927    | -         | -         | 0.893      |
+| DeBERTaV2 base                | 0.970       | 0.879     | 0.922        | 0.886         | 0.922    | 0.899     | 0.951     | 0.873      |
+| DeBERTaV2 large               | 0.968       | 0.882     | 0.925        | 0.892         | 0.924    | 0.912     | 0.959     | 0.890      |
+| DeBERTaV3 base                | 0.960       | 0.878     | 0.927        | 0.891         | 0.927    | 0.896     | 0.947     | 0.875      |
+*The scores of LUKE are from [the official repository](https://github.com/studio-ousia/luke).
 ## Acknowledgments
 This work was supported by Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) through General Collaboration Project no. jh221004, "Developing a Platform for Constructing and Sharing of Large-Scale Japanese Language Models".
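The JSQuAD columns in the README table report exact match (EM) and F1. Below is a minimal Python sketch of how such scores can be computed, assuming that F1 is measured over characters (a natural choice for Japanese, which is not whitespace-delimited) and that each question may have several gold answers, with the best-matching one counted. The function names `exact_match`, `char_f1`, and `score` are hypothetical, and the whitespace-only normalization is an assumption; the official JGLUE evaluation script may normalize answers differently.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the answers match after stripping outer whitespace."""
    return float(prediction.strip() == gold.strip())


def char_f1(prediction: str, gold: str) -> float:
    """Character-level F1 (assumed metric granularity; see lead-in)."""
    # Drop all whitespace; this normalization is an assumption.
    pred_chars = [c for c in prediction if not c.isspace()]
    gold_chars = [c for c in gold if not c.isspace()]
    if not pred_chars or not gold_chars:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(pred_chars == gold_chars)
    # Multiset intersection: each shared character counts at most
    # min(count in prediction, count in gold) times.
    overlap = sum((Counter(pred_chars) & Counter(gold_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def score(predictions: list[str], gold_lists: list[list[str]]) -> tuple[float, float]:
    """Mean EM and F1, taking the best score over each question's gold answers."""
    n = len(predictions)
    em = sum(max(exact_match(p, g) for g in golds) for p, golds in zip(predictions, gold_lists))
    f1 = sum(max(char_f1(p, g) for g in golds) for p, golds in zip(predictions, gold_lists))
    return em / n, f1 / n


em, f1 = score(["東京都", "京都"], [["東京都"], ["京都府"]])
print(f"EM={em:.3f} F1={f1:.3f}")
```

In this example the first prediction matches exactly, while the second misses one of three gold characters (precision 1.0, recall 2/3, F1 0.8), so the averages come out to an EM of 0.5 and an F1 of about 0.9.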

tokenizer.json CHANGED

@@ -387631,6 +387631,6 @@
         -12.989911079406738
       ]
     ],
-    "byte_fallback": false
+    "byte_fallback": true
   }
 }
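This hunk switches `byte_fallback` from `false` to `true`. With byte fallback enabled, a character that has no vocabulary entry is encoded as byte tokens of the form `<0xXX>`, one per UTF-8 byte, instead of being mapped to the unknown token, so no input is lost. The Python sketch below illustrates that behavior on a toy vocabulary. It is a hypothetical simplification: `byte_fallback_tokenize` and `decode` are illustrative names, and the greedy longest-match segmentation stands in for the Unigram algorithm that `tokenizer.json` actually configures.

```python
import re

# Matches byte tokens such as "<0xE3>" (two uppercase hex digits).
BYTE_TOKEN = re.compile(r"<0x([0-9A-F]{2})>")


def byte_fallback_tokenize(text: str, vocab: set[str],
                           byte_fallback: bool = True,
                           unk: str = "<unk>") -> list[str]:
    """Toy tokenizer: greedy longest match, NOT the Unigram algorithm."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that starts at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No piece matched: handle one out-of-vocabulary character.
            char = text[i]
            if byte_fallback:
                # Emit one <0xXX> token per UTF-8 byte of the character.
                tokens.extend(f"<0x{b:02X}>" for b in char.encode("utf-8"))
            else:
                tokens.append(unk)
            i += 1
    return tokens


def decode(tokens: list[str]) -> str:
    """Reassemble byte tokens into UTF-8 text; other tokens are copied as-is."""
    buf = bytearray()
    for tok in tokens:
        m = BYTE_TOKEN.fullmatch(tok)
        if m:
            buf.append(int(m.group(1), 16))
        else:
            buf.extend(tok.encode("utf-8"))
    return buf.decode("utf-8", errors="replace")


vocab = {"日本", "語"}
print(byte_fallback_tokenize("日本語😀", vocab))
print(byte_fallback_tokenize("日本語😀", vocab, byte_fallback=False))
```

With the toy vocabulary `{"日本", "語"}`, the emoji becomes `["<0xF0>", "<0x9F>", "<0x98>", "<0x80>"]` and `decode` restores the original string, whereas with `byte_fallback=False` it collapses to a single `<unk>` and the round trip is lossy. In the real tokenizer, the `<0xXX>` pieces are expected to exist as vocabulary entries for the fallback to apply; the sketch does not check this.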