Commit 242e329 by KhangHatto · 1 parent: b4bbf24

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -14,6 +14,7 @@ tags:
 - summarization
 - translation
 - question-answering
+pipeline_tag: fill-mask
 ---
 ## Extend vocabulary and Pretrain
 We utilized [SentencePiece](https://github.com/google/sentencepiece) to retrain a tokenizer for Vietnamese, English, and Chinese. This newly trained tokenizer's vocabulary was then combined with Flan-T5's original vocabulary, eliminating any duplicate tokens. The resulting merged vocabulary consists of 106611 tokens.
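The README says the new SentencePiece vocabulary was combined with Flan-T5's original vocabulary with duplicate tokens removed, but it does not show how. Below is a minimal sketch of that kind of order-preserving merge, assuming both vocabularies are available as plain lists of token strings. `merge_vocabularies` and the toy token lists are hypothetical illustrations, not the author's code.

```python
from typing import Iterable, List


def merge_vocabularies(base: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Append tokens from `extra` that are not already present.

    Hypothetical helper (assumption, not the author's method): the base
    vocabulary (e.g. Flan-T5's) keeps its original order, so existing token
    IDs are unchanged; new tokens are appended in the order they appear in
    `extra`, and any token already seen is skipped.
    """
    merged: List[str] = list(base)
    seen = set(merged)
    for token in extra:
        if token not in seen:
            merged.append(token)
            seen.add(token)
    return merged


if __name__ == "__main__":
    # Toy vocabularies for illustration only; real ones would come from the
    # Flan-T5 tokenizer and the newly trained SentencePiece model.
    flan_t5_vocab = ["<pad>", "</s>", "<unk>", "▁the", "▁of"]
    new_vocab = ["<unk>", "▁the", "▁của", "▁và", "▁的"]
    merged = merge_vocabularies(flan_t5_vocab, new_vocab)
    print(len(merged))  # 8
    print(merged[5:])   # ['▁của', '▁và', '▁的']
```

Keeping the base tokens first means only the appended tokens need new embedding rows; in Hugging Face Transformers the embedding matrix is usually enlarged with `model.resize_token_embeddings(...)`. A real SentencePiece merge would also carry over each piece's score and type, which this sketch ignores.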