Update README.md
README.md
@@ -6,13 +6,13 @@ Usage:
 from transformers import MBartForConditionalGeneration, AutoModelForSeq2SeqLM
 from transformers import AlbertTokenizer, AutoTokenizer
 
-tokenizer = AutoTokenizer.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True)
 
-# Or use tokenizer = AlbertTokenizer.from_pretrained("
+# Or use tokenizer = AlbertTokenizer.from_pretrained("ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True)
 
-model = AutoModelForSeq2SeqLM.from_pretrained("
+model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")
 
-# Or use model = MBartForConditionalGeneration.from_pretrained("
+# Or use model = MBartForConditionalGeneration.from_pretrained("ai4bharat/IndicBART")
 
 # Some initial mapping
 bos_id = tokenizer._convert_token_to_id_with_added_voc("<s>")
@@ -60,4 +60,5 @@ print(decoded_output) # I am happy
 Notes:
 1. This is compatible with the latest version of transformers, but it was developed with version 4.3.2, so consider using 4.3.2 if possible.
 2. While I have only shown how to get the logits and loss and how to generate outputs, you can do pretty much everything the MBartForConditionalGeneration class can do, as described at https://huggingface.co/docs/transformers/model_doc/mbart#transformers.MBartForConditionalGeneration
-3.
+3. If you wish to fine-tune this model, you can do so using the YANMTT toolkit, following the instructions here: https://github.com/AI4Bharat/indic-bart
+4. Note that the tokenizer I have used is based on sentencepiece and not BPE. Therefore, I used the AlbertTokenizer class and not the MBartTokenizer class.
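For context, the updated snippet can be sketched end-to-end as below. This is a minimal sketch, not taken from the README: it assumes `transformers` is installed and the `ai4bharat/IndicBART` weights can be downloaded, and the `"</s> <2en>"` input format and the `decoder_start_token_id` choice are assumptions (mBART-style) that may differ from the model's actual convention.

```python
# Sketch of the updated README usage. Everything IndicBART-specific below
# (the "</s> <2en>" input format, the decoder start token) is an assumption,
# not taken from this diff.

def build_generation_kwargs(bos_id, eos_id, pad_id, decoder_start_id=None,
                            max_length=20, num_beams=4):
    """Bundle the special-token ids mapped out in the README (bos_id etc.)
    into keyword arguments accepted by model.generate()."""
    return {
        "use_cache": True,
        "num_beams": num_beams,
        "max_length": max_length,
        "early_stopping": True,
        "pad_token_id": pad_id,
        "bos_token_id": bos_id,
        "eos_token_id": eos_id,
        # Assumption: start decoding from EOS unless a language-tag id is given.
        "decoder_start_token_id": eos_id if decoder_start_id is None else decoder_start_id,
    }


def demo():
    """Requires `transformers` and downloading ai4bharat/IndicBART; not run here."""
    from transformers import AlbertTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AlbertTokenizer.from_pretrained(
        "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
    )
    model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")

    # Some initial mapping, as in the README.
    bos_id = tokenizer._convert_token_to_id_with_added_voc("<s>")
    eos_id = tokenizer._convert_token_to_id_with_added_voc("</s>")
    pad_id = tokenizer._convert_token_to_id_with_added_voc("<pad>")

    # Hypothetical input format: sentence, then </s>, then a target-language tag.
    batch = tokenizer("I am happy </s> <2en>", add_special_tokens=False,
                      return_tensors="pt")
    out = model.generate(batch.input_ids,
                         **build_generation_kwargs(bos_id, eos_id, pad_id))
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The pure-Python helper keeps the special-token wiring separate from the (network-dependent) model calls, so the id-to-argument mapping can be checked without downloading anything.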