Update README.md
README.md CHANGED
@@ -12,7 +12,7 @@ For detailed documentation, look here: https://github.com/AI4Bharat/indic-bart/
 
 # Pre-training corpus
 
-We used the IndicCorp data spanning 12 languages with 452 million sentences (9 billion tokens). The model was trained using the text-infilling objective used in mBART.
+We used the <a href="https://indicnlp.ai4bharat.org/corpora/">IndicCorp</a> data spanning 12 languages with 452 million sentences (9 billion tokens). The model was trained using the text-infilling objective used in mBART.
 
 # Usage:
 
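For context on the objective named in this hunk: mBART-style text infilling samples spans of tokens, replaces each span with a single mask token, and trains the model to reconstruct the original sentence. Below is a minimal illustrative sketch only; the mask token name, masking ratio, and span-length sampling (the mBART paper uses Poisson-distributed lengths) are simplified assumptions, not IndicBART's actual pre-processing.

```python
import random

def text_infill(tokens, mask_token="[MASK]", mask_ratio=0.35, avg_span=3.5):
    """Toy mBART-style text infilling: random spans collapse to ONE mask token.
    Real pre-processing uses Poisson-sampled span lengths over subword tokens;
    expovariate below is just a stand-in for illustration."""
    out, i, budget = [], 0, int(mask_ratio * len(tokens))
    while i < len(tokens):
        if budget > 0 and random.random() < mask_ratio:
            span = min(max(1, int(random.expovariate(1 / avg_span))),
                       budget, len(tokens) - i)
            out.append(mask_token)  # the whole span becomes a single mask token
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

src = "I am happy because the model works well".split()
print(text_infill(src))
# e.g. ['I', '[MASK]', 'because', 'the', '[MASK]', 'well']
# Training target: reconstruct the original, unmasked sentence.
```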
@@ -78,7 +78,7 @@ print(decoded_output) # I am happy
 
 # Fine-tuning on a downstream task
 
-1. If you wish to fine-tune this model, then you can do so using the toolkit <a href="https://github.com/prajdabre/yanmtt">YANMTT</a> following the instructions <a href="https://github.com/AI4Bharat/indic-bart ">here
+1. If you wish to fine-tune this model, then you can do so using the toolkit <a href="https://github.com/prajdabre/yanmtt">YANMTT</a> following the instructions <a href="https://github.com/AI4Bharat/indic-bart ">here</a>.
 2. (Untested) Alternatively, you may use the official huggingface scripts for <a href="https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation">translation</a> and <a href="https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization">summarization</a>.
 
 # Contributors
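To make the fine-tuning options above concrete, here is an untested, minimal sketch using the HuggingFace `Seq2SeqTrainer` rather than either linked script. The checkpoint name `ai4bharat/IndicBART`, the tokenizer flags, and the `</s>` / `<2en>` / `<2hi>` tagging format are assumptions carried over from the usage example earlier in the README; treat it as a starting point, not a verified recipe.

```python
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Tokenizer flags follow the README's usage example: slow tokenizer,
# accents kept, no lower-casing.
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/IndicBART",
                                          do_lower_case=False,
                                          use_fast=False, keep_accents=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")

# Toy one-pair "dataset"; real fine-tuning would stream a proper parallel
# corpus using the same </s> and <2xx> language-tag conventions.
enc = tokenizer(["I am happy </s> <2en>"], add_special_tokens=False,
                return_tensors="pt", padding=True)
labels = tokenizer(["<2hi> मैं खुश हूँ </s>"], add_special_tokens=False,
                   return_tensors="pt", padding=True).input_ids

class ToyDataset:
    def __len__(self):
        return enc.input_ids.shape[0]
    def __getitem__(self, i):
        return {"input_ids": enc.input_ids[i],
                "attention_mask": enc.attention_mask[i],
                "labels": labels[i]}

args = Seq2SeqTrainingArguments(output_dir="indicbart-ft",
                                per_device_train_batch_size=1,
                                num_train_epochs=1)
Seq2SeqTrainer(model=model, args=args, train_dataset=ToyDataset()).train()
```

For real use, the YANMTT instructions or the official translation/summarization scripts linked above remain the documented paths; the sketch only shows the expected shape of the inputs and labels.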