georeactor committed 94c694a
Parent: 35f9592

recommend larger models
Changed file: README.md

@@ -6,7 +6,9 @@ language: hi
 
 This is a first attempt at a Hindi language model trained with Google Research's [ELECTRA](https://github.com/google-research/electra).
 
-**
+**As of 2022 I recommend Google's MuRIL model trained on English, Hindi, and other major Indian languages, both in their script and latinized script**: https://huggingface.co/google/muril-base-cased and https://huggingface.co/google/muril-large-cased
+
+**For causal language models, I would suggest SberBank / mGPT, though this is a large model**
 
 <a href="https://colab.research.google.com/drive/1R8TciRSM7BONJRBc9CBZbzOmz39FTLl_">Tokenization and training CoLab</a>
 