Update README.md
README.md
@@ -31,19 +31,18 @@ widget:
 
 # Model Card for Ganga-1b! 🌊
 
-The base model **``Ganga-1b``** trained on a monolingual **Hindi** language dataset as part of ***Project Unity***.
+The base model **``Ganga-1b``** trained on a monolingual **Hindi** language dataset as part of ***Project Unity***. We propose the name *Ganga* 🌊 to honor the longest river flowing through the Hindi-speaking region of India 🇮🇳.
 
-
-
+<br> *(The first pre-trained Hindi model by any academic research lab in India 🇮🇳!)**
 
 
-
+
 
 
 
 ### Model Description 📚
 
-Project Unity is an initiative aimed at addressing **India's linguistic diversity** and richness by creating a comprehensive resource that covers the country's major languages. Our goal is to achieve state-of-the-art performance in understanding and generating text in **Indian languages**.
+**Project Unity** is an initiative aimed at addressing **India's linguistic diversity** and richness by creating a comprehensive resource that covers the country's major languages. Our goal is to achieve state-of-the-art performance in understanding and generating text in **Indian languages**.
 To achieve this, we train models on the monolingual regional languages of India. Our first release is the *Ganga-1B* model, *which has been trained on a large dataset of public domain web-crawled hindi language data, including news articles, web documents, books, government publications, educational materials, and social media conversations (filtered for quality)*. Additionally, the dataset has been further curated by native Indian speakers to ensure high-quality.
 Importantly, the **Ganga-1B** model outperforms existing open-source models that support **Indian languages**, even at sizes of up to **7 billion parameters**.
 