temporary0-0name committed

Commit 94c1ea9 • 1 parent: 48c6d23

Upload 6 files

- README.md +156 -0
- config.json +28 -0
- generation_config.json +11 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer_config.json +42 -0

README.md
ADDED
@@ -0,0 +1,156 @@
---
widget:
- text: Once upon a time,
  example_title: English
- text: भारत की राजधानी
  example_title: Hindi
- text: ভারত বৈচিত্র্যের দিকে যাচ্ছিল
  example_title: Bangla
- text: ભારત વિવિધતા તરફ જઈ રહ્યું હતું
  example_title: Gujarati
pipeline_tag: text-generation
inference:
  parameters:
    max_new_tokens: 200
license: apache-2.0
datasets:
- soketlabs/bhasha-wiki
- soketlabs/bhasha-wiki-indic
- cerebras/SlimPajama-627B
- ai4bharat/sangraha
language:
- hi
- bn
- gu
- en
tags:
- indic
---
# Pragna-1b

<!-- Provide a quick summary of what the model is/does. -->

![pragna-1b on huggingface](pragna_hf.png)

## Architecture Overview

Pragna-1B is a decoder-only transformer model inspired by TinyLlama, featuring the following specifications:

- Layers: 22
- Attention Heads: 32
- Context Length: 2048
- Hidden Dimension: 2048
- Expansion Dimension: 5632
- Vocabulary Size: 69632

This model incorporates Rotary Positional Encoding to infuse positional information into the embeddings, utilising a base of 10,000. It employs RMSNorm with an epsilon value of 1e-5 and the Sigmoid Linear Unit (SiLU) as the activation function. Additionally, Pragna-1B adopts Grouped Query Attention, an alternative to Multi-Head Attention that improves training and inference speed while reducing memory bandwidth; this also supports the use of lower-compute devices for inference.
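To make the attention layout concrete, here is a small illustrative sketch (not the model's source code) of how these numbers map onto Grouped Query Attention shapes; the value `num_key_value_heads = 4` is taken from the config.json in this upload:

```python
# Grouped Query Attention: 32 query heads share 4 key/value heads.
hidden_size = 2048
num_attention_heads = 32   # query heads, from the list above
num_key_value_heads = 4    # from config.json in this upload
head_dim = hidden_size // num_attention_heads  # 64

# Each key/value head serves a group of query heads.
queries_per_kv_group = num_attention_heads // num_key_value_heads  # 8

# Projection widths: Q stays full-width, K and V shrink by the group
# factor, so the KV cache is 1/8 the size of full Multi-Head Attention.
q_proj_out = num_attention_heads * head_dim   # 2048
kv_proj_out = num_key_value_heads * head_dim  # 256
print(q_proj_out, kv_proj_out, queries_per_kv_group)
```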
Pragna-1B is trained on our proprietary platform, GenAI Studio, a modular AI Developer Platform designed to support any GenAI model architecture. It is capable of scaling across thousands of GPUs or accelerators and is built to be fault-tolerant. The development of this model leveraged Triton, an open-source language from OpenAI, for crafting high-performance custom fused CUDA kernels for various operations. Furthermore, the model uses Fully Sharded Data Parallel (FSDP) for distributed and parallel training and incorporates the state-of-the-art FlashAttention-2 to accelerate training and inference.
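The exact training stack is proprietary, but the pieces named above have public equivalents. A minimal sketch of wiring FSDP and FlashAttention-2 together in `transformers`, assuming one process per GPU launched via `torchrun`; this is illustrative, not Soket's training code:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")  # e.g. launched with torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained(
    "soketlabs/pragna-1b",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # FlashAttention-2 kernels
)
# FSDP shards parameters, gradients and optimizer state across ranks.
model = FSDP(model.cuda())
```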
### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [Soket AI Labs](http://soket.ai)
- **Language(s) (NLP):** Hindi, Bangla, Gujarati and English
- **License:** Apache 2.0

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("soketlabs/pragna-1b")
model = AutoModelForCausalLM.from_pretrained(
    "soketlabs/pragna-1b", torch_dtype=torch.bfloat16
)
```
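Continuing from the snippet above, generation works like any other causal LM in `transformers`. The sketch below reuses the sampling settings shipped in this upload's generation_config.json (`top_k=10`, `temperature=0.8`); the Hindi prompt is only an illustration:

```python
prompt = "भारत की राजधानी"  # "The capital of India", one of the widget examples
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,       # sampling settings mirror generation_config.json
    top_k=10,
    temperature=0.8,
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```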
## Training Details

### Training Data

1. [Bhasha-wiki](https://soket.ai/blogs/bhasha_wiki_dataset)
2. [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)
3. [Sangraha-Verified](https://huggingface.co/datasets/ai4bharat/sangraha)

### Training Procedure

[To be added]

#### Training Hyperparameters

- **Precision:** BFloat16
- **Batch Size:** 2k - 2.5k
- **Context Length:** 2,048
- **Learning Rate:** 3e-5
- **Optimizer:** AdamW
- **LR Scheduler:** Cosine
- **Mixed Precision Training**
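The training code itself is not part of this release, but the hyperparameters above translate directly into a standard PyTorch setup. A minimal sketch; the warmup and total step counts are illustrative assumptions, since the card does not state them:

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("soketlabs/pragna-1b")

# AdamW + cosine LR schedule, mirroring the hyperparameter list above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,      # assumption: warmup schedule not published
    num_training_steps=100_000,  # assumption: total steps not published
)
```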
## Evaluation

### Hindi

| Model | Arc-Easy | Arc-Challenge | Hellaswag | Average |
|-------|----------|---------------|-----------|---------|
| pragna-1b | 0.33 | 0.22 | 0.35 | 0.30 |
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | 0.3582 | 0.2645 | 0.4315 | 0.35 |
| meta-llama/Llama-2-7b-hf | 0.295 | 0.2406 | 0.3789 | 0.30 |
| google/gemma-7b | <b>0.5926</b> | <b>0.4258</b> | <b>0.6341</b> | <b>0.55</b> |
| meta-llama/Meta-Llama-3-8B | 0.5354 | 0.3541 | 0.6072 | 0.50 |

### Gujarati

| Model | Arc-Easy | Arc-Challenge | Hellaswag | Average |
|-------|----------|---------------|-----------|---------|
| pragna-1b | 0.32 | 0.22 | 0.37 | 0.30 |
| google/gemma-7b | <b>0.4954</b> | <b>0.3208</b> | <b>0.5673</b> | <b>0.46</b> |

### English

| Model | Arc | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | Average |
|-------|-----|-----------|------|------------|------------|-------|---------|
| pragna-1b | 0.3 | 0.51 | 0.27 | 0.38 | 0.56 | 0 | 0.34 |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 0.36 | 0.61 | 0.25 | 0.37 | 0.61 | 0.02 | 0.37 |
| meta-llama/Meta-Llama-3-8B-Instruct | <b>0.6</b> | <b>0.82</b> | <b>0.67</b> | <b>0.51</b> | <b>0.77</b> | <b>0.68</b> | <b>0.66</b> |
| meta-llama/Llama-2-7b-hf | 0.53 | 0.78 | 0.46 | 0.39 | 0.74 | 0.14 | 0.51 |
### Results

| Context | Tokens Predicted |
|---------|------------------|
| India is one of the most | India is one of the most exciting places to travel to in Asia. The country hosts a wide variety of cultures and offers a plethora of attractions across the country. There's a lot you can do here and the tourists are always on the lookout for the new things to do and see. If you're looking to travel to India for your next adventure, you'll be pleased to know that you can find a ton of things to do in the country. From the temples to the beaches, there's a lot you can see and do here. One of the best things for visitors to do in India is to visit Jaipur. The capital city of Rajputana, Jaipur is a beautiful city that's known as the "City of Lives". |
| नरेंद्र मोदी का जन्म | नरेंद्र मोदी का जन्मदिन आज मनाया जा रहा है। इस मौके पर प्रधानमंत्री नरेंद्र मोदी ने ट्वीट कर लिखा कि 'आज देश को एक नया संसद भवन मिला है, जिसे आप लोगों ने बधाई दी है। भारत और भारतवासियों को यह संसद बहुत ही महत्त्वपूर्ण है और इसका उपयोग देश के विकास में बहुत अधिक योगदान देगा। आज हमारे देश में एक संसद बनी हुई है जो हमारे देश की लोकतांत्रिक प्रक्रियाओं को मजबूत बनाती है। |
| Prime minister Modi | Prime minister Modi's visit to China: PM Modi is scheduled to arrive in Beijing on Thursday. PM Narendra Modi will be the first Indian prime minister to visit China since former President Mao Zedong, who visited India in 1959 and was assassinated by Chinese soldiers a year later. The two leaders are expected to discuss regional security issues such as terrorism, trade and energy cooperation during their talks. |
| महात्मा गांधी | महात्मा गांधी एक स्वतंत्र व्यक्ति, एक आत्मनिर्भर और एक अत्यंत लोकप्रिय नेता थे। उन्होंने अपने जीवनकाल में देश को एक नई दिशा देने के लिए बहुत कुछ किया। समाज के विभिन्न क्षेत्रों में उनके योगदान के लिए उन्हें सम्मानित किया गया था। |
| ब्रिटिश राजशाही | ब्रिटिश राजशाही के विघटन के बाद, इंग्लैंड और वेल्स में ब्रिटेन की संसद का गठन हुआ। संसद ने 1807 में अपने पहले अधिनियम, "द क्राउन की विधेयक" (एक्ट ऑफ द डेमोक्रैटिक) को मंजूरी दी, जिसमें प्रत्येक राज्य से एक निर्वाचित सदस्य चुना गया। |
| মহাত্মা গান্ধী | মহাত্মা গান্ধী ছিলেন আন্তর্জাতিক সমাজতান্ত্রিক আন্দোলনের অন্যতম প্রধান নেতা। তিনি ভারতের স্বাধীনতা সংগ্রামে তাঁর ভূমিকা অপরিসীম। ১৯৪৭ সালের ১৫ আগস্ট ভারতে বিপ্লব ঘটে। ১৯৪৮ সালে মহাত্মা গান্ধীর মৃত্যু হয়। এরপর থেকে মহাত্মা আর দেখা যায় না। |
| নরেন্দ্র মোদীর জন্ম | নরেন্দ্র মোদীর জন্মদিনের শুভেচ্ছা জানিয়েছেন অমিতাভ বচ্চন। প্রসঙ্গত, গোটা দেশ জুড়ে চলছে নববর্ষ উৎসব। আর এই উৎসবে নিজেকে সামিল করতে চান না কেউই। এমনটাই জানালেন বলিউড অভিনেত্রী সারা আলি খান। |
| એક વખત | ક વખતે, તેમણે તેમના મિત્ર અને સહાયક વિલિયમ હેનરી મેકડોનાલ્ડ સાથે મળીને "ધ ઓરિજિનલ મેન ઓફ ધ ડેડ" નામના પુસ્તકનું નિર્માણ કર્યું હતું; તે 1961માં પ્રકાશિત થયું હતું. 1958માં, તેમણે ધ ન્યૂ યોર્કર માટે લેખક અને સંપાદક પિતા તરીકે કામ કર્યું હતું. |
| એક સમયે એક રાજા જંગલમાં રહેતો હતો. તેમણે | એક સમયે એક રાજા જંગલમાં રહેતો હતો. તેમણે, એક દિવસ રાજાને કહ્યું, "હું તમને મારી જાતને એક સ્ત્રી તરીકે આપું છું." રાજા ખુશ થયો. તેમણે તેને તેના ઘરમાં લઈ ગયા. તે દિવસે રાણી પોતાના પતિ સાથે ઘરે આવી અને કહ્યું કે તે તેની પાસે છે. રાજાએ તેને પૂછ્યું કે શું તેણે તેની સાથે લગ્ન કર્યા છે ? રાજા અને રાણીએ પૂછ્યું - હા. ત્યારે તેમણે કહ્યું કે, "તમે મને તમારી સાથે લગ્ન કરવાનું વચન આપ્યું છે. હવે મારે તમારા પર વિશ્વાસ રાખવો જોઈએ નહીં". |
## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

```
@ONLINE{pragna-1b,
  author = "Soket Labs Technology and Research Private Limited",
  title  = "pragna-1b",
  url    = "https://soket.ai"
}
```

## Model Card Contact

connect@soket.ai
config.json
ADDED
@@ -0,0 +1,28 @@

{
  "_name_or_path": "soketlabs/pragna-1b",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 67991
}
generation_config.json
ADDED
@@ -0,0 +1,11 @@

{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.36.2",
  "do_sample": true,
  "top_k": 10,
  "temperature": 0.8,
  "max_new_tokens": 512
}
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@

{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff.
tokenizer_config.json
ADDED
@@ -0,0 +1,42 @@

{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "padding_side": "right",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
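The chat_template above defines a Zephyr-style <|system|> / <|user|> / <|assistant|> turn format. As a quick illustration (not part of the upload), transformers renders it through apply_chat_template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("soketlabs/pragna-1b")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "भारत की राजधानी क्या है?"},  # "What is the capital of India?"
]
# Renders roughly: <|system|>\n...</s>\n<|user|>\n...</s>\n<|assistant|>
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(text)
```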