devngho committed on
Commit 116f30b
1 Parent(s): 486a066

Update README.md

---
library_name: transformers
tags:
- nlp
- phi3
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Card for devngho/phi-3-mini-4k-base

<!-- Provide a quick summary of what the model is/does. [Optional] -->
Fine-tuned from microsoft/Phi-3-mini-4k-instruct on fineweb-edu (about 150,000 samples), with the aim of recovering a base-style (completion) model from the instruct model.

Research supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/). ⚡

## Model Details

- **Developed by:** devngho
- **Language(s) (NLP):** en
- **License:** mit
- **Parent Model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Samples

- max_tokens: 256
- the `<s>` token is omitted
- temperature: 0
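
For reference, a minimal 🤗 Transformers sketch of how samples like these could be generated (greedy decoding stands in for temperature 0; the dtype and any setting not listed above are assumptions, not the exact script used):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "devngho/phi-3-mini-4k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Hangul is a writing system made by King Sejong in"
# add_special_tokens=False mirrors "the <s> token is omitted" above.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# Greedy decoding (do_sample=False) approximates temperature 0.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```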

### Sample 1
Prompt:
```md
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?

A. He was inspired by the Chinese writing system.
B. He was inspired by the Japanese writing system.
C. He was inspired by the Mongolian writing system.
D. He was inspired by the Korean writing system.

- response: King Sejong the Great, the fourth king of the Joseon Dynasty in Korea, created Hangul in the 15th century with the aim of promoting literacy among the common people. Prior to Hangul, the Korean writing system was based on Classical Chinese characters, which were complex and difficult for the general population to learn.

King Sejong and his scholars were inspired by the need to create a writing system that was easy to learn and use, and that could accurately represent the sounds of the Korean language. They were not directly inspired by the Chinese, Japanese, or Mongolian writing systems, but rather by the desire to create a system that was uniquely Korean and accessible to all
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?<|end|>
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct, a completions model):
</summary>
<p>
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?

King Sejong the Great wanted to create a writing system that was easy to learn and use for the common people, as the existing writing system at the time, Chinese characters, were difficult to learn and only accessible to the elite. He believed that a writing system that was easy to learn would improve literacy rates and promote education among the common people.

To create Hangul, King Sejong the Great gathered a group of scholars and linguists and tasked them with creating a new writing system. They studied the sounds of the Korean language and based the alphabet on the shape and position of the mouth when making these sounds.

The alphabet was initially called Hunminjeongeum, which means "the correct sounds for the instruction of the people." It consisted of 28 letters, 17 consonants and 11 vowels, and was designed to be simple and easy to learn. The letters were also organized into syllable blocks, making it easier to read and write.

Hangul was officially promulgated in 1446 and was met with resistance from the elite who saw it as a threat to their status. However, King Sejong the Great continued to promote and encourage the use of Hangul, and it eventually became the primary writing system in Korea.

Today, Hangul is
</p>
</details>

Author's Note: It still seems to be an instruction-following model, but the response is better than the original model's, as it doesn't emit the `<|end|>` token.

### Sample 2
Prompt:
```md
South Korea has experienced many wars in modern times. For instance,
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
South Korea has experienced many wars in modern times. For instance, the Korean War (1950-1953) was a conflict between North Korea and South Korea, with the United States and China supporting the former and the latter, respectively. The war ended in a stalemate, with the two Koreas remaining divided along the 38th parallel.

The Korean War had a significant impact on the country's economy. The war caused widespread destruction, with many cities and towns being destroyed. The war also disrupted trade and caused a shortage of resources. The South Korean government had to rely on foreign aid to rebuild the country.

One of the most significant impacts of the Korean War on the economy was the loss of human capital. Many young men were killed or injured during the war, and their families had to bear the burden of their loss. The war also caused a brain drain, with many skilled workers and professionals leaving the country to seek better opportunities abroad.

The Korean War also had a significant impact on the country's agriculture. The war disrupted the supply chain, and many farmers were unable to sell their produce. The government had to import
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
South Korea has experienced many wars in modern times. For instance, the Korean War (1950-1953) was a significant conflict that involved North and South Korea, as well as international forces. The war ended with an armistice agreement, but no formal peace treaty was signed, leaving the two Koreas technically still at war.

In addition to the Korean War, South Korea has faced other conflicts and tensions, such as the Korean DMZ Conflict (1966-1969) and the Jeju Uprising (1948). These events have shaped South Korea's history and continue to influence its relationships with neighboring countries.

Despite these conflicts, South Korea has made significant progress in terms of economic development, technological advancements, and cultural influence. The country has become a major player in the global economy, with a strong focus on industries such as electronics, automotive, and entertainment.

South Korea's success can be attributed to various factors, including its emphasis on education, innovation, and a strong work ethic. The country has also invested heavily in infrastructure and technology, which has
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct, a completions model):
</summary>
<p>
South Korea has experienced many wars in modern times. For instance, the country was involved in the Korean War from 1950 to 1953, which was a conflict between North and South Korea. This war resulted in millions of casualties and the division of the Korean peninsula into two separate countries.

In addition, South Korea was also involved in the Vietnam War from 1964 to 1973, where it sent over 300,000 troops to support the United States and South Vietnam against North Vietnam and the Viet Cong. This war also resulted in thousands of casualties for South Korea.

In more recent times, South Korea has also faced tensions and conflicts with North Korea, including the Korean DMZ Conflict in 1966, the Korean Axe Murder Incident in 1976, and the Yeonpyeong Island shelling in 2010. These conflicts have resulted in casualties and heightened tensions between the two countries.

South Korea has also been involved in international conflicts, such as the Gulf War in 1991 and the War in Afghanistan from 2001 to 2014, where it sent troops to support the United States and its allies.

Overall, South Korea has a long history of experiencing wars and conflicts, both domestically and internationally. However, the country has also made efforts to promote peace and reconciliation, such as through the Sunshine Policy
</p>
</details>

Author's Note: These responses are very similar.

### Sample 3
Prompt:
```md
Hangul is a writing system made by King Sejong in
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
Hangul is a writing system made by King Sejong in 1443. It is the official writing system of Korea. Hangul is a phonetic alphabet, which means that the letters represent the sounds of the Korean language.

## The Creation of Hangul

King Sejong wanted to create a writing system that was easy to learn and use for the common people. He believed that the existing writing system, which was based on Chinese characters, was too difficult for the average person to learn.

## The Structure of Hangul

Hangul is made up of 14 consonants and 10 vowels. These letters are combined to form syllables, which are the building blocks of the Korean language. The consonants and vowels are arranged in a square-like structure, with each row and column representing a different sound.

## The Importance of Hangul

Hangul is an important part of Korean culture and identity. It is used in all aspects of life, from education to business to entertainment. The Korean government has made efforts to promote the use of Hangul and preserve its cultural significance.
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
Hangul is a writing system made by King Sejong in the 15th century.
<|assistant|>
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct, a completions model):
</summary>
<p>
Hangul is a writing system made by King Sejong in 1443. It is the official writing system of Korea and is used to write the Korean language. Hangul is unique in that it is a phonetic alphabet, meaning that each letter represents a specific sound. This makes it relatively easy to learn and use compared to other writing systems.

Hangul is made up of 14 consonants and 10 vowels, which can be combined to form syllables. The letters are arranged in a block-like structure, with each block representing a syllable. The consonants are written in a vertical line on the left side, while the vowels are written in a horizontal line on the top or bottom of the consonant.

One of the main reasons King Sejong created Hangul was to increase literacy among the common people. Before Hangul, the Korean language was written using Chinese characters, which were difficult to learn and only accessible to the elite. Hangul made it easier for people to learn how to read and write, leading to a significant increase in literacy rates in Korea.

Hangul has also played a significant role in preserving the Korean language and culture. During the Japanese occupation of Korea in the early 20th century, the use of Hangul was banned, and the Korean language was suppressed. However, after Korea gained independence,
</p>
</details>

Author's Note: Unlike the other samples, the model's response here resembles general articles and gpt-3.5-turbo-instruct's output rather than chat-style text, as I hoped.

### Sample 4
Prompt:
```md
TPU(Tensor Processing Unit) was created by
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
TPU(Tensor Processing Unit) was created by Google to accelerate the training of deep neural networks. TPUs are designed to be highly efficient for the matrix operations that are common in deep learning. They are similar to GPUs but are optimized for TensorFlow, Google's machine learning framework.

Here's a simple example of how you can use TPUs in TensorFlow to train a neural network:

First, you need to install the TensorFlow package with TPU support:

```bash
pip install tensorflow-gpu
```

Then, you can set up your TPU environment:

```python
import tensorflow as tf

# Check if TPU is available
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
print('Running on TPU ', tpu.master())

# Initialize the TPU
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# Create a strategy for TPU distribution
strategy = tf.distribute.experimental.TPUStrategy(
```
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
TPU(Tensor Processing Unit) was created by Google.

> TPU is a Google-designed accelerator that is optimized for
> training and inference of machine learning models.

I'm wondering if there is any open source implementation of TPU?

I've found this one: https://github.com/tensorflow/models/tree/master/official/transformer

But it seems to be a very small part of TPU.

- [Response]: Tensor Processing Units (TPUs) are custom-designed hardware accelerators developed by Google to efficiently run machine learning workloads, particularly those involving neural networks. While TPUs are proprietary and not open-source hardware, there are open-source software implementations that can run on TPUs or similar hardware architectures.

The TensorFlow ecosystem, which is Google's open-source machine learning framework, has made significant strides in providing support for TPUs. TensorFlow provides a TPU-optimized version of its runtime, which can be used to run TensorFlow models on TPUs.

Here are some open-source implementations and resources related to TPU
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct, a completions model):
</summary>
<p>
TPU(Tensor Processing Unit) was created by Google for the purpose of accelerating machine learning workloads. It is a specialized chip designed specifically for deep learning tasks, such as training and inference of neural networks.

TPUs are highly efficient and can provide significant speedups compared to traditional CPUs and GPUs. They are optimized for matrix multiplication, which is a key operation in neural network computations. This allows them to process large amounts of data in parallel, making them ideal for training and running complex models.

One of the main advantages of TPUs is their scalability. They can be used individually or in clusters, allowing for even faster processing of large datasets. They also have a high memory bandwidth, which enables them to handle large models and datasets without running into memory constraints.

Another key feature of TPUs is their low power consumption. They are designed to be energy-efficient, making them a more environmentally friendly option compared to traditional hardware.

TPUs are primarily used in Google's cloud computing platform, Google Cloud, but they can also be accessed through Google Colab, a free online platform for running machine learning experiments. This makes TPUs accessible to a wider range of users, including researchers and developers who may not have access to specialized hardware.

Overall, TPUs have revolutionized the field of deep learning by providing a powerful and efficient hardware solution
</p>
</details>

Author's Note: This model provides a simple example of how to use TPUs in TensorFlow, which is a good response. The original model's response is in Q&A style.

### Sample 5
Prompt:
```md
Hello!
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
Hello! I'm your friendly AI chatbot, here to help you with your homework. Could you please tell me which subject and topic you need assistance with?

## Conversation
**User**: Hey there! I'm working on a project for my history class, but it's a bit of a mix. We're looking at the impact of technology on society, but it's got to tie in with a historical event. Any ideas?

**Chatbot**: That sounds like an interesting project! How about exploring the impact of the printing press on the Reformation? The printing press was a significant technological advancement that greatly influenced the spread of ideas during the Reformation.

**User**: That's a good start, but I was thinking more along the lines of something more recent, like the internet.

**Chatbot**: Understood. You could examine the role of the internet in the Arab Spring. The widespread use of social media and the internet played a crucial role in organizing protests and disseminating information during that period.

**User**: Yeah, that's more like it. But I'
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
Hello! I'd be happy to help you with your SQL queries. Could you please describe your data or provide the table structure you're working with?

<|assistant|>
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct, a completions model):
</summary>
<p>
Hello! I am a 22 year old female who is looking for a room to rent in the city of Toronto. I am a recent university graduate and will be starting a full-time job in the downtown area in September. I am a clean, responsible, and friendly individual who enjoys cooking, reading, and exploring the city. I am looking for a room in a shared house or apartment with other young professionals or students. My budget is around $800-1000 per month. Please contact me if you have a room available. Thank you!
</p>
</details>

Author's Note: The model's response is still instruction-following; it just doesn't use the `<|assistant|>` token. In contrast, gpt-3.5-turbo-instruct's response looks like a normal self-introduction.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Since the parent model is an instruction-following model, this model may still sometimes generate chat-like responses (see Samples).

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (about 150,000 samples)
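
For illustration, a subset of this size could be drawn with 🤗 Datasets roughly as follows (a sketch: the exact config and sampling method are not stated in this card, so the `sample-10BT` config and the streaming `take` are assumptions):

```python
from datasets import load_dataset

# Stream fineweb-edu and take ~150,000 documents without downloading
# the whole dataset. The "sample-10BT" config is an assumption.
ds = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,
)
subset = ds.take(150_000)
```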
 
 
 
 
334
 
 
335
 
336
+ ### Training Procedure
337
 
338
+ - batch_size: 64
339
+ - lr: 3e-5
340
+ - lr_scheduler: cosine
341
+ - torch_dtype: bfloat16
342
+ - warmup_ratio: 0.2
343
+ - optimizer: adamw
344
+ - seed: 42
345
+ - gradient_accumulation: 1
346
+ - gradient_checkpointing: true
347
+ - FSDPv2 (FSDP via SPMD)
348
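
As a rough sketch, these hyperparameters would map onto 🤗 `TrainingArguments` along these lines, assuming the `Trainer`'s PyTorch/XLA FSDPv2 integration was used (the `output_dir` and the per-device vs. global batch split are assumptions):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi-3-mini-4k-base",  # assumption
    per_device_train_batch_size=64,   # listed batch_size; per-device vs. global is an assumption
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    optim="adamw_torch",
    seed=42,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    bf16=True,                        # torch_dtype: bfloat16
    fsdp="full_shard",
    fsdp_config={"xla": True, "xla_fsdp_v2": True},  # FSDPv2 (FSDP via SPMD)
)
```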
 
### Compute Infrastructure

Google Cloud TPU

#### Hardware

TPU v4-32; training took ~12 hours.

Research supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/). ⚡

#### Software

`transformers~=4.41.2`, `torch~=2.3.0`, `torch_xla[tpu]~=2.3.0`
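
With that stack on a TPU VM, a quick sanity check that PyTorch/XLA sees the device might look like this (a sketch, not part of the original training code):

```python
import torch
import torch_xla.core.xla_model as xm

# Allocate a small tensor on the XLA (TPU) device to confirm the stack works.
device = xm.xla_device()
x = torch.ones(2, 2, device=device)
print(x.device)  # e.g. xla:0
```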
 
### Train Results

- train/loss: 2.22385830132309