Update README.md
README.md CHANGED
@@ -44,9 +44,6 @@ We sampled 16B tokens from the following datasets for training:
 </tr>
 </table>
 
-We trained this model using a context length of 4k due to resource limitations and to maximize training speed.
-However, the original model was trained with a context length of 8k, so an 8k context length could work well in downstream tasks.
-
 ### Hyperparameters
 
 <table>
@@ -142,6 +139,10 @@ We evaluated this model using both English and Korean benchmarks, and compared i
 </tr>
 </table>
 
+## Limitations
+
+We trained this model using a context length of 4k due to resource limitations and to maximize training speed.
+However, the original model was trained with a context length of 8k, so an 8k context length could work well in downstream tasks.
 
 ## License
 
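The new Limitations note says the model was fine-tuned with a 4k context while the base model was trained with 8k, so downstream use at 8k may still work. As a rough illustration only, here is a minimal sketch of capping prompts at an 8k window with Hugging Face transformers; the repo id `org/model-name` is a placeholder (the actual model id is not given here), and the 8192-token cap is an assumption taken from the note above.

```python
# Minimal sketch, assuming the model is a causal LM on the Hugging Face Hub.
# "org/model-name" is a placeholder repo id, not the actual model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The README says training used a 4k context but the base model supports 8k,
# so truncating inputs at 8192 tokens keeps prompts inside the base window.
inputs = tokenizer(
    "A long document to summarize ...",
    return_tensors="pt",
    truncation=True,
    max_length=8192,  # assumed 8k window from the note above
)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```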