Update README.md

README.md CHANGED

@@ -17,7 +17,7 @@ significant increase over the original length. While it was originally pre-train
coherently write and keep the same writing format (granted some caveats) up to 12K tokens relatively consistently.

Chronos-Divergence-33B is a one-of-a-kind model based on the original [Chronos-33B](https://huggingface.co/elinas/chronos-33b) and now focuses on prompt adherence for *roleplay* and storywriting.
- It was trained at 16,834 tokens and can go up to around 12,000 tokens before any deterioration without the use of RoPE techniques.
+ It was trained at 16,834 tokens and can go up to around 12,000 tokens before any deterioration without the use of RoPE or other context-extending techniques.

**The unique aspect of this model is that it has little to no "GPT-isms", commonly referred to as "slop": the repetitive phrases many modern LLMs
output due to their pre-training and finetuning datasets. We completely cleaned our datasets and relied on the original "charm" of the L1 series, and might bring this
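For readers who want to try the model, a minimal generation sketch under these assumptions follows; the repository id and the Alpaca-style prompt template are assumptions (they are not stated in this diff), and the ~12K figure is the soft limit described above.

```python
# Minimal sketch: load the model and keep total tokens under ~12K, the point beyond
# which the card reports degradation without RoPE/context-extension tricks.
# NOTE: the repo id and prompt template below are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "elinas/Chronos-Divergence-33B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "### Instruction:\nContinue the scene in the same style.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

max_new = 512
# Keep prompt + generation comfortably below ~12,000 tokens (no rope_scaling configured).
assert inputs["input_ids"].shape[1] + max_new < 12_000

output = model.generate(**inputs, max_new_tokens=max_new, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```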
@@ -28,8 +28,8 @@ RoPE or RULER has not been tested as we are satisfied with our results, we will
Next steps would be to implement GQA (Grouped Query Attention): as the number of tokens you input increases, so does memory usage, and this technique has been shown to reduce
memory burden. This will require significant effort on our part (help welcome!) and we hope that quantizations will be sufficient in the meantime.

- The datasets
- have experienced before due to the modernization added to the model without the common phrases GPTs like to output today.
+ The datasets used do not have a planned release date, though it is less the data and more the technique that made this "dated" model very special and unlike anything many of us
+ have experienced before: modernization added to the model without the common phrases GPTs like to output today, though it is uncensored as a result.

Without spoiling anything, the name of the model and presented character have meaning... Look up Steins;Gate if you are not familiar :)
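To make the GQA memory argument above concrete, here is a rough KV-cache sizing sketch; the layer and head counts are typical LLaMA-1 33B values assumed for illustration, and the 8-KV-head GQA configuration is hypothetical, not something this model currently uses.

```python
# Back-of-the-envelope KV-cache size versus context length, with and without grouped
# KV heads. Shapes are assumptions (typical LLaMA-1 33B values), purely illustrative.
def kv_cache_bytes(seq_len: int, n_layers: int = 60, n_kv_heads: int = 52,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # Two cached tensors (K and V) per layer, each of shape [seq_len, n_kv_heads, head_dim].
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

for tokens in (2_048, 12_000):
    mha = kv_cache_bytes(tokens)                # standard multi-head attention (52 KV heads)
    gqa = kv_cache_bytes(tokens, n_kv_heads=8)  # hypothetical GQA with 8 KV heads
    print(f"{tokens:>6} tokens: MHA {mha / 2**30:.1f} GiB vs GQA {gqa / 2**30:.1f} GiB")
```

The cache grows linearly with the number of input tokens, which is why long prompts inflate memory use and why reducing the number of KV heads helps independently of weight quantization.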
@@ -81,4 +81,4 @@ Please be mindful of the license. This is strictly non-commercial (by Meta LLaMA

If you have any questions or concerns, please post in the community tab.

- Outputs generated by the model are not reflective of our views.
+ DISCLAIMER: Outputs generated by the model are not reflective of our views.