Pretergeek
committed on
Update README.md
README.md
CHANGED
@@ -114,7 +114,7 @@ This model is [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0
 
 The model was fine-tuned using [Rank-Stabilized LoRA](https://huggingface.co/blog/damjan-k/rslora) and the [LongAlpaca-12K](Yukang/LongAlpaca-12k) dataset. I hope to continue extending the context in future versions and then apply the same methods to my [upscaled versions of OpenChat-3.5](https://huggingface.co/collections/Pretergeek/openchat-35-0106-with-additional-layers-66a8d3262c7c3ebdd7783a29) that were created using Block Expansion instead of Depth UP Scaling.
 
-After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%. Below you can also find the results of the Open LLM Leaderboard evaluations and I am a bit disappointed with those. The model ended up with a significant reduction in performance compared to the original model in all but one test (MUSR). I expected it to do better than the original model on MUSR since that test benefits from long context understanding but I didn't expect such a negative impact on the other tasks. Anyway, I will be addressing this
+After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%. Below you can also find the results of the Open LLM Leaderboard evaluations and I am a bit disappointed with those. The model ended up with a significant reduction in performance compared to the original model in all but one test (MUSR). I expected it to do better than the original model on MUSR since that test benefits from long context understanding but I didn't expect such a negative impact on the other tasks. Anyway, I will be addressing this in a future version, probably by using a pre-training dataset for continuous pre-training instead of a fine-tuning dataset so that upstream task are less affected. I used the LongAlpaca-12K dataset because it is small and I have limited computational resources but I might have to try a larger dataset for the next attempt too. If you would like to help me, there are links on the top of the model card for my Patreon and Ko-Fi.
 
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Pretergeek__OpenChat-3.5-0106_32K-PoSE)