Buy me a Ko-Fi • Support my work using Patreon

OpenChat-3.5-0106_32K-PoSE

Description

This model is OpenChat-3.5-0106 with the context length extended from 8192 tokens to 32768 tokens using PoSE (Positional Skip-wise Training).
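As a rough illustration of the idea behind PoSE, the sketch below builds position indices for one training sample: the short training window is split into chunks, and each chunk after the first receives a random, non-decreasing skipping bias so that the indices span the longer target window. This is a minimal sketch based on the PoSE paper, not the training code used for this model. The function name `pose_position_ids`, the two-chunk default, and the 8192-token training window are assumptions made for the example.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Return train_len position ids spread over [0, target_len).

    Hypothetical helper for illustration only; not part of any library.
    """
    rng = rng or random.Random()
    # Choose random chunk boundaries inside the training window.
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]
    # Largest bias that keeps every position below target_len.
    max_bias = target_len - train_len
    position_ids = []
    bias = 0  # the first chunk keeps its original positions
    for i in range(num_chunks):
        if i > 0:
            # Each bias is at least the previous one, so chunks never overlap.
            bias = rng.randint(bias, max_bias)
        position_ids.extend(range(bounds[i] + bias, bounds[i + 1] + bias))
    return position_ids


# Assumed sizes: an 8192-token training window and the 32768-token target.
ids = pose_position_ids(8192, 32768, rng=random.Random(0))
print(len(ids), ids[0], ids[-1])
```

The model still only sees 8192 tokens per sample, but over many samples it is exposed to relative distances from across the whole 32768-token range, which is what makes this approach cheap compared with training on full-length sequences.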

The model was fine-tuned using Rank-Stabilized LoRA and the LongAlpaca-12K dataset. I hope to continue extending the context in future versions and then apply the same methods to my upscaled versions of OpenChat-3.5, which were created using Block Expansion instead of Depth Up-Scaling.
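For readers unfamiliar with Rank-Stabilized LoRA, the difference from standard LoRA is the factor applied to the adapter update: standard LoRA scales it by alpha / r, while the rank-stabilized variant scales it by alpha / sqrt(r), so the update does not shrink as quickly when the rank grows. The sketch below only compares the two factors. The helper name `lora_scale` and the alpha and rank values are illustrative assumptions, not the hyperparameters used for this model.

```python
import math


def lora_scale(alpha, rank, rank_stabilized=False):
    """Scaling factor applied to the low-rank update B @ A.

    Hypothetical helper for illustration only.
    """
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rsLoRA: alpha / sqrt(r)
    return alpha / rank  # standard LoRA: alpha / r


# Assumed values, chosen only to show how the two factors diverge with rank.
for rank in (8, 64, 256):
    print(rank, lora_scale(16, rank), lora_scale(16, rank, rank_stabilized=True))
```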

After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%. Below you can also find the results of the Open LLM Leaderboard evaluations, and I am a bit disappointed with those. The model shows a significant reduction in performance compared to the original model in all but one test (MuSR). I expected it to do better than the original model on MuSR, since that test benefits from long-context understanding, but I didn't expect such a negative impact on the other tasks.

I will be addressing this in a future version, probably by using a pre-training dataset for continued pre-training instead of a fine-tuning dataset, so that performance on other tasks is less affected. I used the LongAlpaca-12K dataset because it is small and I have limited computational resources, but I might have to try a larger dataset for the next attempt too. If you would like to help me, there are links at the top of the model card for my Patreon and Ko-Fi.
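A passkey retrieval test like the one mentioned above can be sketched as follows: a random number is hidden inside a long run of filler text, the model is asked to repeat it, and the score is the fraction of trials in which the number appears in the answer. This is a minimal sketch, not the script used for this model. The prompt wording, the filler text, and the `generate` callable (any function that maps a prompt string to the model's text output) are assumptions.

```python
import random

# Filler sentence repeated to pad the prompt (assumed wording).
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. "


def build_passkey_prompt(n_filler, rng):
    """Hide a random 5-digit passkey at a random depth in filler text."""
    passkey = rng.randint(10000, 99999)
    insert_at = rng.randint(0, n_filler)
    needle = f"The pass key is {passkey}. Remember it. "
    body = FILLER * insert_at + needle + FILLER * (n_filler - insert_at)
    prompt = (
        "There is important information hidden in the text below. Find it.\n"
        + body
        + "\nWhat is the pass key? The pass key is"
    )
    return prompt, passkey


def passkey_accuracy(generate, n_trials, n_filler, seed=0):
    """Fraction of trials where the passkey appears in the model output."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        prompt, passkey = build_passkey_prompt(n_filler, rng)
        if str(passkey) in generate(prompt):
            hits += 1
    return hits / n_trials
```

To test a real model, `generate` would wrap the model's text generation call, and `n_filler` would be raised until the prompt approaches the 32768-token context length.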

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 12.70 |
| IFEval (0-shot)     | 39.69 |
| BBH (3-shot)        |  8.83 |
| MATH Lvl 5 (4-shot) |  1.44 |
| GPQA (0-shot)       |  3.47 |
| MuSR (0-shot)       | 11.33 |
| MMLU-PRO (5-shot)   | 11.46 |
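The reported average appears to be the unweighted mean of the six benchmark scores. That is an assumption, but the arithmetic below is consistent with it.

```python
# Scores copied from the results listed above.
scores = {
    "IFEval": 39.69,
    "BBH": 8.83,
    "MATH Lvl 5": 1.44,
    "GPQA": 3.47,
    "MuSR": 11.33,
    "MMLU-PRO": 11.46,
}

# Unweighted mean of the six benchmarks (assumed aggregation).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 12.70
```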

Citation

@misc{zhu2024poseefficientcontextwindow,
      title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training}, 
      author={Dawei Zhu and Nan Yang and Liang Wang and Yifan Song and Wenhao Wu and Furu Wei and Sujian Li},
      year={2024},
      eprint={2309.10400},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2309.10400}, 
}