How did this model come to be?
I don't quite get what this model is about. Is it just an 8-bit version of StarCoderPlus? Did you train it further? It's clearly smaller in file size, by about half, so at the very least it's quantized, but your README doesn't really explain the main difference between this and the original StarCoderPlus. I think you should update it and add some more information.
After looking a second time, I did see this:
"OpenChat is a series of open-source language models fine-tuned on a diverse and high-quality dataset of multi-round conversations. With only ~6K GPT-4 conversations filtered from the ~90K ShareGPT conversations, OpenChat is designed to achieve high performance with limited data."
However, it doesn't explain why the model is smaller in size. I would at least add what level of quantization you used to make the model smaller.
Raw checkpoints are in float32. This version was trained in bfloat16, so it is half the size. It is a full-weight fine-tune, not a quantized model.
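To illustrate the size difference (a minimal sketch assuming the standard transformers loading path; the repo id below is a placeholder, substitute the actual model name):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo id -- replace with the actual model name.
model_id = "openchat/opencoderplus"

# float32 stores 4 bytes per parameter; bfloat16 stores 2,
# which is why the bf16 checkpoint files are roughly half the size.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Rough in-memory footprint check:
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.1f}B parameters, ~{n_params * 2 / 1e9:.0f} GB in bfloat16")
```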
Oh OK, thank you!
Hello,
I'm trying to use the model on my PC with a total of 32 GB of RAM (and about 16 GB of GPU memory). The weights are loaded and split between the GPU and CPU. Inference takes 200 minutes on average. Is this normal?
Also, I've tried a lot of prompts to use the model as a chatbot that takes COBOL code and tells me what the code does. But I get back my entire prompt plus the answer, not just the answer. Finally, although the answer is always on the right track, it's not always complete because it's long and too detailed. I have to increase `max_new_tokens` a lot to get a complete answer, which increases the inference time. How can I correct these problems? Thank you very much.
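For reference, `generate()` returns the prompt tokens followed by the newly generated tokens, so the usual fix for the echoed prompt is to slice the output past the input length. Here is a minimal sketch assuming the standard transformers API (the repo id is a placeholder, and `device_map="auto"` requires accelerate to split weights between GPU and CPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/opencoderplus"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # splits layers between GPU and CPU (requires accelerate)
)

prompt = "Explain what this COBOL program does:\n<your COBOL code here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# generate() returns prompt tokens + new tokens, so slice off the prompt
# to keep only the model's answer.
answer_ids = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```

The CPU offload is also the likely cause of the slow inference: any layers kept in system RAM have to be streamed through the GPU on every forward pass, so generation is typically orders of magnitude slower than when the whole model fits in GPU memory.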