Thank you for your model!
I really appreciate this model!
I've really been looking forward to it, as I think it's one step forward in advancing Mixtral, and AI overall. I plan on using this to create the next version of my Open_Gpt4 series by merging this bagel model with mixtral-instruct. I hope the results are good!
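For anyone curious what that kind of merge actually involves, here is a minimal sketch of a simple linear weight merge in plain transformers/torch. The model IDs and the 50/50 ratio are illustrative assumptions, not the actual recipe, and a dedicated tool like mergekit handles sharded checkpoints and offers better merge methods (SLERP, TIES, DARE), so treat this as the idea rather than the implementation.

```python
# Minimal sketch of a linear weight merge between two Mixtral checkpoints.
# The model IDs and the blend ratio are illustrative assumptions, not the
# actual recipe; mergekit is the practical tool for merges of this size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed merge partner
tune_id = "jondurbin/bagel-8x7b-v0.2"             # assumed bagel checkpoint

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tune = AutoModelForCausalLM.from_pretrained(tune_id, torch_dtype=torch.bfloat16)

alpha = 0.5  # blend ratio: 0.0 keeps the base, 1.0 keeps the fine-tune
tune_state = tune.state_dict()
merged_state = {
    name: (1 - alpha) * param + alpha * tune_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged_state)
base.save_pretrained("mixtral-bagel-linear-merge")
AutoTokenizer.from_pretrained(base_id).save_pretrained("mixtral-bagel-linear-merge")
```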
DPO COOL
> I plan on using this to create the next version of my Open_Gpt4 series by merging this bagel model with mixtral-instruct. I hope the results are good!
Awesome, let me know the results!
Thank you for this model. I would like to know how you fine-tuned it, since I understand that Mixtral models are difficult to fine-tune. Thank you.
So far the model is pretty promising. I've been testing my GGUF quant, and although it's still not as good as GPT-4, it's getting closer in quality. I want to say it might even be better than base mixtral-instruct, but that's only a guess based on some limited testing; more testing, as well as its score on the Open LLM Leaderboard, is needed to validate that claim. You can find it here
And my GGUF quant will be here by the end of today or early tomorrow, depending on when it finishes uploading:
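In case it helps anyone testing along, below is a minimal sketch of loading a GGUF quant locally with llama-cpp-python. The file name, prompt format, and sampling settings are placeholders rather than the actual upload; use whatever the model card specifies.

```python
# Minimal sketch of loading and prompting a GGUF quant with llama-cpp-python.
# The file name, prompt template, and sampling settings are placeholders;
# swap in whichever quant file and prompt format the model card specifies.
from llama_cpp import Llama

llm = Llama(
    model_path="bagel-8x7b-v0.2.Q4_K_M.gguf",  # hypothetical local file name
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to the GPU if VRAM allows
)

out = llm(
    "[INST] Write a short riddle about bagels. [/INST]",
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```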
> Thank you for this model. I would like to know how you fine-tuned it, since I understand that Mixtral models are difficult to fine-tune. Thank you.
I used my fork of qlora here: https://github.com/jondurbin/qlora, with the configuration you can find on Weights & Biases:
https://wandb.ai/jondurbin/bagel-8x7b-v0.2/runs/agxjjdso/overview?workspace=user-jondurbin
I used the latest main branch of transformers, but at the time these PRs had not yet been merged, so I pulled them in manually:
https://github.com/huggingface/transformers/pull/28115
https://github.com/huggingface/transformers/pull/28256
I think now, if you build transformers from source using the latest main checkout, it should be somewhat fixed, although the Mistral folks on Discord did say the implementation is wrong, so any fine-tunes of Mixtral are probably suboptimal right now (regardless of how well they perform, they should be even better once the implementation is correct).
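For anyone who just wants the general shape of it, below is a rough QLoRA setup for Mixtral in plain peft/transformers. It only sketches the approach; the actual run used the qlora fork above with the hyperparameters logged on the linked wandb page, and the LoRA rank and target modules here are assumptions, not the values from that run.

```python
# Rough QLoRA setup for Mixtral, shown only to illustrate the approach.
# The actual training used the jondurbin/qlora fork with the configuration
# logged on wandb; the rank and target modules below are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights: the "Q" in QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                    # illustrative rank, not the run's value
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only here
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # the LoRA adapters are the only trainable weights
```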
> So far the model is pretty promising. I've been testing my GGUF quant, and although it's still not as good as GPT-4, it's getting closer in quality. I want to say it might even be better than base mixtral-instruct, but that's only a guess based on some limited testing; more testing, as well as its score on the Open LLM Leaderboard, is needed to validate that claim. You can find it here
> And my GGUF quant will be here by the end of today or early tomorrow, depending on when it finishes uploading:
Awesome!
@jondurbin Do you have plans to run a fine-tune on top of Mixtral-8x7B-Instruct? That model has already been instruction fine-tuned but there are likely many things in the bagel dataset it hasn't seen that would improve performance.
> @jondurbin Do you have plans to run a fine-tune on top of Mixtral-8x7B-Instruct? That model has already been instruction fine-tuned but there are likely many things in the bagel dataset it hasn't seen that would improve performance.
I may, but probably not until we can confirm the issues with the Mixtral implementation in HF are fixed. It appears there are discrepancies, as hinted by the Mistral folks on Discord, but sadly they refuse to help correct or even identify the issue.
@jondurbin My recommendation, as I've stated in my write-up, is to basically forget mistralai and their models and start creating our own using my techniques and mergekit. Make your own base models like I made mine, and train on top of those.
My model:
https://huggingface.co/rombodawg/Everyone-Coder-4x7b-Base
My write up:
https://docs.google.com/document/d/1_vOftBnrk9NRk5h10UqrfJ5CDih9KBKL61yvrZtVWPE/edit?usp=sharing
@jondurbin Makes sense to wait. I do think it would be very interesting to see, given how promising your bagel fine-tune on Yi was and the strength of this fine-tune as well. Too bad your hardware restricts you to LoRA rather than a full fine-tune :(
@jondurbin
Thanks a lot!
Speaking of hardware, what do you use to fine-tune and run your models?