Built with Axolotl

QLoRA tuned from mistralai/Mixtral-8x7B-v0.1.

My main reason for training this model was to investigate using an altered router balancing loss combined with the z-loss introduced in ST-MoE: Designing Stable and Transferable Sparse Expert Models. The result is pretty decent, I think! It does a good job of respecting character information in system prompts and performed adequately on a few simple coding tasks.

To train this I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in MegaBlocks. The config used with my custom hacked-up branch of axolotl is available here.

Uses my favorite non-ChatML token-economic chat prompt format. Messages should be prefixed with " ***System:", " ***Query:", or " ***Response:" for system, user, and model messages respectively. No newlines are necessary but the space before the triple asterisk is mandatory.

Downloads last month
12
Safetensors
Model size
46.7B params
Tensor type
BF16
·
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Model tree for chargoddard/MixtralRPChat-ZLoss

Finetuned
(58)
this model
Quantizations
3 models

Datasets used to train chargoddard/MixtralRPChat-ZLoss