This is a MoE-ification of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T using the Mixtral branch of mergekit.

The goal was to MoE-fy the TinyLlama model and then use it as a base model for further training. The intuition is that finetuning an 8x1.1B MoE should give better performance than finetuning a single 1.1B model on its own.
More work coming!
Inference Template
This is a merge of the base model, so treat it as a plain text-completion model rather than applying a chat template.
llm.generate('Quantum Tunneling is')
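
As a minimal sketch using vLLM (which exposes the `llm.generate` call above); the model path is a placeholder for wherever this merge lives locally or on the Hub:

```python
from vllm import LLM, SamplingParams

# Placeholder path: point this at the local directory or Hub repo id of this merge.
llm = LLM(model="./tinyllama-8x1.1b-moe", dtype="bfloat16")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Quantum Tunneling is"], params)
print(outputs[0].outputs[0].text)
```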
Mergekit Config
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
    positive_prompts: [""]
Eval
Thanks to u/mhenrichsen for the HellaSwag score.
| Tasks     | Version | Filter | n-shot | Metric   | Value  | Stderr  |
|-----------|---------|--------|-------:|----------|-------:|--------:|
| hellaswag | Yaml    | none   |      0 | acc      | 0.4659 | ±0.0050 |
| hellaswag | Yaml    | none   |      0 | acc_norm | 0.6044 | ±0.0049 |
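
These numbers are in the format produced by EleutherAI's lm-evaluation-harness. A minimal sketch of how a comparable 0-shot HellaSwag run could be reproduced, assuming lm-eval v0.4+ and a placeholder model path:

```python
# Sketch: 0-shot HellaSwag via lm-evaluation-harness (v0.4+ API).
# The model path is a placeholder; adjust batch size and dtype to your hardware.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./tinyllama-8x1.1b-moe,dtype=bfloat16",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```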