---
license: apache-2.0
tags:
  - moe
  - merge
  - mergekit
  - lazymergekit
  - DopeorNope/SOLARC-M-10.7B
  - maywell/PiVoT-10.7B-Mistral-v0.2-RP
  - kyujinpy/Sakura-SOLAR-Instruct
  - jeonsworld/CarbonVillain-en-10.7B-v1
---

# Lumosia-MoE-4x10.7

The name Lumosia was selected because it's a MoE of multiple SOLAR merges, so it really "lights the way"... it's 3am.

This is a very experimental model: a MoE of well-performing SOLAR models (based on personal experience, not the open leaderboard).

Why? Dunno, wanted to see what would happen.

Context is maybe 32k? Waiting for the GGUF to upload.

Template:

```
### System:

### USER:{prompt}

### Assistant:
```
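
For concreteness, here is a minimal sketch (not part of the original card) of assembling a prompt in this template by hand; the system text and question below are placeholder assumptions:

```python
# Hypothetical example: build a prompt in the card's template by hand.
# The system message and user question are placeholders, not from the card.
system = "You are a helpful assistant."
question = "What is a Mixture of Experts?"

prompt = f"### System:\n{system}\n\n### USER:{question}\n\n### Assistant:"
print(prompt)
```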

Lumosia-MoE-4x10.7 is a Mixture of Experts (MoE) made with the following models:

- [DopeorNope/SOLARC-M-10.7B](https://huggingface.co/DopeorNope/SOLARC-M-10.7B)
- [maywell/PiVoT-10.7B-Mistral-v0.2-RP](https://huggingface.co/maywell/PiVoT-10.7B-Mistral-v0.2-RP)
- [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct)
- [jeonsworld/CarbonVillain-en-10.7B-v1](https://huggingface.co/jeonsworld/CarbonVillain-en-10.7B-v1)

Evals:

- Pending

## 🧩 Configuration

```yaml
base_model: DopeorNope/SOLARC-M-10.7B
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: DopeorNope/SOLARC-M-10.7B
    positive_prompts: [""]
  - source_model: maywell/PiVoT-10.7B-Mistral-v0.2-RP
    positive_prompts: [""]
  - source_model: kyujinpy/Sakura-SOLAR-Instruct
    positive_prompts: [""]
  - source_model: jeonsworld/CarbonVillain-en-10.7B-v1
    positive_prompts: [""]
```
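
A config like this is consumed by mergekit's MoE tooling (this is how lazymergekit builds such merges). The notebook-style sketch below is an assumption rather than part of the original card: the repository branch, the `config.yaml` file name, and the output path are all illustrative.

```python
# Hedged sketch: install a mergekit build with MoE support and run the merge.
# The branch, config file name, and output directory are assumptions.
!git clone -b mixtral https://github.com/cg123/mergekit.git
!pip install -qe ./mergekit

!mergekit-moe config.yaml ./Lumosia-MoE-4x10.7
```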

## 💻 Usage

```python
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "Steelskull/Lumosia-MoE-4x10.7"

# Load the tokenizer and build a text-generation pipeline,
# quantizing the model to 4-bit via bitsandbytes to save memory.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the tokenizer's chat template, then sample a reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
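
The `load_in_4bit` flag quantizes the weights with bitsandbytes at load time, which is what makes a four-expert merge of 10.7B models practical on a single GPU; with more memory available, you can drop it and pass `device_map="auto"` instead.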