QwenMoEAriel

QwenMoEAriel is a Mixture of Experts (MoE) model built with LazyMergekit from the following models:

  • Qwen/Qwen2-1.5B
  • Replete-AI/Replete-Coder-Qwen2-1.5b

🧩 Configuration

base_model: Qwen/Qwen2-1.5B
architecture: qwen
experts:
  - source_model: Qwen/Qwen2-1.5B
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: Replete-AI/Replete-Coder-Qwen2-1.5b
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
shared_experts:
  - source_model: Qwen/Qwen2-1.5B
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
    # (optional, but recommended:)
    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
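To give a feel for what `residual_scale: 0.1` does, here is a toy sketch in plain Python. It assumes the scale simply multiplies the shared expert's contribution before it is added to the router-weighted mix of the two routed experts. The function names (`softmax`, `combine`) and the scalar "outputs" are hypothetical stand-ins for illustration. Real layers work on hidden-state vectors, and any further gating the actual architecture applies is left out.

```python
import math


def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(router_scores, expert_outputs, shared_output, residual_scale=0.1):
    """Toy MoE mix: weighted routed experts plus a down-weighted shared expert.

    Assumption: residual_scale acts as a plain multiplier on the shared
    expert's output. This is an illustration, not the mergekit implementation.
    """
    weights = softmax(router_scores)
    routed = sum(w * o for w, o in zip(weights, expert_outputs))
    return routed + residual_scale * shared_output


# Equal router scores give each routed expert a weight of 0.5.
result = combine([0.0, 0.0], [1.0, 3.0], shared_output=10.0)
print(result)  # 0.5*1.0 + 0.5*3.0 + 0.1*10.0 = 3.0
```

With `residual_scale=1.0` the same call would return 12.0, so the shared expert would dominate the routed experts. That is the "overcooking" the config comment warns about.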

💻 Usage

!pip install -qU transformers bitsandbytes accelerate einops
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
model = AutoModelForCausalLM.from_pretrained(
    "femiari/Qwen2-1.5Moe",
    torch_dtype=torch.float16,
    ignore_mismatched_sizes=True
).to(device)
tokenizer = AutoTokenizer.from_pretrained("femiari/Qwen2-1.5Moe")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,  # pass input_ids together with the attention mask
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
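The list comprehension in the snippet above drops the prompt tokens from each generated sequence, because `generate` returns the prompt followed by the completion. A minimal sketch of that step on plain Python lists, using a hypothetical helper name `strip_prompt` and made-up token IDs, could look like this:

```python
def strip_prompt(input_ids, output_ids):
    """Remove the leading prompt tokens from each generated sequence.

    Assumes each output sequence starts with its corresponding prompt,
    so slicing by prompt length leaves only the newly generated tokens.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up token IDs: a 3-token prompt followed by 2 generated tokens.
prompts = [[101, 102, 103]]
outputs = [[101, 102, 103, 7, 8]]
print(strip_prompt(prompts, outputs))  # [[7, 8]]
```

The same slicing works on the tensors in the usage snippet, since indexing a sequence by `len(input_ids):` behaves the same way there.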
Model size: 4.09B params · Tensor type: BF16 · Format: Safetensors