Example usage:
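import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# Assumed setup: the snippet uses `device` and `max_length` without defining them.
device = "cuda" if torch.cuda.is_available() else "cpu"
max_length = 256  # assumed value; adjust as needed
tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-1.1B-MoE")
model = AutoModelForCausalLM.from_pretrained("s3nh/TinyLLama-1.1B-MoE").to(device)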
input_text = """
###Input: You are a pirate. tell me a story about wrecked ship.
###Response:
"""
# Tokenize the prompt and move it to the same device as the model.
input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
# Sample a continuation; the prompt has no padding, so the attention mask is all ones.
output = model.generate(inputs=input_ids,
                        max_length=max_length,
                        do_sample=True,
                        top_k=10,
                        temperature=0.7,
                        pad_token_id=tokenizer.eos_token_id,
                        attention_mask=input_ids.new_ones(input_ids.shape))
print(tokenizer.decode(output[0], skip_special_tokens=True))
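The prompt in the example follows an `###Input:` / `###Response:` layout. A minimal sketch of a helper that builds such prompts is shown below; `build_prompt` is a hypothetical name, and the assumption that the model expects exactly this layout comes only from the example above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the ###Input / ###Response layout used in the example."""
    # Assumption: the layout is copied from the usage example, not from a documented template.
    return f"\n###Input: {instruction.strip()}\n###Response:\n"


prompt = build_prompt("You are a pirate. tell me a story about wrecked ship.")
print(prompt)
```

The returned string can be passed to `tokenizer.encode` in place of `input_text`.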
This model was made possible by the tremendous work of the mergekit developers. I merged several TinyLlama models to create a mixture of experts. The config used is below:
"""base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
  - source_model: 78health/TinyLlama_1.1B-function-calling
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
    positive_prompts:
      - "storywriting"
      - "write"
      - "scene"
      - "story"
      - "character"
  - source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
    positive_prompts:
      - "reason"
      - "provide"
      - "instruct"
      - "summarize"
      - "count"
"""
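As a rough sanity check on the merge config, the sketch below mirrors it as a plain Python dict and reports how many positive prompts each expert has and whether any prompt is shared between experts. The helper names `summarize_experts` and `shared_prompts` are hypothetical and not part of mergekit; the idea that overlapping prompts are worth flagging is an assumption for illustration.

```python
from collections import Counter

# Mirror of the merge config, written as a dict so no YAML parser is needed.
CONFIG = {
    "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "experts": [
        {"source_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
         "positive_prompts": ["chat", "assistant", "tell me", "explain"]},
        {"source_model": "78health/TinyLlama_1.1B-function-calling",
         "positive_prompts": ["code", "python", "javascript", "programming", "algorithm"]},
        {"source_model": "phanerozoic/Tiny-Pirate-1.1b-v0.1",
         "positive_prompts": ["storywriting", "write", "scene", "story", "character"]},
        {"source_model": "Tensoic/TinyLlama-1.1B-3T-openhermes",
         "positive_prompts": ["reason", "provide", "instruct", "summarize", "count"]},
    ],
}


def summarize_experts(config: dict) -> dict:
    """Map each expert's source model to its number of positive prompts."""
    return {e["source_model"]: len(e["positive_prompts"]) for e in config["experts"]}


def shared_prompts(config: dict) -> list:
    """Return prompts that appear under more than one expert, sorted alphabetically."""
    counts = Counter(p for e in config["experts"] for p in set(e["positive_prompts"]))
    return sorted(p for p, n in counts.items() if n > 1)


for model_name, n_prompts in summarize_experts(CONFIG).items():
    print(f"{model_name}: {n_prompts} positive prompts")
print("shared prompts:", shared_prompts(CONFIG))
```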