Instructions to use Undi95/MistralThinker-v1.1-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Undi95/MistralThinker-v1.1-GGUF with llama-cpp-python:

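# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q8_0 GGUF from the Hub (needs huggingface-hub) and load it
llm = Llama.from_pretrained(
	repo_id="Undi95/MistralThinker-v1.1-GGUF",
	filename="MistralThinker-v1.1.q8_0.gguf",
)

# messages must be a list of {"role": ..., "content": ...} dicts
response = llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Hello, I'm a wandering knight seeking shelter. Could you share a story about local legends?"}
	]
)
print(response["choices"][0]["message"]["content"])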

Local Apps

llama.cpp

How to use Undi95/MistralThinker-v1.1-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Undi95/MistralThinker-v1.1-GGUF:Q8_0
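Once `llama-server` is running, it can also be queried from code. The sketch below uses only the Python standard library and assumes the server listens on its default address, `http://localhost:8080`, and serves the OpenAI-compatible `/v1/chat/completions` route; the helper names `build_chat_request` and `chat` are hypothetical. The server formats messages with the chat template stored in the GGUF, which may or may not insert the `<think>\n` prefill that the model card asks for.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
BASE_URL = "http://localhost:8080"


def build_chat_request(messages, base_url=BASE_URL, max_tokens=512):
    """Build a POST request for the server's OpenAI-compatible chat route."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, base_url=BASE_URL):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat([{"role": "user", "content": "Hello, I'm a wandering knight seeking shelter."}])` returns the reply text once the server is up. Building the request separately keeps the payload easy to inspect without a running server.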

Use Docker

docker model run hf.co/Undi95/MistralThinker-v1.1-GGUF:Q8_0

LM Studio
Jan
Ollama
How to use Undi95/MistralThinker-v1.1-GGUF with Ollama:
```
ollama run hf.co/Undi95/MistralThinker-v1.1-GGUF:Q8_0
```

Unsloth Studio

How to use Undi95/MistralThinker-v1.1-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Undi95/MistralThinker-v1.1-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Undi95/MistralThinker-v1.1-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Undi95/MistralThinker-v1.1-GGUF to start chatting

Docker Model Runner
How to use Undi95/MistralThinker-v1.1-GGUF with Docker Model Runner:
```
docker model run hf.co/Undi95/MistralThinker-v1.1-GGUF:Q8_0
```

Lemonade

How to use Undi95/MistralThinker-v1.1-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Undi95/MistralThinker-v1.1-GGUF:Q8_0

Run and chat with the model

lemonade run user.MistralThinker-v1.1-GGUF-Q8_0

List all available models

lemonade list

MistralThinker Model Card

Please read this discussion before use: https://huggingface.co/Undi95/MistralThinker-v1.1/discussions/1
Prefill required for the assistant turn: <think>\n (the opening think tag followed by a newline)

Model Description

Model Name: MistralThinker
Version: 1.1
Prompt Format: Mistral-V7

[SYSTEM_PROMPT]{system prompt}[/SYSTEM_PROMPT][INST]{user message}[/INST]{assistant response}</s>
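When driving the model through a raw text-completion API instead of a chat endpoint, the prompt can be assembled by hand. The following is a minimal Python sketch of the Mistral-V7 layout shown above, ending with the required `<think>\n` prefill. Assumptions not stated on the card: the `[SYSTEM_PROMPT]` block is omitted when no system prompt is given, earlier turns repeat as `[INST]{user message}[/INST]{assistant response}</s>`, and the BOS token is added by the runtime's tokenizer. The function name `build_mistral_v7_prompt` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def build_mistral_v7_prompt(
    user_message: str,
    system_prompt: Optional[str] = None,
    history: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a Mistral-V7 prompt string that ends with the <think>\\n prefill."""
    parts = []
    if system_prompt:
        # Assumption: with no system prompt, the block is left out entirely.
        parts.append(f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]")
    # Assumption: earlier (user, assistant) turns repeat the same layout.
    for past_user, past_assistant in history:
        parts.append(f"[INST]{past_user}[/INST]{past_assistant}</s>")
    parts.append(f"[INST]{user_message}[/INST]")
    # Prefill the assistant turn so generation starts inside the thinking block.
    parts.append("<think>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_mistral_v7_prompt(
        "Hello, I'm a wandering knight seeking shelter.",
        system_prompt="You are a friendly fantasy innkeeper who greets travelers from distant lands.",
    ))
```

The resulting string could then be passed to a raw completion call such as llama-cpp-python's `llm.create_completion(prompt, max_tokens=512)`; whether the bracketed control strings are tokenized as special tokens depends on the runtime.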

This model is a specialized variant of Mistral-Small-24B-Base-2501, adapted using a DeepSeek R1 distillation process. It is primarily designed for roleplay (RP) and storywriting applications, focusing on character interactions, narrative generation, and creative storytelling. Approximately 40% of the training dataset consists of roleplay, storywriting, and character card data, aimed at producing rich, contextually immersive outputs in these domains.

Model Sources

Base Model: Mistral-Small-24B-Base-2501
Fine-Tuning Approach: DeepSeek R1 distillation (focused on RP)
Dataset Size: The training dataset has doubled in size since the previous version, adding more neutral logs so that the Base model adheres more closely to my new format.

Intended Use

Primary Use Cases:
- Roleplay (RP): Engaging with users in fictional or scenario-based interactions.
- Storywriting: Generating narratives, character dialogues, and creative texts.
- Character Lore Generation: Serving as a resource to craft or expand on character backstories and interactions.
How To Use:
1. User-First Message: The first message in any interaction should come from the user, so that the model responds in a narrative or roleplay context guided by user input.
2. Contextual Information: User or assistant details can be placed either in the system prompt or the user's first message. A system prompt is not mandatory, but any contextual instructions or role descriptions can help set the stage.
3. DeepSeek-Style Interaction: The model can also be used purely as a DeepSeek distill without additional system prompts, providing flexible usage for direct storytelling or roleplay scenarios. The model may still be biased toward roleplay data; this is expected.
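The user-first guideline can be checked mechanically before a conversation is sent. This Python sketch assumes OpenAI-style message dicts with `role` and `content` keys (the format `create_chat_completion` accepts in the llama-cpp-python example); the function name `check_user_first` is hypothetical, and allowing a single leading system message reflects the note that a system prompt is optional.

```python
from typing import Dict, List

Message = Dict[str, str]


def check_user_first(messages: List[Message]) -> None:
    """Raise ValueError unless the conversation opens with a user turn.

    A system message is allowed only as the very first entry (so at most
    one); the first non-system message must have the role "user".
    """
    for index, message in enumerate(messages):
        if message.get("role") == "system" and index != 0:
            raise ValueError("a system message may only appear first")
    turns = [m for m in messages if m.get("role") != "system"]
    if not turns:
        raise ValueError("the conversation has no user message yet")
    if turns[0].get("role") != "user":
        raise ValueError(f"first turn must come from the user, got {turns[0].get('role')!r}")
```

Raising an error, rather than silently reordering messages, leaves it to the caller to decide how to move an assistant greeting into the system prompt or the first user message.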

Training Data

DeepSeek R1 Thinking Process: The model inherits a refined chain-of-thought (thinking process) distilled from DeepSeek R1; in this fine-tune, that process places heavy emphasis on roleplay and narrative coherence.
Dataset Composition:
- 40%: RP/Storywriting/Character Cards
- 60%: Various curated data for broad language, math, logical, space... understanding
Data Scaling: The dataset size was doubled compared to previous iterations, with the aim of enhancing the model’s creative and contextual capabilities.

Model Performance

Strengths:
- Storytelling & Roleplay: Rich in creative generation, character portrayal, and scenario building.
- Dialogue & Interaction: Capable of sustaining engaging and context-driven dialogues.
- Adaptability: Can be used with or without a system prompt to match a range of user preferences.
Limitations & Bias:
- Hallucination: It can generate fictitious information during the thinking process but still end up with a successful reply.
- Thinking can be skipped: Because this model is in essence a distillation of DeepSeek R1, it may, despite being trained from the Base model, forget to add <think>\n in some scenarios.
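Because the thinking step can be skipped, client code should not assume a thinking block is always present. A minimal Python sketch for separating the thinking from the final reply, assuming the prompt was prefilled with `<think>\n` and that the model closes its thinking with `</think>` (the DeepSeek R1 convention; the card does not name the closing tag). `split_thinking` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_thinking(completion: str) -> Tuple[Optional[str], str]:
    """Split generated text into (thinking, reply).

    Assumes the prompt was prefilled with "<think>\\n", so the completion
    starts inside the thinking block and closes it with "</think>".
    If no closing tag is found (thinking skipped or unfinished), returns
    (None, completion) so the caller can decide how to handle it.
    """
    text = completion.lstrip()
    # The model may also emit its own opening tag; drop it if present.
    if text.startswith("<think>"):
        text = text[len("<think>"):]
    head, sep, tail = text.partition("</think>")
    if not sep:
        return None, completion.strip()
    return head.strip(), tail.strip()
```

Returning `None` for the thinking part makes the "no thinking" case explicit instead of mixing unfinished reasoning into the reply.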

Ethical Considerations

Usage Recommendations

System Prompt (Optional):
You may provide a high-level system prompt detailing the scenario or the desired style of roleplay and storywriting.
Example: "You are a friendly fantasy innkeeper who greets travelers from distant lands."
User’s First Message:
- Must clearly state or imply the scenario or context if no system prompt is provided.
  Example: "Hello, I’m a wandering knight seeking shelter. Could you share a story about local legends?"
Roleplay & Storywriting Focus:
- Encourage the model to develop characters, backstories, and immersive dialogues.
- For more direct, unfiltered or freeform creativity, skip the system prompt.
- If you want to include "logs" from previous messages before starting a conversation, put them in the first user message or in the system prompt.
- You can also put example messages from the character you are roleplaying with in the system prompt.
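The recommendations above can be sketched as a small Python helper that places the scenario and example character messages in an optional system prompt and prepends earlier logs to the first user message. The helper name `build_rp_messages` and the `Example messages:` / `Previous conversation:` labels are illustrative assumptions, not a format the model is known to have been trained on.

```python
from typing import Dict, List, Optional, Sequence


def build_rp_messages(
    first_user_message: str,
    scenario: Optional[str] = None,
    example_dialogue: Sequence[str] = (),
    prior_logs: Sequence[str] = (),
) -> List[Dict[str, str]]:
    """Assemble an opening message list following the usage recommendations.

    Scenario text and example character messages go into an optional system
    prompt; earlier conversation logs are prepended to the first user message.
    """
    messages: List[Dict[str, str]] = []
    system_parts = []
    if scenario:
        system_parts.append(scenario)
    if example_dialogue:
        # Hypothetical label; any clear separator should work.
        system_parts.append("Example messages:\n" + "\n".join(example_dialogue))
    if system_parts:
        messages.append({"role": "system", "content": "\n\n".join(system_parts)})
    user_content = first_user_message
    if prior_logs:
        user_content = "Previous conversation:\n" + "\n".join(prior_logs) + "\n\n" + first_user_message
    messages.append({"role": "user", "content": user_content})
    return messages
```

Leaving out both `scenario` and `example_dialogue` produces a single user message, matching the advice to skip the system prompt for more freeform creativity.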


Format: GGUF
Model size: 24B params
Architecture: llama
Hardware compatibility: 8-bit


Model tree for Undi95/MistralThinker-v1.1-GGUF

Base model: mistralai/Mistral-Small-24B-Base-2501
Quantized (54), including this model