Quantized Octo-planner: On-device Language Model for the Planner-Action Agents Framework
This repo contains GGUF quantized versions of our Octo-planner model, available at NexaAIDev/octopus-planning
GGUF Quantization
To run the models, first download them to your local machine using either git clone or the Hugging Face Hub:
git clone https://huggingface.co/NexaAIDev/octo-planner-gguf
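Alternatively, if you only need a single file, the Hugging Face Hub CLI can fetch it directly. A minimal sketch, assuming the `huggingface_hub` CLI is installed and picking Q4_K_M as an illustrative choice:
# Install the CLI if needed: pip install -U "huggingface_hub[cli]"
huggingface-cli download NexaAIDev/octo-planner-gguf octopus-planning-Q4_K_M.gguf --local-dir ./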
Run with llama.cpp (Recommended)
- Clone and compile:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make
- Execute the model by running the following command in the terminal:
./llama-cli -m ./path/to/octopus-planning-Q4_K_M.gguf -p "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
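The command above runs with default settings. llama.cpp exposes further tuning flags; a hedged sketch with two common ones, where `-n` caps the number of generated tokens and `-ngl` offloads layers to the GPU (only effective when llama.cpp was built with GPU support; the values here are illustrative):
# -n: max tokens to generate; -ngl: number of layers to offload to GPU
./llama-cli -m ./path/to/octopus-planning-Q4_K_M.gguf -n 256 -ngl 32 -p "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"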
Run with Ollama
Since our models have not been uploaded to the Ollama library, please download them and import them into Ollama manually by following these steps:
- Install Ollama on your local machine. You can also follow the guide from the Ollama GitHub repository:
git clone https://github.com/ollama/ollama.git ollama
- Locate the local Ollama directory:
cd ollama
- Create a `Modelfile` in your directory:
touch Modelfile
- In the `Modelfile`, include a `FROM` statement with the path to your local model and the default parameters (a fuller sketch with a chat template follows these steps):
FROM ./path/to/octopus-planning-Q4_K_M.gguf
- Use the following command to add the model to Ollama:
ollama create octopus-planning-Q4_K_M -f Modelfile
- Verify that the model has been successfully imported:
ollama ls
- Run the model:
ollama run octopus-planning-Q4_K_M "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
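The minimal `Modelfile` above passes prompts through verbatim, so the chat markers must be written by hand as in the commands here. If you prefer Ollama to apply them automatically, here is a sketch of a fuller `Modelfile`; the template and stop token mirror the prompt format used in this card, so treat them as assumptions rather than an official configuration:
FROM ./path/to/octopus-planning-Q4_K_M.gguf
# Wrap raw prompts in the model's chat markers
TEMPLATE "<|user|>{{ .Prompt }}<|end|><|assistant|>"
# Stop generation at the end-of-turn marker
PARAMETER stop "<|end|>"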
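Once imported, the model can also be queried programmatically through Ollama's local REST API; a minimal sketch, assuming the default port and the minimal `Modelfile` (so the chat markers are included by hand, and the prompt is shortened for illustration):
# Query the locally served model over the REST API
curl http://localhost:11434/api/generate -d '{
  "model": "octopus-planning-Q4_K_M",
  "prompt": "<|user|>Take a screenshot of the final summary slide and email it to all participants<|end|><|assistant|>",
  "stream": false
}'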
Quantized GGUF Models Benchmark
| Name | Quant method | Bits | Size | Use Cases |
|---|---|---|---|---|
| octopus-planning-Q2_K.gguf | Q2_K | 2 | 1.42 GB | fast but high quality loss; not recommended |
| octopus-planning-Q3_K.gguf | Q3_K | 3 | 1.96 GB | strongly not recommended |
| octopus-planning-Q3_K_S.gguf | Q3_K_S | 3 | 1.68 GB | strongly not recommended |
| octopus-planning-Q3_K_M.gguf | Q3_K_M | 3 | 1.96 GB | moderate quality loss; not generally recommended |
| octopus-planning-Q3_K_L.gguf | Q3_K_L | 3 | 2.09 GB | not generally recommended |
| octopus-planning-Q4_0.gguf | Q4_0 | 4 | 2.18 GB | moderate speed; recommended |
| octopus-planning-Q4_1.gguf | Q4_1 | 4 | 2.41 GB | moderate speed; recommended |
| octopus-planning-Q4_K.gguf | Q4_K | 4 | 2.39 GB | moderate speed; recommended |
| octopus-planning-Q4_K_S.gguf | Q4_K_S | 4 | 2.19 GB | fast and accurate; highly recommended |
| octopus-planning-Q4_K_M.gguf | Q4_K_M | 4 | 2.39 GB | fast; recommended |
| octopus-planning-Q5_0.gguf | Q5_0 | 5 | 2.64 GB | fast; recommended |
| octopus-planning-Q5_1.gguf | Q5_1 | 5 | 2.87 GB | very large; prefer Q4 |
| octopus-planning-Q5_K.gguf | Q5_K | 5 | 2.82 GB | large; recommended |
| octopus-planning-Q5_K_S.gguf | Q5_K_S | 5 | 2.64 GB | large; recommended |
| octopus-planning-Q5_K_M.gguf | Q5_K_M | 5 | 2.82 GB | large; recommended |
| octopus-planning-Q6_K.gguf | Q6_K | 6 | 3.14 GB | very large; not generally recommended |
| octopus-planning-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | very large; not generally recommended |
| octopus-planning-F16.gguf | F16 | 16 | 7.64 GB | extremely large |
Quantized with llama.cpp
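For reference, a typical llama.cpp quantization workflow looks like the sketch below; the exact commands used to produce the files in this repo are an assumption, and paths and filenames are illustrative:
# Convert the original checkpoint to GGUF at F16, then quantize to a chosen level
python convert_hf_to_gguf.py ./octopus-planning --outfile octopus-planning-F16.gguf
./llama-quantize octopus-planning-F16.gguf octopus-planning-Q4_K_M.gguf Q4_K_M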