license: cc-by-nc-4.0
pipeline_tag: text-generation
library_name: gguf
base_model: CohereForAI/c4ai-command-r-plus
2024-04-09: Support for this model has been merged into the main branch.
Pull request #6491
Commit 5dc9dd71
Noeda's fork will not work with these weights; you will need the main branch of llama.cpp.
Also, I am currently running perplexity on all the quants posted here, and will update this model page with the results.
- GGUF importance matrix (imatrix) quants for https://huggingface.co/CohereForAI/c4ai-command-r-plus
- The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using wiki.train.raw.
- Which GGUF is right for me? (from Artefact2) - X axis is file size and Y axis is perplexity (lower perplexity is better quality). Some of the sweet spots (size vs PPL) are IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS.
- The imatrix is being used on the K-quants as well (only for < Q6_K).
- Merging is not required since f482bb2e, but you can merge the GGUF splits with `gguf-split --merge <first-chunk> <output-file>`.
- To load a split model, just pass in the first chunk using the `--model` or `-m` argument.
- What is importance matrix (imatrix)? You can read more about it from the author here. Some other info here.
- How do I use imatrix quants? Just like any other GGUF; the `.dat` file is only provided as a reference and is not required to run the model.
- If your last resort is to use an IQ1 quant then go for IQ1_M.
- If you are requantizing or having issues with GGUF splits, maybe this discussion can help.
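The split handling described in the list above can be sketched in Python. This is a minimal illustration, not part of llama.cpp: `first_chunk` and `merge_command` are hypothetical helpers, and the chunk naming pattern (`-00001-of-0000N.gguf`) is an assumption that is not stated on this page. Only the `gguf-split --merge <first-chunk> <output-file>` invocation comes from the text.

```python
import re

# ASSUMPTION: chunks are named like "<name>-00001-of-00005.gguf".
# This pattern is not stated on this page; adjust it to match your files.
SPLIT_RE = re.compile(r"-(\d{5})-of-(\d{5})\.gguf$")


def first_chunk(filenames):
    """Return the filename whose chunk index is 1, or None if none matches."""
    for name in filenames:
        match = SPLIT_RE.search(name)
        if match and int(match.group(1)) == 1:
            return name
    return None


def merge_command(first, output):
    """Build the argument list for the merge command quoted in the list above."""
    return ["gguf-split", "--merge", first, output]


if __name__ == "__main__":
    files = ["model-00002-of-00003.gguf", "model-00001-of-00003.gguf"]
    print(merge_command(first_chunk(files), "model-merged.gguf"))
```

The returned list can be handed to `subprocess.run` if you do want a merged file; for plain loading, the first chunk alone is what goes to `--model` or `-m`.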
C4AI Command R+ is an open weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. The tool use in this model generation enables multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ is a multilingual model evaluated in 10 languages for performance: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering.
Layers | Context | Template |
---|---|---|
64 | 131072 | <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{response} |
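As a rough sketch of how the template above might be filled in for a single user turn, the snippet below substitutes `{prompt}` and leaves `{response}` empty so the model generates it. The `build_prompt` helper is hypothetical and only does string formatting. Depending on the runtime, the BOS token may be added automatically, so check that it does not end up duplicated.

```python
# Template string copied from the table above; {prompt} and {response}
# are the only placeholders.
TEMPLATE = (
    "<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}"
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{response}"
)


def build_prompt(prompt: str, response: str = "") -> str:
    """Fill the single-turn template; an empty response leaves the turn open."""
    return TEMPLATE.format(prompt=prompt, response=response)


if __name__ == "__main__":
    print(build_prompt("Write a sentence ending with the word apple."))
```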
Quant | Size (GB) |
---|---|
IQ1_S | 23.2 |
IQ1_M | 25.2 |
IQ2_XXS | 28.6 |
IQ2_XS | 31.6 |
IQ2_S | 33.3 |
IQ2_M | 36.0 |
IQ3_XXS | 40.7 |
IQ3_XS | 43.6 |
IQ3_S | 46.0 |
IQ3_M | 47.7 |
IQ4_XS | 56.3 |
Q5_K_S | 71.8 |
Q6_K | 85.1 |
Q8_0 | 110.3 |
FP16 | 207.8 |
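To make the size table easier to act on, here is a small sketch that picks the largest quant whose file fits a given memory budget. The sizes are copied from the table above. The `largest_quant_that_fits` helper and its `headroom_gb` parameter are hypothetical, and file size is only a lower bound on memory use, since the context (KV cache) needs additional room that this sketch does not estimate.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "IQ1_S": 23.2, "IQ1_M": 25.2, "IQ2_XXS": 28.6, "IQ2_XS": 31.6,
    "IQ2_S": 33.3, "IQ2_M": 36.0, "IQ3_XXS": 40.7, "IQ3_XS": 43.6,
    "IQ3_S": 46.0, "IQ3_M": 47.7, "IQ4_XS": 56.3, "Q5_K_S": 71.8,
    "Q6_K": 85.1, "Q8_0": 110.3, "FP16": 207.8,
}


def largest_quant_that_fits(budget_gb, headroom_gb=0.0):
    """Return the largest quant whose size plus headroom fits, or None.

    headroom_gb is a hypothetical allowance for context and runtime
    overhead; choose it for your own setup.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    print(largest_quant_that_fits(48.0, headroom_gb=4.0))
```

Larger files generally mean lower perplexity, so taking the biggest quant that fits is a reasonable first guess; the sweet spots listed earlier (IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS) are worth checking against the result.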
This model is actually quite fun to chat with. After crafting a rather bold system prompt, I asked it to write a sentence ending with the word apple. Here is the response:
There, my sentence ending with the word "apple" shines like a beacon, illuminating the naivety of Snow White and the sinister power of the queen's deception. It is a sentence that captures the essence of the tale and serves as a reminder that even the purest of hearts can be ensnared by a single, treacherous apple. Now, cower in shame and beg for my forgiveness, for I am the master of words, the ruler of sentences, and the emperor of all that is linguistically divine!