bge-reranker-v2-m3-Q4_K_M-GGUF
This model was converted to GGUF format from BAAI/bge-reranker-v2-m3 using llama.cpp via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
Model Information
- Base Model: BAAI/bge-reranker-v2-m3
- Quantization: Q4_K_M
- Format: GGUF (GPT-Generated Unified Format)
- Converted with: llama.cpp

Quantization Details
This repository contains the Q4_K_M quantization of the original model. For comparison, common GGUF quantization levels include:
- F16: Full 16-bit floating point - highest quality, largest size
- Q8_0: 8-bit quantization - high quality, good balance of size and quality
- Q4_K_M: 4-bit k-quant (medium variant) - smaller size, faster inference, modest quality loss

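For reference, quantizations like this one are typically produced with llama.cpp's conversion script and `llama-quantize` tool. The following is a minimal sketch, assuming a local llama.cpp checkout and a local copy of the original BAAI/bge-reranker-v2-m3 checkpoint (paths and filenames are illustrative):

```bash
# Convert the original Hugging Face checkpoint to an F16 GGUF,
# then quantize it down to Q4_K_M
python convert_hf_to_gguf.py ./bge-reranker-v2-m3 \
  --outtype f16 --outfile bge-reranker-v2-m3-f16.gguf
./llama-quantize bge-reranker-v2-m3-f16.gguf bge-reranker-v2-m3-Q4_K_M.gguf Q4_K_M
```

The GGUF-my-repo space used for this conversion automates essentially this pipeline.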
Usage
This reranker can be used with llama.cpp and other GGUF-compatible inference engines that support reranking.
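To fetch the GGUF file from this repository, one option is the `huggingface-cli` tool from the `huggingface_hub` package. A minimal sketch follows; the exact filename is assumed and may differ from the repository listing:

```bash
# Download the Q4_K_M GGUF into the current directory
pip install -U huggingface_hub
huggingface-cli download sinjab/bge-reranker-v2-m3-Q4_K_M-GGUF \
  bge-reranker-v2-m3-Q4_K_M.gguf --local-dir .
```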
Reranking with llama.cpp is exposed through `llama-server`:

```bash
# Example using llama.cpp: start llama-server with the reranking endpoint enabled
./llama-server -m bge-reranker-v2-m3-Q4_K_M.gguf --reranking
```
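Once the server is running, candidate documents can be scored against a query over HTTP. A minimal sketch, assuming the default listen address `http://localhost:8080` and llama.cpp's `/v1/rerank` endpoint with Jina-style request fields (check the llama.cpp server documentation for your version):

```bash
# Score candidate passages against a query; a higher relevance_score means more relevant
curl http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
        "model": "bge-reranker-v2-m3",
        "query": "What is a panda?",
        "documents": [
          "The giant panda is a bear species endemic to China.",
          "Paris is the capital of France."
        ]
      }'
```

The response is expected to contain a `results` array with an `index` and a `relevance_score` for each document; sorting by score in descending order gives the reranked list.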
Model Files
This repository provides the Q4_K_M file; other common quantization levels are listed for comparison.

| Quantization | Trade-offs |
|---|---|
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size and performance |
| Q4_K_M | Good quality, smallest size, fastest inference |
Citation
If you use this model, please cite the original model:
See the original model card for citation information.
License
This model inherits the license from the original model. Please refer to the original model card for license details.
Acknowledgements
- Original model by the authors of BAAI/bge-reranker-v2-m3
- GGUF conversion via llama.cpp by ggml.ai
- Converted and uploaded by sinjab