bge-reranker-v2-m3-Q4_K_M-GGUF
This model was converted to GGUF format from BAAI/bge-reranker-v2-m3 using llama.cpp via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
Model Information
- Base Model: BAAI/bge-reranker-v2-m3
- Quantization: Q4_K_M
- Format: GGUF (GPT-Generated Unified Format)
- Converted with: llama.cpp

Quantization Details
This repository contains the Q4_K_M quantization of the original model. For comparison, common GGUF quantization levels include:
- F16: Full 16-bit floating point - highest quality, largest size
- Q8_0: 8-bit quantization - high quality, good balance of size and quality
- Q4_K_M: 4-bit k-quant (medium variant) - smaller size, faster inference, modest quality loss

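For reference, quantizations like this one are typically produced with llama.cpp's conversion script and `llama-quantize` tool. The following is a minimal sketch, assuming a local llama.cpp checkout and a local copy of the original BAAI/bge-reranker-v2-m3 checkpoint (paths and filenames are illustrative):

```bash
# Convert the original Hugging Face checkpoint to an F16 GGUF,
# then quantize it down to Q4_K_M
python convert_hf_to_gguf.py ./bge-reranker-v2-m3 \
  --outtype f16 --outfile bge-reranker-v2-m3-f16.gguf
./llama-quantize bge-reranker-v2-m3-f16.gguf bge-reranker-v2-m3-Q4_K_M.gguf Q4_K_M
```

The GGUF-my-repo space used for this conversion automates essentially this pipeline.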
Usage
This reranker can be used with llama.cpp and other GGUF-compatible inference engines that support reranking.
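To fetch the GGUF file from this repository, one option is the `huggingface-cli` tool from the `huggingface_hub` package. A minimal sketch follows; the exact filename is assumed and may differ from the repository listing:

```bash
# Download the Q4_K_M GGUF into the current directory
pip install -U huggingface_hub
huggingface-cli download sinjab/bge-reranker-v2-m3-Q4_K_M-GGUF \
  bge-reranker-v2-m3-Q4_K_M.gguf --local-dir .
```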
Reranking with llama.cpp is exposed through `llama-server`:

```bash
# Example using llama.cpp: start llama-server with the reranking endpoint enabled
./llama-server -m bge-reranker-v2-m3-Q4_K_M.gguf --reranking
```
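Once the server is running, candidate documents can be scored against a query over HTTP. A minimal sketch, assuming the default listen address `http://localhost:8080` and llama.cpp's `/v1/rerank` endpoint with Jina-style request fields (check the llama.cpp server documentation for your version):

```bash
# Score candidate passages against a query; a higher relevance_score means more relevant
curl http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
        "model": "bge-reranker-v2-m3",
        "query": "What is a panda?",
        "documents": [
          "The giant panda is a bear species endemic to China.",
          "Paris is the capital of France."
        ]
      }'
```

The response is expected to contain a `results` array with an `index` and a `relevance_score` for each document; sorting by score in descending order gives the reranked list.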
Model Files
This repository provides the Q4_K_M file; other common quantization levels are listed for comparison.

| Quantization | Trade-offs |
|---|---|
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size and performance |
| Q4_K_M | Good quality, smallest size, fastest inference |
Citation
If you use this model, please cite the original model:
See the original model card for citation information.
License
This model inherits the license from the original model. Please refer to the original model card for license details.
Acknowledgements
- Original model by the authors of BAAI/bge-reranker-v2-m3
- GGUF conversion via llama.cpp by ggml.ai
- Converted and uploaded by sinjab