Performant Inference Frameworks
I can't think of a better model to ask this about than one developed by NVIDIA!
Are there any more performant inference frameworks this model is compatible with (and if so, can they be added to the model card)? Specifically, is this compatible with any plug-and-play frameworks like HF's https://github.com/huggingface/text-embeddings-inference, or can it be compiled via TensorRT-LLM?
I think you can use vLLM: https://docs.vllm.ai/en/stable/getting_started/examples/offline_inference_embedding.html
Never mind; it uses custom code, so this won't work. I thought it was a generic Mistral model.
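For reference, the generic vLLM embedding path from the linked example looks roughly like the sketch below. It uses `intfloat/e5-mistral-7b-instruct` (a standard Mistral-architecture embedder) as a stand-in, since NV-Embed-v1's custom modeling code isn't handled by this generic path:

```python
# Sketch of vLLM's offline embedding flow, following the linked example.
# NV-Embed-v1 uses custom modeling code, so this generic path does not
# apply to it; a plain Mistral-architecture embedder is used instead.
from vllm import LLM

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Load a generic Mistral-architecture embedding model.
model = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

# encode() returns one EmbeddingRequestOutput per prompt.
outputs = model.encode(prompts)
for output in outputs:
    print(len(output.outputs.embedding))  # embedding dimensionality
```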
You could try their NIM: https://build.nvidia.com/nvidia/nv-embed-v1
Hi @nazrak, thank you for asking. This specific model will not be supported by NIM due to its non-commercial license. Instead, NIM supports NVIDIA's commercially available embedding models at the following link: https://build.nvidia.com/explore/retrieval
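For anyone landing here later: the hosted retrieval NIMs expose an OpenAI-compatible embeddings endpoint. A minimal sketch follows; the model name (`nvidia/nv-embedqa-e5-v5`) and the `input_type`/`truncate` fields are assumptions based on the retrieval catalog, so check the specific endpoint's docs for the exact values:

```python
# Sketch of calling a hosted NVIDIA retrieval NIM through its
# OpenAI-compatible embeddings endpoint. The model name and the
# "input_type"/"truncate" fields are assumptions taken from the
# build.nvidia.com retrieval catalog and may differ per model.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="NVIDIA_API_KEY",  # obtained from build.nvidia.com
)

response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",  # assumed catalog model name
    input=["What is the capital of France?"],
    extra_body={"input_type": "query", "truncate": "NONE"},
)
print(response.data[0].embedding[:8])  # first few dimensions
```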