Performant Inference Frameworks
I can't think of a better model to ask this about than one developed by NVIDIA!
Are there any more performant inference frameworks this model is compatible with (and if so, can they be added to the model card)? Specifically, is this compatible with any plug-and-play frameworks like HF's https://github.com/huggingface/text-embeddings-inference, or can it be compiled via TensorRT-LLM?
I think you can use vLLM: https://docs.vllm.ai/en/stable/getting_started/examples/offline_inference_embedding.html
Never mind; it uses custom code, so this won't work. I thought it was a generic Mistral model.
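For reference, the generic vLLM embedding path from the linked example looks roughly like the sketch below. It uses `intfloat/e5-mistral-7b-instruct` (a standard Mistral-architecture embedder) as a stand-in, since NV-Embed-v1's custom modeling code isn't handled by this generic path:

```python
# Sketch of vLLM's offline embedding flow, following the linked example.
# NV-Embed-v1 uses custom modeling code, so this generic path does not
# apply to it; a plain Mistral-architecture embedder is used instead.
from vllm import LLM

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Load a generic Mistral-architecture embedding model.
model = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

# encode() returns one EmbeddingRequestOutput per prompt.
outputs = model.encode(prompts)
for output in outputs:
    print(len(output.outputs.embedding))  # embedding dimensionality
```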
You could try their NIM: https://build.nvidia.com/nvidia/nv-embed-v1
Hi @nazrak, thank you for asking. This specific model will not be supported by NIM due to its non-commercial license. Instead, NIM supports NVIDIA's commercially available embedding models at the following link: https://build.nvidia.com/explore/retrieval
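For anyone landing here later: the hosted retrieval NIMs expose an OpenAI-compatible embeddings endpoint. A minimal sketch follows; the model name (`nvidia/nv-embedqa-e5-v5`) and the `input_type`/`truncate` fields are assumptions based on the retrieval catalog, so check the specific endpoint's docs for the exact values:

```python
# Sketch of calling a hosted NVIDIA retrieval NIM through its
# OpenAI-compatible embeddings endpoint. The model name and the
# "input_type"/"truncate" fields are assumptions taken from the
# build.nvidia.com retrieval catalog and may differ per model.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="NVIDIA_API_KEY",  # obtained from build.nvidia.com
)

response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",  # assumed catalog model name
    input=["What is the capital of France?"],
    extra_body={"input_type": "query", "truncate": "NONE"},
)
print(response.data[0].embedding[:8])  # first few dimensions
```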