How to deploy it to get the fastest qps?
#10
by loovelj2
I want to deploy this model now and would like to ask: what is the fastest way to serve inference, Triton, TGI, or ONNX? Ideally with Docker.
Sorry, we have not run specific efficiency comparisons across these tools. You may refer to materials from the open-source community, e.g., https://github.com/huggingface/text-embeddings-inference
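For reference, here is a minimal sketch of serving an embedding model with text-embeddings-inference (TEI) via Docker and querying it over HTTP. The image tag, port mapping, and model id below are assumptions, not tested recommendations; substitute the id of this model and consult the TEI README for the image matching your hardware.

```python
# Sketch: query a TEI server started with something like (assumed invocation,
# see the TEI README for the exact image tag and CPU/GPU variants):
#   docker run --gpus all -p 8080:80 -v $PWD/data:/data \
#       ghcr.io/huggingface/text-embeddings-inference:latest \
#       --model-id <this-model-id>
import requests

# TEI exposes a POST /embed endpoint that accepts a single string
# or a batch of strings under the "inputs" key.
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": ["What is deep learning?", "How do I deploy a model?"]},
    timeout=30,
)
resp.raise_for_status()

# The response is a list of float vectors, one embedding per input.
embeddings = resp.json()
print(len(embeddings), len(embeddings[0]))
```

For throughput, TEI batches concurrent requests on the server side, so sending many requests in parallel (rather than one large sequential loop) is typically what drives QPS up; actual numbers depend on your hardware and sequence lengths, so benchmark on your own workload.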