How to do batch inference?

#34
by abhijeet-ta

Instead of running a single inference at a time, I want to pass a batch of prompts and generate text for all of them. How can I achieve that?

You can use Aphrodite-engine with a script like this one:
https://huggingface.co/datasets/adamo1139/misc/blob/main/localLLM-datasetCreation/batched2.py
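
As a rough illustration of the same idea (not the linked script itself), here is a minimal sketch that fires a batch of prompts concurrently at an OpenAI-compatible completions endpoint, which Aphrodite-engine can expose. The base URL, model name, and prompts below are placeholder assumptions — adjust them to your own server:

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Placeholder endpoint and key: point this at your Aphrodite-engine server.
client = OpenAI(base_url="http://localhost:2242/v1", api_key="sk-empty")

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Explain what batch inference is.",
    "Write a haiku about GPUs.",
]

def complete(prompt: str) -> str:
    # One request per prompt; the server batches concurrent requests internally.
    response = client.completions.create(
        model="your-model-name",  # placeholder model identifier
        prompt=prompt,
        max_tokens=128,
    )
    return response.choices[0].text

# Send the requests concurrently so the engine can batch them together.
with ThreadPoolExecutor(max_workers=8) as pool:
    for prompt, completion in zip(prompts, pool.map(complete, prompts)):
        print(prompt, "->", completion.strip())
```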

Other inference engines support batched inference as well — vLLM does, and possibly even Ollama.
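
For instance, vLLM's offline API accepts a whole list of prompts and batches them internally. A minimal sketch (the model name is just an example — substitute your own):

```python
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "Batch inference means",
    "Write a haiku about GPUs:",
]

# Decoding settings applied to every prompt in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Example model; replace with the model you actually want to run.
llm = LLM(model="facebook/opt-125m")

# generate() takes the whole list and schedules it as one batched run.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text.strip())
```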
