How to do batch inference?

#34
by abhijeet-ta

Instead of running a single inference at a time, I want to pass a batch of prompts and generate text for all of them. How can I achieve that?

You can use Aphrodite-engine with a script like this one:
https://huggingface.co/datasets/adamo1139/misc/blob/main/localLLM-datasetCreation/batched2.py
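
As a rough illustration of the same idea (not the linked script itself), here is a minimal sketch that fires a batch of prompts concurrently at an OpenAI-compatible completions endpoint, which Aphrodite-engine can expose. The base URL, model name, and prompts below are placeholder assumptions — adjust them to your own server:

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Placeholder endpoint and key: point this at your Aphrodite-engine server.
client = OpenAI(base_url="http://localhost:2242/v1", api_key="sk-empty")

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Explain what batch inference is.",
    "Write a haiku about GPUs.",
]

def complete(prompt: str) -> str:
    # One request per prompt; the server batches concurrent requests internally.
    response = client.completions.create(
        model="your-model-name",  # placeholder model identifier
        prompt=prompt,
        max_tokens=128,
    )
    return response.choices[0].text

# Send the requests concurrently so the engine can batch them together.
with ThreadPoolExecutor(max_workers=8) as pool:
    for prompt, completion in zip(prompts, pool.map(complete, prompts)):
        print(prompt, "->", completion.strip())
```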

Other inference engines support batched inference as well — vLLM does, and possibly even Ollama.
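
For instance, vLLM's offline API accepts a whole list of prompts and batches them internally. A minimal sketch (the model name is just an example — substitute your own):

```python
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "Batch inference means",
    "Write a haiku about GPUs:",
]

# Decoding settings applied to every prompt in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Example model; replace with the model you actually want to run.
llm = LLM(model="facebook/opt-125m")

# generate() takes the whole list and schedules it as one batched run.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text.strip())
```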
