How to utilize the full GPU memory for inference
#6
opened by code-me-running
I want to run inference on a long paragraph of text. I'm splitting it into sentences and running inference on each sentence individually. The model's memory utilization is only about 7 GB. I want to utilize the full GPU memory and thereby increase throughput and reduce the overall generation time. How can I achieve this?
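
For context, here is roughly what my current sentence-by-sentence loop looks like. This is a minimal sketch: the model name, the sentence splitting, and the generation parameters are placeholders, not my actual setup. Since each `generate` call processes a batch of one sentence, the GPU memory beyond the model weights stays mostly unused.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; my actual model uses ~7 GB of GPU memory.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to("cuda")
model.eval()

paragraph = "..."  # the long input text
sentences = paragraph.split(". ")  # naive split, just for illustration

outputs = []
with torch.no_grad():
    for sentence in sentences:
        # One sentence per forward pass -> batch size 1, low GPU utilization.
        inputs = tokenizer(sentence, return_tensors="pt").to("cuda")
        generated = model.generate(**inputs, max_new_tokens=50)
        outputs.append(tokenizer.decode(generated[0], skip_special_tokens=True))
```

I assume the answer involves passing several sentences to the model at once (batching), but I'm not sure of the right way to do that here.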