How to utilize the full GPU memory for inference
#6
opened by code-me-running
I want to run inference on a long paragraph of text. I'm splitting it into sentences and running inference on each sentence individually. The model's memory utilization is only about 7 GB. I want to utilize the full GPU memory and thereby increase throughput and reduce the overall generation time. How can I achieve this?
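
For context, here is roughly what my current sentence-by-sentence loop looks like. This is a minimal sketch: the model name, the sentence splitting, and the generation parameters are placeholders, not my actual setup. Since each `generate` call processes a batch of one sentence, the GPU memory beyond the model weights stays mostly unused.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; my actual model uses ~7 GB of GPU memory.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to("cuda")
model.eval()

paragraph = "..."  # the long input text
sentences = paragraph.split(". ")  # naive split, just for illustration

outputs = []
with torch.no_grad():
    for sentence in sentences:
        # One sentence per forward pass -> batch size 1, low GPU utilization.
        inputs = tokenizer(sentence, return_tensors="pt").to("cuda")
        generated = model.generate(**inputs, max_new_tokens=50)
        outputs.append(tokenizer.decode(generated[0], skip_special_tokens=True))
```

I assume the answer involves passing several sentences to the model at once (batching), but I'm not sure of the right way to do that here.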