Is there any caching mechanism?
When I load the model into GPU memory, the total consumed GPU memory is 1318 MiB.
Between inferences I only change the input image; the input text stays the same.
After the 1st inference: 2766 MiB
After the 2nd inference: 3424 MiB
After the 3rd inference: 3424 MiB
After the 4th inference: 3424 MiB
After the 5th inference: 3424 MiB
Why do the 1st and 2nd inferences increase GPU memory so much? I wonder whether there is some caching mechanism inside the code.
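If this is PyTorch's caching allocator holding on to freed blocks, I assume the gap should show up when comparing torch.cuda.memory_allocated() with torch.cuda.memory_reserved(). Below is a minimal sketch of how I could check; the linear layer and random inputs are stand-ins, not my actual model or image/text inputs:

```python
import torch
import torch.nn as nn

def report(tag):
    # memory_allocated: bytes occupied by live tensors
    # memory_reserved: bytes held by PyTorch's caching allocator (includes freed-but-cached blocks)
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"[{tag}] allocated: {alloc:.0f} MiB, reserved: {reserved:.0f} MiB")

# Stand-in model; the real model and preprocessing are not shown here.
model = nn.Linear(4096, 4096).cuda()
report("after load")

with torch.no_grad():
    for i in range(1, 6):
        x = torch.randn(256, 4096, device="cuda")  # stand-in for the changing image input
        y = model(x)
        report(f"after inference {i}")
```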
When I call torch.cuda.empty_cache() to release GPU memory after each inference finishes, the numbers look like this (a sketch of the loop is below the numbers):
After the 1st inference: 1516 MiB
After the 2nd inference: 1552 MiB
After the 3rd inference: 1530 MiB
After the 4th inference: 1516 MiB
After the 5th inference: 1542 MiB
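For reference, this is roughly the shape of my inference loop with the empty_cache() call; the model and inputs here are stand-ins, not my actual code:

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()   # stand-in for the real model

with torch.no_grad():
    for i in range(1, 6):
        x = torch.randn(256, 4096, device="cuda")  # only the "image" input changes each iteration
        y = model(x)
        del x, y                   # drop references so the tensors can actually be freed
        torch.cuda.empty_cache()   # return cached blocks to the CUDA driver
        print(f"after inference {i}: "
              f"{torch.cuda.memory_reserved() / 1024**2:.0f} MiB reserved")
```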