Why is the embedding far faster on the CPU than on the GPU?
#10 by ViGeng
Hello guys!
I am playing around with ViT inference speed. I have measured the time spent in the embedding and the encoder separately on CPU and GPU. The results, to my surprise, are:
| batch_size | embedding device | encoder device | img_processor time (ms) | embedding time (ms) | encoder time (ms) | total time (ms) |
|---|---|---|---|---|---|---|
| 1 | CPU | CPU | 4 | 1 | 71 | 73 |
| 1 | CPU | GPU | 3 | 1 | 164 | 166 |
| 1 | GPU | GPU | 4 | 330 | 5 | 349 |
| 16 | GPU | GPU | 47 | 319 | 7 | 326 |
| 16 | CPU | CPU | 54 | 8 | 961 | 970 |
GPU model = RTX 3090 Ti
CPU model = Intel i9-12900KF
Pretrained model weights = google/vit-base-patch16-224-in21k
I can understand that the GPU is faster than the CPU for the encoder. But:
- Why is the CPU faster than the GPU for the embedding, given that both the embedding and the encoder are neural-network layers built on matrix multiplications? (The sketch after this list shows how small the embedding stage is compared with the encoder.)
- When I use the CPU for the embedding and the GPU for the encoder, I save time on the embedding but lose some time on the encoder, which I also cannot explain.
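For context, here is a small sketch (using the submodule names of ViTModel in transformers) that compares the parameter count of the embedding stage, i.e. the patch projection plus position embeddings, with that of the 12-layer encoder:

```python
from transformers import ViTModel

model = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k')

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# The embedding stage is a single Conv2d patch projection plus learned
# position embeddings; the encoder holds the 12 transformer layers.
print(f"embeddings params: {n_params(model.embeddings):,}")
print(f"encoder params:    {n_params(model.encoder):,}")
```

The embedding stage is only a small fraction of the encoder's size, so its compute cost is tiny on either device.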
I have attached my test class below; you may need to add some .to(device) calls to both the class and ViTModel to specify where each part runs:
```python
import time

from PIL import Image
from transformers import ViTImageProcessor, ViTModel


class ObjectDetector:
    def __init__(self, cuda_device='cuda:0'):
        self.device = cuda_device
        self.img_processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
        # NOTE: add .to(self.device) here (and move the inputs below) to run on the GPU
        self.model = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k').eval()
        # self.model.embeddings = self.model.embeddings.to('cpu').eval()

    # load a single image or a list of images
    def extract(self, image):
        before_time = time.time()
        inputs = self.img_processor(images=image, return_tensors="pt")
        after_img_processor = time.time()
        outputs = self.model(**inputs)
        after_model = time.time()
        print(f"Time taken for image processor: {after_img_processor - before_time}")
        print(f"Time taken for model: {after_model - after_img_processor}")
        return outputs


def main():
    detector = ObjectDetector()
    images = [Image.open(f'/home/rowan/source/edge-apps/datasets/batch/{i}.jpg') for i in range(16)]
    outputs = detector.extract(images)
    print(outputs.keys())


if __name__ == "__main__":
    main()
```
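Roughly, the mixed CPU/GPU placement from the table can be done like this (a simplified sketch rather than my exact code; example.jpg is a placeholder path): move only model.embeddings to the CPU, keep model.encoder and model.layernorm on the GPU, call the submodules yourself, and transfer the hidden states in between.

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k').eval()

# Patch/position embeddings on the CPU; encoder and final layernorm on the GPU.
model.embeddings.to('cpu')
model.encoder.to('cuda:0')
model.layernorm.to('cuda:0')

inputs = processor(images=Image.open('example.jpg'), return_tensors='pt')  # placeholder path

with torch.no_grad():
    hidden = model.embeddings(inputs['pixel_values'])    # runs on the CPU
    hidden = hidden.to('cuda:0')                         # single host-to-device copy
    encoded = model.encoder(hidden).last_hidden_state    # runs on the GPU
    encoded = model.layernorm(encoded)

print(encoded.shape)  # (batch, 197, 768) for 224x224 inputs
```

Calling the submodules directly skips the pooler, but it also makes it easy to time the embedding and encoder stages separately.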
Any comments or discussion would be appreciated!
Okay, I finally found the answer:
- The first batch of inference runs slowly, but the following batches behave as expected.
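This looks like the usual GPU warm-up effect: the first CUDA call pays one-off initialization costs, and since GPU kernels launch asynchronously, a plain time.time() around the first forward pass mostly measures that overhead. A fairer measurement (a minimal sketch; example.jpg is again a placeholder path) runs an untimed warm-up pass and synchronizes before reading the clock:

```python
import time

import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

device = 'cuda:0'
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k').eval().to(device)

inputs = processor(images=Image.open('example.jpg'), return_tensors='pt').to(device)  # placeholder path

with torch.no_grad():
    model(**inputs)              # untimed warm-up pass (pays the one-off CUDA setup cost)

    torch.cuda.synchronize()
    start = time.time()
    model(**inputs)
    torch.cuda.synchronize()     # wait for the asynchronous GPU kernels to finish
    print(f"model forward: {(time.time() - start) * 1000:.1f} ms")
```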