TrOCR deployment in production
#6 · opened by CristianJD
Hi, does anyone know how to make TrOCR inference as fast as possible? I deploy it with Docker on OpenShift, but it's too slow. I'm already using the ONNX format, but I can't quantize the model because quantization is not implemented yet for Vision-Encoder-Decoder.
Hi, I also tried the ONNX format but didn't have much luck making inference faster. On an Nvidia A10 I can get inference down to ~120ms, but that's still too slow for my use case. Did you happen to find anything? Quantization does not seem to work for me either.