If I deploy the m3 and m3 reranker models on the same GPU service, will it improve GPU efficiency under concurrent requests?

#65
by seetimee - opened

What is the performance difference between deploying one model per machine and deploying two models per machine?
