Inference Speed Benchmark and GPU memory usage

#8
by Yunxz - opened

We tested the GPU memory usage and inference speed of the QwQ-32B-Preview model with both Transformers and vLLM, using EvalScope's speed benchmark tool. See Document
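For readers who want a rough idea of what a speed benchmark measures, here is a minimal sketch in Python. It is not EvalScope's implementation and does not use its API. The helper name `measure_throughput` and the `generate` callable are hypothetical. The sketch assumes `generate` takes a prompt and returns the generated token IDs, so it can wrap either a Transformers or a vLLM call. Peak GPU memory is not covered here. With PyTorch it could be read separately, for example via `torch.cuda.max_memory_allocated()`.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    generate: Callable[[str], Sequence[int]],
    prompt: str,
    runs: int = 3,
) -> float:
    """Return the average number of generated tokens per second.

    `generate` is a hypothetical callable that wraps the backend under test
    and returns the generated token IDs for `prompt`.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    # Guard against a zero elapsed time on very fast stand-in callables.
    if total_seconds <= 0:
        return float("inf")
    return total_tokens / total_seconds


if __name__ == "__main__":
    # Stand-in for a real model: sleeps briefly and returns 10 fake token IDs.
    def fake_generate(prompt: str) -> list[int]:
        time.sleep(0.01)
        return list(range(10))

    print(f"{measure_throughput(fake_generate, 'Hello'):.1f} tokens/s")
```

Averaging over several runs smooths out one-off delays such as warm-up. A real benchmark would also vary prompt length and report memory alongside speed.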

Reference:

Yunxz changed discussion title from Inference Speed Benchmark to Inference Speed Benchmark and GPU memory usage