Inference Speed Benchmark and GPU memory usage

by Yunxz - opened Nov 28, 2024


We tested the GPU memory usage and inference speed of the QwQ-32B-Preview model with both the Transformers and vLLM backends, using EvalScope's speed benchmark tool. See the documentation for details.
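To illustrate what a speed benchmark of this kind measures, here is a minimal, backend-agnostic sketch of the throughput metric (generated tokens per second of wall-clock time). It is not EvalScope's implementation. `GenerateFn`, `measure_speed`, and `SpeedResult` are hypothetical names, and `generate` stands in for any wrapper you might write around a Transformers or vLLM generation call that returns the newly generated token ids.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical signature: any backend wrapper (e.g. around a Transformers or
# vLLM call) that takes a prompt and a token budget and returns the ids of the
# newly generated tokens.
GenerateFn = Callable[[str, int], Sequence[int]]


@dataclass
class SpeedResult:
    output_tokens: int  # total generated tokens across timed runs
    elapsed_s: float    # total wall-clock seconds across timed runs

    @property
    def tokens_per_s(self) -> float:
        # Throughput = generated tokens / wall-clock seconds.
        return self.output_tokens / self.elapsed_s if self.elapsed_s > 0 else float("inf")


def measure_speed(generate: GenerateFn, prompt: str, max_new_tokens: int,
                  repeats: int = 3, warmup: int = 1) -> SpeedResult:
    # Warm-up runs are excluded from timing so one-time costs
    # (e.g. kernel compilation, cache allocation) do not skew the result.
    for _ in range(warmup):
        generate(prompt, max_new_tokens)

    total_tokens = 0
    total_time = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens)
        total_time += time.perf_counter() - start
        total_tokens += len(tokens)
    return SpeedResult(total_tokens, total_time)


if __name__ == "__main__":
    # Stand-in generator for demonstration only: pretends to emit one token per ms.
    def fake_generate(prompt: str, max_new_tokens: int) -> list[int]:
        time.sleep(max_new_tokens / 1000)
        return list(range(max_new_tokens))

    result = measure_speed(fake_generate, "Hello", max_new_tokens=64)
    print(f"{result.output_tokens} tokens, {result.tokens_per_s:.1f} tokens/s")
```

For the memory side with Transformers, one option is to call `torch.cuda.reset_peak_memory_stats()` before generation and read `torch.cuda.max_memory_allocated()` afterwards. Note that vLLM preallocates GPU memory for its KV cache according to `gpu_memory_utilization`, so raw memory figures from the two backends are not directly comparable.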

References:

EvalScope open-source repository
Speed Benchmark tool usage instructions

Yunxz changed the discussion title from "Inference Speed Benchmark" to "Inference Speed Benchmark and GPU memory usage" (Nov 28, 2024).
