Speaker embeddings take a long time to generate

by ugotsoul - opened Jan 22

Discussion

ugotsoul, Jan 22 (edited Jan 22)

Hello,

I'm generating speaker embeddings using the dev set of Mozilla Common Voice, which has about 16k samples for the two locales I'm looking at (EN & CA). It takes about 5-6 hours on CPU to generate the embeddings, compared to about an hour for ECAPA-TDNN. Is this normal for ResNet?

CPU: Intel(R) Xeon(R) Platinum 8253 CPU @ 2.20GHz with 64 cores
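
For a rough sense of scale, the timings above can be converted to average seconds per sample. This is only a back-of-the-envelope sketch: it assumes the ~16k figure is the total number of samples processed in each run, and `seconds_per_sample` is a helper name made up for illustration.

```python
def seconds_per_sample(total_hours: float, num_samples: int) -> float:
    """Average wall-clock seconds spent per sample."""
    return total_hours * 3600.0 / num_samples


# Assumption: ~16k samples in total, as quoted in the post.
NUM_SAMPLES = 16_000

# ResNet run took roughly 5-6 hours; ECAPA-TDNN took roughly 1 hour.
resnet_low = seconds_per_sample(5.0, NUM_SAMPLES)
resnet_high = seconds_per_sample(6.0, NUM_SAMPLES)
ecapa = seconds_per_sample(1.0, NUM_SAMPLES)

print(f"ResNet:     {resnet_low:.3f} to {resnet_high:.3f} s/sample")
print(f"ECAPA-TDNN: {ecapa:.3f} s/sample")
print(f"Slowdown:   {resnet_low / ecapa:.1f}x to {resnet_high / ecapa:.1f}x")
```

Under that assumption, ResNet averages a bit over one second per sample and ECAPA-TDNN a fraction of a second, so the gap is roughly 5-6x.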
