Does anyone know how to tag the speaker with Whisper?
I tried the model on an interview recording, and it worked pretty well. The problem is that the output is one big chunk of text, and I have no idea how to tag the different speakers. I assumed Whisper could distinguish different voices. Is there an easy way to do that?
Hello. As far as I can see you need this?
You can also have a look at WhisperX.
But no, "speaker diarization" (distinguishing speakers) is NOT a feature of the Whisper model, as it was not trained for this task.
BTW, I managed to tag the speakers in my primary research interview recording using the code here:
Speaker diarization is not possible with this model (or any Whisper model). What you are using is pyannote, which is a different thing. Also, you need to agree to their terms (or complete a form) before you can use it.
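For anyone wondering how the two pieces fit together: since Whisper only gives you timestamped text and the diarization tool only gives you "who spoke when", the usual trick is to match them by time overlap. Below is a rough sketch of that matching step only, not anyone's official method. It assumes the transcript segments are dicts with "start", "end" and "text" keys (the shape of the "segments" list in Whisper's transcribe result) and that the diarization output has already been flattened into (start, end, speaker) tuples. The names tag_speakers and overlap, and the sample data, are made up for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker whose turns overlap it most.

    segments: list of dicts with "start", "end", "text" (seconds).
    turns:    list of (start, end, speaker) tuples from a diarization tool.
    """
    tagged = []
    for seg in segments:
        totals = {}  # speaker -> total overlapping seconds with this segment
        for t_start, t_end, speaker in turns:
            dur = overlap(seg["start"], seg["end"], t_start, t_end)
            if dur > 0:
                totals[speaker] = totals.get(speaker, 0.0) + dur
        # Fall back to a placeholder label when no turn overlaps the segment.
        best = max(totals, key=totals.get) if totals else unknown
        tagged.append({**seg, "speaker": best})
    return tagged


# Hypothetical sample data standing in for real Whisper / diarization output.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Thanks for joining me today."},
    {"start": 4.2, "end": 9.0, "text": "Happy to be here."},
    {"start": 9.1, "end": 12.0, "text": "Let's start with your background."},
]
turns = [
    (0.0, 4.1, "SPEAKER_00"),
    (4.1, 9.0, "SPEAKER_01"),
    (9.0, 12.5, "SPEAKER_00"),
]

for seg in tag_speakers(segments, turns):
    print(f'[{seg["speaker"]}] {seg["text"]}')
```

Picking the speaker with the most overlap is a simple heuristic. It will mislabel segments where two people talk over each other or where Whisper merges two speakers into one segment, so for interviews with a lot of crosstalk you may want word-level timestamps instead.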