Recipe for Diarized/Utterance Language ID Model
Hey! I'm looking for some advice. My goal is to reuse this model (or recipe) to produce language identification output that is either diarized or per-utterance. Is there an easy way to configure it to produce those outputs?
I set up the code and dug into the classification function, `language_id.classify_batch(signal)`. It seems to classify the entire audio file through the NN model rather than looking at chunks.
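For reference, this is roughly the whole-file call I mean; I'm assuming the VoxLingua107 ECAPA language-ID model here, and the `source`/`savedir` values are just placeholders:

```python
from speechbrain.pretrained import EncoderClassifier

# Pretrained language-ID model (placeholder source/savedir).
language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",
    savedir="tmp_lang_id",
)

# load_audio resamples to the model's sample rate; the whole file goes in at once.
signal = language_id.load_audio("example.wav")
out_prob, score, index, text_lab = language_id.classify_batch(signal)
print(text_lab)  # one predicted language for the entire recording
```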
As a programmer without ML training, my instinct is to simply chunk the audio file into utterances and loop over them, passing each chunk into `classify_batch` (a rough sketch of that idea is below). Looking at the other models and code in SpeechBrain, it seems the more ML-friendly way would be to update this recipe, either by reusing parts of SpeechBrain's `diarization.py` class, the chunking from `ECAPA_TDNN.py`, or the VAD recipe in SpeechBrain.
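Here is a minimal sketch of the chunk-and-loop idea, using the pretrained CRDNN VAD to find speech segments and then classifying each segment separately. The model sources, savedir names, and the 16 kHz mono assumption are placeholders, not something I've validated:

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier, VAD

AUDIO = "example_16k_mono.wav"   # placeholder: 16 kHz mono WAV
SAMPLE_RATE = 16000

# Pretrained language-ID and VAD models (placeholder sources/savedirs).
language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa", savedir="tmp_lang_id"
)
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty", savedir="tmp_vad"
)

waveform, sr = torchaudio.load(AUDIO)      # [channels, time]
assert sr == SAMPLE_RATE, "resample to 16 kHz first"
waveform = waveform.mean(dim=0)            # collapse to mono [time]

# get_speech_segments returns a [num_segments, 2] tensor of (start, end) in seconds.
boundaries = vad.get_speech_segments(AUDIO)

for start_s, end_s in boundaries.tolist():
    start, end = int(start_s * SAMPLE_RATE), int(end_s * SAMPLE_RATE)
    chunk = waveform[start:end]
    if chunk.numel() < SAMPLE_RATE // 2:
        continue  # skip very short blips; they're probably unreliable anyway
    out_prob, score, index, text_lab = language_id.classify_batch(chunk)
    print(f"{start_s:7.2f}s - {end_s:7.2f}s  ->  {text_lab[0]}")
```

Fixed-length windows instead of VAD boundaries would avoid loading a second model, but the segment boundaries would be coarser; I'm not sure which direction fits the recipe better, which is part of what I'm asking.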
Am I on the right track, or is this not how these things work?
Much appreciated!