RuntimeError: Error(s) in loading state_dict for HuggingFaceWhisper:
size mismatch for _mel_filters: copying a param with shape torch.Size([80, 201]) from checkpoint, the shape in current model is torch.Size([201, 80]).
I just used the code as provided and got this error. When I transpose the mel_filters, the model loads, but transcription then fails with a matrix multiplication error:
File /opt/conda/lib/python3.10/site-packages/speechbrain/lobes/models/huggingface_whisper.py:247, in HuggingFaceWhisper._log_mel_spectrogram(self, audio)
244 magnitudes = stft[..., :-1].abs() ** 2
246 filters = self._mel_filters
--> 247 mel_spec = filters @ magnitudes
249 log_spec = torch.clamp(mel_spec, min=1e-10).log10()
250 log_spec = torch.maximum(
251 log_spec,
252 (log_spec.flatten(start_dim=1).max(dim=-1)[0] - 8.0)[:, None, None],
253 )
RuntimeError: mat1 and mat2 shapes cannot be multiplied (3000x201 and 80x201)
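For context, the multiplication in `_log_mel_spectrogram` only works when the filter bank has shape (n_mels, n_freqs); a minimal sketch of the shape logic (the tensor contents here are random placeholders, not real Whisper filters):

```python
import torch

# Whisper's filter bank maps 201 FFT bins to 80 mel bands.
n_mels, n_freqs, n_frames = 80, 201, 3000

filters = torch.randn(n_mels, n_freqs)       # orientation `filters @ magnitudes` expects
magnitudes = torch.randn(n_freqs, n_frames)  # STFT power spectrum

# (80, 201) @ (201, 3000) -> (80, 3000): this is the intended computation.
mel_spec = filters @ magnitudes

# If the filter bank is stored transposed, the matmul fails just like the traceback:
try:
    _ = filters.t() @ magnitudes  # (201, 80) @ (201, 3000) is invalid
except RuntimeError as e:
    print(e)
```

So the checkpoint's (80, 201) orientation is the one the transcription code actually needs; it is the model's buffer that is registered transposed.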
Hello,
which version of PyTorch are you using?
Hello, torch.__version__ is '2.0.0+cu117'.
It works when I reshape feature_extractor.filter in speechbrain/lobes/models/huggingface_whisper.py, but I don't know if this is a good way to fix it.
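Since the matmul in `_log_mel_spectrogram` needs (80, 201) filters, one alternative to patching the library file would be to replace the model's buffer with one in the checkpoint's orientation before loading, rather than transposing the checkpoint. A sketch with a toy module standing in for HuggingFaceWhisper (the class and checkpoint here are illustrative, not SpeechBrain's real ones):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real HuggingFaceWhisper class.
class ToyWhisper(nn.Module):
    def __init__(self, filters_shape):
        super().__init__()
        self.register_buffer("_mel_filters", torch.zeros(filters_shape))

# The model registers the buffer transposed relative to the checkpoint.
model = ToyWhisper((201, 80))
checkpoint = {"_mel_filters": torch.randn(80, 201)}

# Re-register the buffer in the checkpoint's (n_mels, n_freqs) orientation,
# so load_state_dict succeeds AND `filters @ magnitudes` still works.
model._mel_filters = torch.zeros(80, 201)
model.load_state_dict(checkpoint)

# Transcription-time matmul now has compatible shapes:
magnitudes = torch.randn(201, 3000)
mel_spec = model._mel_filters @ magnitudes
print(mel_spec.shape)  # torch.Size([80, 3000])
```

Transposing the checkpoint instead (as in the original workaround) makes loading succeed but leaves the buffer in the wrong orientation for the matmul, which matches the second error reported above.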