Decoding of 'mp3' failed
Hey, I'm trying to run the French-to-French example posted in the model card (see below) on a Google Colab free-tier GPU.
...
# load dummy dataset and read soundfiles
ds = load_dataset("common_voice", "fr", split="test", streaming=True)
ds = ds.cast_column("audio", datasets.Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]["array"]
It fails at the last step, giving me the error below:
RuntimeError: Decoding of 'mp3' failed, probably because of streaming mode (librosa cannot decode 'mp3' file-like objects, only path-like)
Has anybody seen this? Am I missing a dependency or something?
Thank you
Hey! I'm not really sure why, but I think it is related to librosa. I just tried on a local computer and it works properly, but I had this bug on Colab.
I found the cause: some of the mp3 files are corrupted. You should validate the mp3 files and separate out the corrupted ones.
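For anyone who wants to try that, here is a rough sketch of how the validation could look when the mp3 files are available locally as paths (not in streaming mode). The helper name split_valid_and_corrupted is made up for illustration, and the default decoder assumes a soundfile install whose libsndfile build can read mp3; you can pass any other decode function instead.

```python
from typing import Callable, Iterable, List, Tuple


def _decode_with_soundfile(path: str) -> None:
    # Imported lazily so the helper has no hard dependency on soundfile.
    # Assumption: the installed libsndfile build supports mp3 decoding.
    import soundfile as sf

    sf.read(path)


def split_valid_and_corrupted(
    paths: Iterable[str],
    decode: Callable[[str], object] = _decode_with_soundfile,
) -> Tuple[List[str], List[str]]:
    """Try to decode every file and sort the paths into two lists."""
    valid: List[str] = []
    corrupted: List[str] = []
    for path in paths:
        try:
            decode(path)
        except Exception:
            # Any decoding error is treated as "corrupted" in this sketch.
            corrupted.append(path)
        else:
            valid.append(path)
    return valid, corrupted
```

Calling it on a list of local mp3 paths gives back the files that decoded and the ones that did not, so the second list can be moved aside before building the dataset.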
Datasets got a big update to only use soundfile + librosa now, regardless of the file type: https://github.com/huggingface/datasets/pull/5573
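Since the behaviour depends on which versions are installed, it may help to print them in the Colab runtime and compare with the local machine where it works. This is only a small diagnostic sketch; the function name installed_versions is made up here, and it does not tell you which version contains the PR above.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("datasets", "soundfile", "librosa"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


print(installed_versions())
```

A None entry means the package is not installed at all, which would answer the "missing dependency" question directly.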
There should be a better error message from soundfile.read