pcuenq/wav2vec2-large-xlsr-53-es

@@ -121,8 +121,8 @@ def replace_additional(batch):
 import librosa
 def speech_file_to_array_fn(batch):
-    speech_array, _ = torchaudio.load(batch["path"])
-    batch["speech"] = librosa.resample(speech_array.squeeze().numpy(), 48_000, 16_000)
+    speech_array, sample_rate = torchaudio.load(batch["path"])
+    batch["speech"] = librosa.resample(speech_array.squeeze().numpy(), sample_rate, 16_000)
     return batch
 # One-pass mapping function
@@ -217,5 +217,7 @@ I had previously used the `transformers` library as an end user, just to try Ber
 * The WER metric crashed on large datasets. I evaluated on a small sample (also, it's faster) and wrote an accumulative version of wer that runs on fixed memory. I'd like to verify whether this change makes sense to be used inside the training loop.
+* `torchaudio` deadlocks when using multiple processes. `librosa` works fine. To be investigated.
 * When using `num_proc` inside a notebook, I could not see progress bars. This is surely some permissions issue in my computer. I still need to find it out.
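
The accumulative WER mentioned in the notes above is not shown in this diff. As a rough illustration only, one way such a metric could work is to keep two running totals (word-level edit errors and reference word count) and update them batch by batch, so memory use does not grow with the dataset. The names `word_edit_distance` and `RunningWER` below are hypothetical and are not taken from the model card or from any library; this is a minimal sketch in plain Python, not the implementation used here.

```python
from typing import Iterable, List


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Word-level Levenshtein distance, keeping only two rows in memory."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


class RunningWER:
    """Hypothetical accumulator: only two integers persist between batches."""

    def __init__(self) -> None:
        self.errors = 0
        self.reference_words = 0

    def update(self, references: Iterable[str], predictions: Iterable[str]) -> None:
        # Add one batch of (reference, prediction) sentence pairs to the totals.
        for reference, prediction in zip(references, predictions):
            ref_words = reference.split()
            self.errors += word_edit_distance(ref_words, prediction.split())
            self.reference_words += len(ref_words)

    def compute(self) -> float:
        # WER over everything seen so far: total errors / total reference words.
        if self.reference_words == 0:
            raise ValueError("No reference words have been accumulated yet.")
        return self.errors / self.reference_words
```

In an evaluation loop, `update` would be called once per batch and `compute` once at the end. Because the result is total errors divided by total reference words, it should match a single-pass WER over the concatenated data, rather than an average of per-batch WER values.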