Error from speechbrain.pretrained import ASRCNNTransducer
Hello, when I try to execute "from speechbrain.pretrained import ASRCNNTransducer" I get an error.
Could you help me, please?
Hello @Joan1949,
Can you share the exact error?
Hi, the error is this:
ModuleNotFoundError Traceback (most recent call last)
in <cell line: 9>()
7 from torch.utils.data import DataLoader
8 import speechbrain as sb
----> 9 from speechbrain.pretrained import ASRCNNTransducer
10
11 # Hyperparameter configuration
ModuleNotFoundError: No module named 'speechbrain.pretrained'
Do pip install speechbrain==0.5.26
Hi, now I have this error:
ERROR: Could not find a version that satisfies the requirement speechbrain==0.5.26 (from versions: 0.5.4, 0.5.5, 0.5.6, 0.5.7, 0.5.8, 0.5.9, 0.5.10, 0.5.11, 0.5.12, 0.5.13, 0.5.14, 0.5.15, 0.5.16, 1.0.0)
ERROR: No matching distribution found for speechbrain==0.5.26
This is my code:
import os
import torch
from torch import optim
from speechbrain.pretrained import ASRCNNTransducer
from speechbrain.tokenizers.SentencePiece import SentencePiece
from speechbrain.dataio.batch import PaddedBatch
from torch.utils.data import DataLoader

# Hyperparameter configuration
learning_rate = 1e-4
num_epochs = 10
batch_size = 8
model_checkpoint = "speechbrain/asr-crdnn-commonvoice-14-es"
dataset_folder = "dataset"  # Folder that contains all the files

# Load the pre-trained model
asr_model = ASRCNNTransducer.from_hparams(source=model_checkpoint, savedir="pretrained_model")

# Optimizer
optimizer = optim.Adam(asr_model.parameters(), lr=learning_rate)

# Loss function (adjust it to your needs)
criterion = torch.nn.CTCLoss(blank=asr_model.tokenizer.tokenizer.pad_id, reduction='mean')

# Function to load the text and audio files
def load_data(folder):
    audio_files = []
    text_data = {}
    for filename in os.listdir(folder):
        if filename.endswith(".wav"):
            audio_files.append(os.path.join(folder, filename))
        elif filename.endswith(".txt"):
            with open(os.path.join(folder, filename), "r", encoding="utf-8") as file:
                text = file.read().strip()
            basename = os.path.splitext(filename)[0]
            text_data[basename] = text
    return audio_files, text_data

# Load the text and audio files
audio_files, text_data = load_data(dataset_folder)

# Combine audio and text
dataset = [(audio_file, text_data[os.path.splitext(os.path.basename(audio_file))[0]]) for audio_file in audio_files]

# DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, collate_fn=PaddedBatch)

# Model training
for epoch in range(num_epochs):
    asr_model.train()
    total_loss = 0.0
    for audio_paths, transcriptions in dataloader:
        # Here you should implement the logic to load the audio and text files,
        # and then use them to train the model.
        # This will include reading the audio, converting it to the model's input features, etc.
        # NOTE: inputs, targets, input_lens and target_lens are not defined yet;
        # they have to be produced by the loading step described above.
        optimizer.zero_grad()
        logits = asr_model(inputs)
        logits = logits.transpose(1, 0)  # Transpose logits to match the shape expected by CTCLoss
        loss = criterion(logits, targets, input_lens, target_lens)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss}")

# Save the trained model
torch.save(asr_model.state_dict(), "trained_model.pth")
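A side note on the pairing step in the code above: the list comprehension that builds dataset raises KeyError as soon as one .wav file has no matching .txt file. A minimal sketch of a more forgiving pairing is below. The helper name pair_audio_with_text is made up for illustration, and it assumes each transcript sits next to its recording under the same base name.

```python
from pathlib import Path


def pair_audio_with_text(folder):
    """Return (wav_path, transcript) pairs, skipping recordings without a transcript.

    Hypothetical helper: assumes "clip.wav" is described by "clip.txt" in the same folder.
    """
    folder = Path(folder)
    pairs = []
    for wav_path in sorted(folder.glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if not txt_path.is_file():
            # No transcript for this recording: skip it instead of raising KeyError.
            continue
        text = txt_path.read_text(encoding="utf-8").strip()
        pairs.append((str(wav_path), text))
    return pairs
```

With this, dataset = pair_audio_with_text(dataset_folder) would replace the load_data call and the list comprehension. Whether to skip or to fail loudly on a missing transcript is a design choice; skipping is only an assumption here.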
Sorry, my bad, it is speechbrain==0.5.16
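If pinning the older release is not an option, another way to cope with a module that exists in one release and not in another is to try several module paths in order. The sketch below is only that, a sketch: import_first_available is a made-up helper, and the path "speechbrain.inference" in the usage comment is my understanding of where the 1.0.0 release keeps these interfaces, not something confirmed in this thread. The class name ASRCNNTransducer is taken from the question as-is and is not verified either.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that can be found.

    Hypothetical helper, not part of SpeechBrain.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            # This candidate does not exist in the installed release: try the next one.
            continue
    raise ModuleNotFoundError(
        f"None of these modules could be imported: {list(module_names)}"
    )


# Assumed usage (requires speechbrain to be installed; module paths are assumptions):
# interfaces = import_first_available(["speechbrain.inference", "speechbrain.pretrained"])
```

One caveat: ModuleNotFoundError is also raised when a module exists but one of its own dependencies is missing, so this fallback can hide that kind of problem. Pinning the version, as suggested above, is the simpler fix.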