Whisper Small DV Model

Model Description

The whisper-small-dv model is an Automatic Speech Recognition (ASR) model trained on the Mozilla Common Voice 13.0 dataset. It transcribes spoken language into written text with high accuracy, making it suitable for a wide range of applications, from transcription services to voice assistants.

Training

The model was trained using the PyTorch framework and the Transformers library. Training metrics and visualizations can be viewed on TensorBoard.

Performance

The model's performance was evaluated on a held-out test set. The evaluation metrics and results can be found in the "Eval Results" section.
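The standard evaluation metric for ASR models is the word error rate (WER): the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. The exact figures for this model are not reproduced here, but the metric itself can be sketched in plain Python (a minimal illustration, not the evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words
print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```

In practice, libraries such as `jiwer` or the `evaluate` package are typically used for this computation.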

Usage

The model can be used for any ASR task. Whisper is a sequence-to-sequence model, so it is loaded with the Whisper classes from the Transformers library (not the CTC-based Wav2Vec2 classes), and the processor expects a 16 kHz waveform array rather than a file path:

import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the model and processor
model = WhisperForConditionalGeneration.from_pretrained("Ryukijano/whisper-small-dv")
processor = WhisperProcessor.from_pretrained("Ryukijano/whisper-small-dv")

# Load the audio as a 16 kHz waveform (Whisper's expected sampling rate)
audio_array, _ = librosa.load("path_to_audio_file", sr=16000)

# Use the model for ASR: extract features, generate token IDs, decode to text
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
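Whisper expects 16 kHz mono audio. If your recordings use a different sampling rate, resample them first; as a self-contained illustration of what resampling does, here is a minimal linear-interpolation sketch using NumPy (real pipelines should use librosa or torchaudio, which apply proper anti-aliasing filters):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a 1-D waveform to target_sr via linear interpolation."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    # Time stamps of the original and target sample grids
    t_orig = np.arange(len(audio)) / orig_sr
    t_target = np.arange(n_target) / target_sr
    return np.interp(t_target, t_orig, audio)

# Example: downsample one second of a 440 Hz tone from 44.1 kHz to 16 kHz
sr = 44100
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)
resampled = resample_linear(wave, sr, 16000)
print(len(resampled))  # 16000
```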

License

This model is released under the MIT license.

