Ryukijano committed on
Commit
0349e5b
1 Parent(s): 4e6d3fc

Create readme.md


Made a few changes.

Files changed (1)

  1. README.md +50 -0
README.md ADDED
---
license: mit
datasets:
- mozilla-foundation/common_voice_13_0
---
# Whisper Small DV Model

![Model Banner](https://uploads-ssl.webflow.com/614c82ed388d53640613982e/63eb5ebedd3a9a738e22a03f_open%20ai%20whisper.jpg)
13
+
14
+ ## Model Description
15
+
16
+ The `whisper-small-dv` model is an advanced Automatic Speech Recognition (ASR) model, trained on the extensive [Mozilla Common Voice 13.0](https://commonvoice.mozilla.org/en/datasets) dataset. This model is capable of transcribing spoken language into written text with high accuracy, making it a valuable tool for a wide range of applications, from transcription services to voice assistants.
17
+
18
+ ## Training
19
+
20
+ The model was trained using the PyTorch framework and the Transformers library. Training metrics and visualizations can be viewed on TensorBoard.
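A fine-tuning run of this kind is typically configured through the Transformers `Seq2SeqTrainingArguments` API. The sketch below is illustrative only: the hyperparameter values are assumptions, not the actual configuration used for this checkpoint.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative values only -- the real hyperparameters for whisper-small-dv
# are not recorded in this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-dv",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    predict_with_generate=True,   # generate full transcripts during eval
    report_to=["tensorboard"],    # write metrics for TensorBoard
)
```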
21
+
22
+ ## Performance
23
+
24
+ The model's performance was evaluated on a held-out test set. The evaluation metrics and results can be found in the "Eval Results" section.
25
+
26
+ ## Usage
27
+
28
+ The model can be used for any ASR task. To use the model, you can load it using the Transformers library:
29
+
30
+ ```python
31
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
32
+
33
+ # Load the model
34
+ model = Wav2Vec2ForCTC.from_pretrained("Ryukijano/whisper-small-dv")
35
+ processor = Wav2Vec2Processor.from_pretrained("Ryukijano/whisper-small-dv")
36
+
37
+ # Use the model for ASR
38
+ inputs = processor("path_to_audio_file", return_tensors="pt", padding=True)
39
+ logits = model(inputs.input_values).logits
40
+ predicted_ids = torch.argmax(logits, dim=-1)
41
+ transcription = processor.decode(predicted_ids[0])
42
+ ```
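Alternatively, the high-level `pipeline` API wraps model loading, resampling, and decoding in a single call:

```python
from transformers import pipeline

# The automatic-speech-recognition pipeline handles resampling and decoding
asr = pipeline("automatic-speech-recognition", model="Ryukijano/whisper-small-dv")
result = asr("path_to_audio_file")
print(result["text"])
```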
## License

This model is released under the MIT license.