# Whisper Large v2 Uzbek Speech Recognition Model
This project provides a fine-tuned version of OpenAI's Whisper Large v2 model for Uzbek speech recognition. The model can be used to transcribe Uzbek audio files into text.
## Installation
1. Ensure you have Python 3.7 or higher installed.
2. Install the required libraries:
   `pip install transformers datasets accelerate soundfile librosa torch`
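Optionally, you can confirm that the libraries import correctly and check whether PyTorch can see a GPU. This quick check is only a suggestion, not part of the original setup steps:
```python
# Optional sanity check: confirm the key libraries import and whether a GPU is visible.
import torch
import transformers

print("transformers version:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```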
## Usage
You can use the model with the following Python code:
```python
from transformers import pipeline, WhisperForConditionalGeneration, WhisperProcessor
import torch

# Load the fine-tuned model and its processor (tokenizer + feature extractor)
model_name = "totetecdev/whisper-large-v2-uzbek-100steps"
model = WhisperForConditionalGeneration.from_pretrained(model_name)
processor = WhisperProcessor.from_pretrained(model_name)

# Create the speech recognition pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Transcribe an audio file
audio_file = "path/to/your/audio/file.wav"  # Replace with the path to your audio file
result = pipe(audio_file)
print(result["text"])
```
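For recordings longer than roughly 30 seconds, the pipeline can process the audio in chunks. A minimal sketch, reusing the `pipe` object from the block above; the file path is a placeholder and the `chunk_length_s` value of 30 is illustrative rather than tuned:
```python
# Chunked transcription for long recordings; also returns per-segment timestamps.
result = pipe(
    "path/to/your/long_audio.wav",  # placeholder path
    chunk_length_s=30,              # split the audio into ~30-second chunks
    return_timestamps=True,
)

print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```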
## Example Usage
1. Prepare your audio file (WAV format is recommended; a sketch after this list shows one way to load other formats with librosa).
2. Save the above code in a Python file (e.g., `transcribe.py`).
3. Update the `model_name` and `audio_file` variables in the code with your values.
4. Run the following command in your terminal or command prompt:
```
python transcribe.py
```
5. The transcribed text will be displayed on the screen.
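If your recording is not already a WAV file, one option is to load it with librosa (installed earlier) and pass the raw waveform to the pipeline. A minimal sketch, reusing the `pipe` object from the Usage section; `input.mp3` is a placeholder filename:
```python
import librosa

# Load the audio and resample to 16 kHz, the rate Whisper's feature extractor expects.
waveform, sampling_rate = librosa.load("input.mp3", sr=16000)  # placeholder file

# The pipeline also accepts a raw numpy array together with its sampling rate.
result = pipe({"raw": waveform, "sampling_rate": sampling_rate})
print(result["text"])
```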
## Notes
- This model will perform best with Uzbek audio files.
- Longer audio files may require more processing time.
- GPU usage is recommended, but the model can also run on CPU (a device-selection sketch follows the Colab snippet below).
- If you're using Google Colab, you can upload your audio file using:
```python
from google.colab import files
uploaded = files.upload()
audio_file = next(iter(uploaded))
```
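To fall back to CPU cleanly when no GPU is available, one option is to choose the device and dtype at runtime and let the pipeline load the model with that dtype. A minimal, self-contained sketch (the model name is repeated from the Usage section); float32 is used on CPU because float16 inference there is typically slow or unsupported:
```python
import torch
from transformers import pipeline

model_name = "totetecdev/whisper-large-v2-uzbek-100steps"

# Prefer GPU with float16; fall back to CPU with float32.
if torch.cuda.is_available():
    device, dtype = "cuda:0", torch.float16
else:
    device, dtype = "cpu", torch.float32

pipe = pipeline(
    "automatic-speech-recognition",
    model=model_name,   # the pipeline loads the model and processor itself with the chosen dtype
    torch_dtype=dtype,
    device=device,
)
```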
## Model Details
- Base Model: Whisper Large v2
- Fine-tuned for: Uzbek Speech Recognition
## License
This project is licensed under [LICENSE]. See the LICENSE file for details.
## Contact
For questions or feedback, please contact Khabib Salimov at totete.dev@gmail.com.
## Acknowledgements
- OpenAI for the original Whisper model