```markdown
# Whisper Large v2 Uzbek Speech Recognition Model

This project provides a fine-tuned version of the Faster Whisper Large v2 model for Uzbek speech recognition. The model transcribes Uzbek audio files into text.

## Installation

1. Ensure you have Python 3.7 or higher installed.
2. Install the required libraries:

       pip install transformers datasets accelerate soundfile librosa torch

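To confirm that the installation step succeeded before loading the model, a quick check along the following lines can help. This is a minimal sketch that uses only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of this project.

```python
import importlib.util

# Import names of the packages installed above (assumed to match the pip names).
REQUIRED = ["transformers", "datasets", "accelerate", "soundfile", "librosa", "torch"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If any package is reported as missing, rerun the `pip install` command in the same Python environment you use to run the script.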
## Usage

You can use the model with the following Python code:

```python
import torch
from transformers import pipeline, WhisperForConditionalGeneration, WhisperProcessor

# Use the GPU with half precision when available; fall back to CPU with float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the model and processor
model_name = "totetecdev/whisper-large-v2-uzbek-100steps"
model = WhisperForConditionalGeneration.from_pretrained(model_name, torch_dtype=torch_dtype)
processor = WhisperProcessor.from_pretrained(model_name)

# Create the speech recognition pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe an audio file
audio_file = "path/to/your/audio/file.wav"  # Replace with the path to your audio file
result = pipe(audio_file)

print(result["text"])
```

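Whisper normally detects the spoken language on its own. If detection turns out to be unreliable on short or noisy clips, one option is to pin the language and task at call time through `generate_kwargs`. The sketch below assumes `pipe` is the pipeline object created above; `transcribe_uzbek` is a hypothetical helper name. Depending on the checkpoint's generation config and the installed Transformers version, the `language` argument may not be accepted, so treat this as something to try rather than a guaranteed setting.

```python
def transcribe_uzbek(pipe, audio_path):
    """Transcribe `audio_path` with the language pinned to Uzbek.

    `pipe` is expected to be a Transformers automatic-speech-recognition
    pipeline; the keyword arguments are forwarded to the model's generate().
    """
    result = pipe(
        audio_path,
        generate_kwargs={"language": "uzbek", "task": "transcribe"},
    )
    return result["text"].strip()


# Example (assumes `pipe` and `audio_file` from the code above):
# print(transcribe_uzbek(pipe, audio_file))
```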
## Example Usage

1. Prepare your audio file (it should be in WAV format).
2. Save the above code in a Python file (e.g., `transcribe.py`).
3. Set the `audio_file` variable to the path of your audio file. Change `model_name` only if you want to load a different checkpoint.
4. Run the following command in your terminal or command prompt:

   ```
   python transcribe.py
   ```

5. The transcribed text is printed to the terminal.

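Before running the script, it can be useful to check that the file from step 1 really is a readable WAV file and to see its basic properties. The following is a minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM WAV files; `describe_wav` is an illustrative helper, not part of this project. Whisper models operate on 16 kHz audio, so a different sample rate means the audio is resampled before transcription.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


# Example:
# print(describe_wav("path/to/your/audio/file.wav"))
```

If `wave.open` raises `wave.Error`, the file is likely not an uncompressed WAV file and may need to be converted first.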
## Notes

- The model performs best on Uzbek audio.
- Longer audio files take longer to process.
- A GPU is recommended, but the model also runs on CPU.
- If you're using Google Colab, you can upload your audio file using:

  ```python
  from google.colab import files

  uploaded = files.upload()  # Opens a file picker; returns a dict keyed by file name
  audio_file = next(iter(uploaded))  # Name of the first uploaded file
  ```

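For the longer audio files mentioned in the notes above, the Transformers speech recognition pipeline can split the input into chunks and return timestamps per segment. The sketch below assumes `pipe` is the pipeline created in the Usage section; `transcribe_long` is a hypothetical helper, and the chunk length of 30 seconds and batch size of 8 are illustrative defaults rather than values tuned for this model.

```python
def transcribe_long(pipe, audio_path, chunk_length_s=30, batch_size=8):
    """Transcribe a long recording in chunks and collect timestamped segments.

    Returns the full text and a list of ((start_s, end_s), text) tuples.
    """
    result = pipe(
        audio_path,
        chunk_length_s=chunk_length_s,  # split the audio into windows of this length
        batch_size=batch_size,          # number of chunks processed together
        return_timestamps=True,         # ask for per-segment timestamps
    )
    segments = [
        (chunk["timestamp"], chunk["text"].strip())
        for chunk in result.get("chunks", [])
    ]
    return result["text"].strip(), segments


# Example (assumes `pipe` and `audio_file` from the Usage section):
# text, segments = transcribe_long(pipe, audio_file)
# for (start, end), segment_text in segments:
#     print(start, end, segment_text)
```

A smaller `batch_size` lowers memory use at the cost of speed, which may matter when running on CPU or a small GPU.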
## Model Details

- Base Model: Faster Whisper Large v2
- Fine-tuned for: Uzbek Speech Recognition

## License

This project is licensed under [LICENSE]. See the LICENSE file for details.

## Contact

For questions or feedback, please contact KHABIB SALIMOV at totetec.dev@gmail.com.

## Acknowledgements

- OpenAI for the original Whisper model

```