Model Overview
This model is a fine-tuned version of Microsoft's SpeechT5 text-to-speech model, adapted to handle technical terminology, abbreviations, and domain-specific jargon. It was trained on a custom dataset of 100 text entries focused on terms used in technical interviews and professional communication. The fine-tuning is intended to improve the pronunciation of technical terms, and with it the quality of TTS output in scenarios that require domain expertise.
Model Details
- Base Model: microsoft/speecht5_tts
- Language: English (en)
- License: MIT
- Dataset: Custom English texts, primarily focused on technical terminology commonly encountered in fields such as computer science, engineering, and software development.
Dataset
- Text Data: Contains 100 text entries, each including technical terms, abbreviations, and industry-specific vocabulary. The text length varies from short sentences to longer technical descriptions.
- Audio Data: Corresponding synthesized audio generated for each text entry. Audio is encoded in WAV format, sampled at 16 kHz, and designed for TTS applications.
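The audio format described above (WAV, 16 kHz) can be checked before a clip is added to a similar dataset. The following is a minimal sketch using only Python's standard library; the function name `wav_matches_spec` and its `expected_rate` parameter are hypothetical and are not part of this model's tooling.

```python
import wave


def wav_matches_spec(path, expected_rate=16000):
    """Return True if `path` is a readable PCM WAV file sampled at `expected_rate` Hz.

    Hypothetical helper for illustration; the 16 kHz default mirrors the
    sampling rate stated in the dataset description.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getframerate() == expected_rate
    except (wave.Error, EOFError):
        # Not a valid (uncompressed PCM) WAV file, or the file is truncated.
        return False
```

Calling `wav_matches_spec("clip.wav")` on each recording is one way to catch files that were exported at a different sampling rate before training.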
Usage
from transformers import AutoTokenizer, SpeechT5ForTextToSpeech
# Load the tokenizer and the fine-tuned text-to-speech model from the Hub
tokenizer = AutoTokenizer.from_pretrained("Arch10/SpeechT5_finetune_technical_terms")
model = SpeechT5ForTextToSpeech.from_pretrained("Arch10/SpeechT5_finetune_technical_terms")
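Loading the model is only the first step: SpeechT5 also needs a speaker embedding and a vocoder to produce a waveform. The sketch below shows one possible end-to-end flow under several assumptions that this card does not confirm: that the repository ships tokenizer files, that the stock `microsoft/speecht5_hifigan` vocoder is appropriate, and that a 512-dimensional x-vector is available. The all-zeros embedding is only a placeholder so the example runs; it will not reproduce the voice used in fine-tuning. The function name `synthesize` and the output path are hypothetical.

```python
MODEL_ID = "Arch10/SpeechT5_finetune_technical_terms"
VOCODER_ID = "microsoft/speecht5_hifigan"  # assumption: stock SpeechT5 vocoder
SAMPLE_RATE = 16000  # matches the 16 kHz audio described in the dataset section


def synthesize(text, out_path="speech.wav", speaker_embedding=None):
    """Generate speech for `text` and save it as a 16-bit mono WAV file."""
    # Heavy imports live inside the function so that defining it does not
    # require torch or transformers to be installed.
    import wave

    import numpy as np
    import torch
    from transformers import AutoTokenizer, SpeechT5ForTextToSpeech, SpeechT5HifiGan

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    if speaker_embedding is None:
        # ASSUMPTION: placeholder x-vector of shape (1, 512). Replace it with
        # a real speaker embedding for usable voice quality.
        speaker_embedding = torch.zeros((1, 512))

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With a vocoder supplied, generate_speech returns a 1-D waveform tensor.
        waveform = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )

    # Convert float samples in [-1, 1] to 16-bit PCM and write the WAV file.
    pcm = (waveform.clamp(-1.0, 1.0).numpy() * 32767).astype(np.int16)
    with wave.open(out_path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm.tobytes())
    return out_path
```

A call such as `synthesize("The API returns JSON over HTTPS.")` would download both checkpoints on first use and write `speech.wav` to the working directory.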