---
title: ttsdoc
emoji: π
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
---
# ttsdoc
ttsdoc is a Text-to-Speech (TTS) application that reads your documents aloud. It uses the Parler TTS Mini v1 model to generate high-quality audio from text, whether typed in directly or taken from an uploaded PDF, TXT, or DOCX file.
## Features
- Support for PDF, TXT, and DOCX file uploads
- Direct text input option
- Customizable voice descriptions
- Adjustable maximum audio duration
- GPU-accelerated audio generation
## How to Use
1. Upload a PDF, TXT, or DOCX file or enter text directly.
2. Customize the voice description if desired.
3. Adjust the maximum audio duration.
4. Click "Generate Audio" to create the TTS output.
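The first step above offers two input paths. A minimal sketch of how an app might decide between them is shown here; `pick_input_source` and `SUPPORTED_EXTENSIONS` are hypothetical names, and giving an uploaded file precedence over typed text is an assumption, not something this README states about `app.py`.

```python
from pathlib import Path
from typing import Optional

# File types listed in the Features section.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def pick_input_source(file_path: Optional[str], typed_text: str) -> str:
    """Return "file" or "text" depending on what the user provided.

    Assumption: an uploaded file takes precedence over typed text.
    """
    if file_path:
        ext = Path(file_path).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"Unsupported file type: {ext or '(none)'}")
        return "file"
    if typed_text.strip():
        return "text"
    raise ValueError("Upload a file or enter some text first.")
```

The extension check is case-insensitive, so `report.PDF` is accepted, while an empty form raises a `ValueError` that a UI could surface as a message.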
## Tips for Best Results
- For longer texts, the generator creates audio only up to the specified maximum duration, so increase the limit if the output ends early.
- Experiment with different voice descriptions to achieve the desired output.
- Use punctuation to control pacing and intonation in the generated speech.
- For optimal quality, try to keep individual sentences or paragraphs concise.
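One way to follow the last tip is to break long text into short pieces at sentence boundaries before pasting it in. The sketch below is illustrative only: `split_into_chunks` and the `max_chars` default are hypothetical, and the regular expression is a simple heuristic that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be generated separately, which keeps individual inputs short.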
## Technical Details
- This demo uses the Parler TTS Mini v1 model.
- Audio generation is GPU-accelerated for faster processing.
- Maximum file size for uploads: 5 MB
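To check a file against the upload limit before submitting it, something like the following could be used. This is a sketch under assumptions: `is_within_upload_limit` is a hypothetical helper, and it interprets 5 MB as 5 × 1024 × 1024 bytes, which may differ from how the app counts.

```python
import os

# Assumption: "5 MB" is treated as 5 MiB (5 * 1024 * 1024 bytes).
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def is_within_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at path is no larger than limit bytes."""
    return os.path.getsize(path) <= limit
```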
## License
This project is licensed under the Apache 2.0 License.
---
Powered by [Gradio](https://gradio.app) and [Hugging Face](https://huggingface.co)