---
title: ttsdoc
emoji: 🌖
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
---
# ttsdoc 🌖

ttsdoc is a text-to-speech (TTS) application that reads your documents aloud. It uses the Parler-TTS Mini v1 model to generate high-quality audio from typed text or from uploaded PDF, TXT, and DOCX files.

## Features

- 📄 Support for PDF, TXT, and DOCX file uploads
- ✍️ Direct text input option
- 🗣️ Customizable voice descriptions
- ⏱️ Adjustable maximum audio duration
- 🚀 GPU-accelerated audio generation

## How to Use

1. Upload a PDF, TXT, or DOCX file, or enter text directly.
2. Customize the voice description if desired.
3. Adjust the maximum audio duration.
4. Click "Generate Audio" to create the TTS output.
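The steps above do not say how an uploaded file becomes text. One way to handle step 1 is sketched here, assuming an uploaded file takes priority over typed text. The helper names `resolve_input_text` and `extract_docx_text` are hypothetical and not taken from ttsdoc's `app.py`. DOCX text is read with the standard library only (a `.docx` file is a ZIP archive whose body lives in `word/document.xml`); PDF parsing is deliberately left out because it needs a third-party library.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# WordprocessingML namespace used for elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path: str) -> str:
    """Return the text of a .docx file, one line per non-empty paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # Concatenate every text run (<w:t>) inside this paragraph
        runs = [t.text or "" for t in para.iter(f"{W_NS}t")]
        paragraphs.append("".join(runs))
    return "\n".join(p for p in paragraphs if p)


def resolve_input_text(upload_path: Optional[str], typed_text: str) -> str:
    """Hypothetical helper: prefer an uploaded file, else use the typed text."""
    if upload_path is None:
        return typed_text.strip()
    suffix = Path(upload_path).suffix.lower()
    if suffix == ".txt":
        return Path(upload_path).read_text(encoding="utf-8").strip()
    if suffix == ".docx":
        return extract_docx_text(upload_path).strip()
    if suffix == ".pdf":
        # Assumption: PDF parsing is delegated to a third-party library (not shown)
        raise NotImplementedError("PDF extraction requires an external library")
    raise ValueError(f"Unsupported file type: {suffix}")
```

Keeping extraction separate from the input choice makes it easy to plug in a PDF parser later without touching the fallback logic.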

## Tips for Best Results

- For longer texts, audio generation stops at the specified maximum duration, so text beyond that point is not read.
- Experiment with different voice descriptions to achieve the desired output.
- Use punctuation to control pacing and intonation in the generated speech.
- For optimal quality, try to keep individual sentences or paragraphs concise.
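To make the duration and sentence-length tips concrete, here is a minimal sketch that groups text into short sentence chunks and stops once an estimated duration budget is used up. It assumes a fixed speaking rate (the hypothetical `WORDS_PER_SECOND = 2.5`) and hypothetical helper names; the README does not state how ttsdoc actually enforces the maximum duration, and real Parler-TTS pacing varies with the voice description.

```python
import re
from typing import List

# Hypothetical average speaking rate used to estimate duration from word count
WORDS_PER_SECOND = 2.5


def split_sentences(text: str) -> List[str]:
    """Split on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_for_duration(text: str, max_seconds: float,
                       max_words_per_chunk: int = 40) -> List[str]:
    """Group whole sentences into short chunks until the word budget is spent."""
    budget = int(max_seconds * WORDS_PER_SECOND)  # estimated words that fit
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0
    used = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if used + n > budget:
            break  # the remaining text would exceed the maximum duration
        if current and current_words + n > max_words_per_chunk:
            # Close the current chunk so each generation call stays short
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `chunk_for_duration(text, max_seconds=30)` keeps at most the first 75 words, cut at a sentence boundary. Cutting at sentence boundaries rather than mid-sentence keeps the punctuation cues that shape pacing and intonation.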

## Technical Details

- This demo uses the Parler-TTS Mini v1 model.
- Audio generation is GPU-accelerated for faster processing.
- Maximum upload file size: 5 MB
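The upload limits above could be checked before any text extraction, as in this minimal sketch. It assumes 5 MB means `5 * 1024 * 1024` bytes and that validation runs on the saved upload path; `validate_upload` is a hypothetical helper, not part of Gradio or ttsdoc.

```python
from pathlib import Path

# Assumption: "5 MB" is interpreted as 5 MiB
MAX_UPLOAD_BYTES = 5 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".txt", ".docx"}


def validate_upload(path: str) -> None:
    """Raise ValueError if the file type is unsupported or the file exceeds 5 MB."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"File is {size} bytes; the limit is {MAX_UPLOAD_BYTES} bytes")
```

Checking the type before the size avoids a filesystem `stat` call for files that would be rejected anyway.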

## License

This project is licensed under the Apache 2.0 License.

---

Powered by [Gradio](https://gradio.app) and [Hugging Face](https://huggingface.co)