HelpingAI-TTS-v1 🎀🔥

Yo, what's good! Welcome to HelpingAI-TTS-v1, your go-to for next-level Text-to-Speech (TTS) that's all about personalization, vibes, and clarity. Whether you want your text to sound cheerful, emotional, or just like you're chatting with a friend, this model's got you covered. 💯

🚀 What’s HelpingAI-TTS-v1?

HelpingAI-TTS-v1 is a beast when it comes to generating high-quality, customizable speech. It doesn’t just spit out generic audio; it feels what you're saying and brings it to life with style. Add a description to your speech, like how fast or slow it should be or whether it’s cheerful or serious, and BOOM, you've got yourself the perfect audio output. 🎧

🛠️ How It Works: A Quick Rundown 🔥

  1. Transcript: The text you want spoken (the prompt in the usage code). Keep it casual, formal, or whatever suits your vibe.
  2. Caption: A description of how you want the speech to sound (the description in the usage code). Want a fast-paced, hype vibe or a calm, soothing tone? Just say it. 🔥
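
The two inputs above can be kept side by side in code. Here's a minimal sketch, not part of the model's API: build_caption is a hypothetical helper that turns a few style words into a plain-English caption. The model itself just takes a free-form string, so the sentence template here is only an assumption about what reads well.

```python
def build_caption(tone, pace, extras=()):
    """Compose a plain-English caption from a few style attributes.

    Hypothetical helper for illustration only; the model accepts any
    free-form description string.
    """
    sentences = [f"A {tone} tone with a {pace} speed."]
    # Make sure each extra note ends with exactly one period.
    sentences.extend(extra.rstrip(".") + "." for extra in extras)
    return " ".join(sentences)


# The transcript is what gets spoken; the caption is how it should sound.
transcript = "Hey, what's up? How's it going?"
caption = build_caption(
    "friendly, upbeat",
    "moderate",
    ["Speaker sounds confident and relaxed"],
)
print(caption)
```

Swap the tone and pace words to try different vibes on the same transcript.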

💡 Features You’ll Love:

  • Expressive Speech: This isn’t just any TTS. You can describe the tone, speed, and vibe you want. Whether it's a peppy "Hey!" or a chill "What's up?", this model’s got your back.
  • Top-Notch Quality: Super clean audio. No static. Just pure, high-quality sound that makes your words pop.
  • Customizable Like Never Before: Play with emotions, tone, and even accents. It’s all about making it personal. 🌍

🔧 Get Started: Installation 🔥

Ready to vibe? HelpingAI-TTS-v1 runs on the Parler-TTS library, so install that first:

pip install git+https://github.com/huggingface/parler-tts.git
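
Before moving on, it can help to confirm that the packages the usage code imports are actually importable. Here's a minimal sketch using only the standard library. The package list is taken from the imports in the usage example, and missing_packages is a hypothetical helper name.

```python
import importlib.util

# Import names used by the usage example in this README.
REQUIRED = ("torch", "transformers", "parler_tts", "soundfile")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All set, every package is importable.")
```

If anything shows up as missing, install it with pip before running the usage code.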

🖥️ Usage: Let's Make Some Magic 🎀

Here’s the code that gets the job done. It's super simple to use: just plug in your text and describe how you want it to sound. It’s like setting the mood for a movie.

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Choose your device (GPU or CPU)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizers
model = ParlerTTSForConditionalGeneration.from_pretrained("HelpingAI/HelpingAI-TTS-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("HelpingAI/HelpingAI-TTS-v1")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Customize your inputs: text + description
prompt = "Hey, what's up? How’s it going?"
description = "A friendly, upbeat, and casual tone with a moderate speed. Speaker sounds confident and relaxed."

# Tokenize the inputs (keep the attention masks alongside the token IDs)
description_inputs = description_tokenizer(description, return_tensors="pt").to(device)
prompt_inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate the audio
generation = model.generate(
    input_ids=description_inputs.input_ids,
    attention_mask=description_inputs.attention_mask,
    prompt_input_ids=prompt_inputs.input_ids,
    prompt_attention_mask=prompt_inputs.attention_mask,
)
audio_arr = generation.cpu().numpy().squeeze()

# Save the audio to a file
sf.write("output.wav", audio_arr, model.config.sampling_rate)

This saves the speech you asked for to output.wav, a super clean .wav file. 🔥
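
To sanity-check the result, you can read the file back and see how long the clip is. Here's a minimal sketch using Python's built-in wave module. It assumes output.wav was saved as integer PCM, which should be soundfile's default for .wav files, but treat that as an assumption. wav_duration_seconds is a hypothetical helper name.

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


# Only report if the usage code has already produced the file.
if os.path.exists("output.wav"):
    print(f"output.wav is {wav_duration_seconds('output.wav'):.2f} s long")
```

A clip much shorter or longer than you'd expect for the transcript is a hint to tweak the caption and generate again.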

🌍 Language Support: Speak Your Language

No matter where you're from, HelpingAI-TTS-v1 has you covered. It officially supports 20+ languages, with unofficial support for a few more. That’s global vibes right there. 🌏

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • Kannada
  • Malayalam
  • Marathi
  • Sanskrit
  • Nepali
  • English
  • Telugu
  • Hindi
  • Gujarati
  • Konkani
  • Maithili
  • Manipuri
  • Odia
  • Santali
  • Sindhi
  • Tamil
  • Urdu
  • Chhattisgarhi
  • Kashmiri
  • Punjabi

Powered by HelpingAI, where we blend emotional intelligence with tech. 🌟
