Full Name

Gatozu35

AI & ML interests

Text-to-Speech, Voice Conversion

Recent Activity

liked a model about 7 hours ago

nvidia/stt_en_conformer_ctc_small

liked a model 1 day ago

2121-8/japanese-parler-tts-large-bate

liked a Space 2 days ago

styletts2/styletts2

Organizations

None yet

Gatozu35's activity

upvoted a collection 14 days ago

Cosmos Tokenizer

A suite of image and video tokenizers • 10 items • Updated 15 days ago • 18

upvoted a collection 20 days ago

Molmo

Artifacts for open multimodal language models. • 5 items • Updated 7 days ago • 271

upvoted a paper 20 days ago

Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis

Paper • 2410.23320 • Published 23 days ago • 6

upvoted a paper 23 days ago

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

Paper • 2403.12422 • Published Mar 19 • 1

upvoted a paper 25 days ago

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

Paper • 2410.11081 • Published Oct 14 • 18

upvoted a collection about 2 months ago

LAION Audio

9 items • Updated Sep 30 • 1

upvoted a paper 2 months ago

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

Paper • 2206.04658 • Published Jun 9, 2022 • 2

upvoted a collection 2 months ago

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 218

upvoted 3 papers 2 months ago

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Paper • 2406.06484 • Published Jun 10 • 3

Gated Linear Attention Transformers with Hardware-Efficient Training

Paper • 2312.06635 • Published Dec 11, 2023 • 6

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published Sep 11 • 19

upvoted 3 papers 3 months ago

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

Paper • 2401.10460 • Published Jan 19 • 1

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

Paper • 2402.00892 • Published Jan 31 • 13

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 2

upvoted a collection 4 months ago

Parler-TTS: fully open-source high-quality TTS

If you want to find out more about how these models were trained and even fine-tune them yourself, check out the Parler-TTS repository on GitHub. • 7 items • Updated Aug 8 • 46

upvoted an article 4 months ago

Mixture of Depth is Vibe

Article • Apr 22 • 44

upvoted 3 papers 4 months ago

Evaluating and reducing the distance between synthetic and real speech distributions

Paper • 2211.16049 • Published Nov 29, 2022 • 1

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11 • 14

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Paper • 2407.06358 • Published Jul 8 • 18

upvoted a collection 5 months ago

Stable Diffusion 3

Stable Diffusion 3 and related models for text-to-image and image-to-image • 2 items • Updated Jun 12 • 90