---
library_name: mlx-audio
tags:
- mlx
- text-to-speech
- speech
- speech generation
- voice cloning
- tts
- mlx-audio
---

# mlx-community/LongCat-AudioDiT-1B-8bit

This model was converted to MLX format from [`meituan-longcat/LongCat-AudioDiT-1B`](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) using mlx-audio version **0.4.3**.

Refer to the [original model card](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) for more details on the model.

## Use with mlx-audio

```bash
pip install -U mlx-audio
```

## Usage

```python
from mlx_audio.tts.utils import load

model = load("mlx-community/LongCat-AudioDiT-1B-8bit")
result = next(model.generate("Hello, this is a test of AudioDiT."))
audio = result.audio  # mlx array, 24 kHz
```

Play audio directly:

```python
from mlx_audio.tts.audio_player import AudioPlayer

# Reuses `model` from the previous snippet
player = AudioPlayer(sample_rate=24000)
result = next(model.generate("The quick brown fox jumps over the lazy dog."))
player.queue_audio(result.audio)
player.wait_for_drain()
player.stop()
```

## Voice Cloning

To clone a voice, pass a reference audio clip and its transcript. Use `guidance_method="apg"` for the best voice-cloning quality:

```python
result = next(model.generate(
    text="Today is warm turning to rain, with good air quality.",
    ref_audio="reference.wav",
    ref_text="Transcript of the reference audio.",
    guidance_method="apg",
    cfg_strength=4.0,
    steps=16,
))
```

## Zero-Shot Generation (Chinese)

```python
# Input text: "Today will be sunny and warm, turning overcast with rain; air
# quality ranges from excellent to good, and relative humidity is fairly low."
result = next(model.generate(
    text="今天晴暖转阴雨,空气质量优至良,空气相对湿度较低。",
    steps=16,
    cfg_strength=4.0,
))
```

## Generation Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 16 | Number of Euler ODE solver steps. Higher values improve quality but slow down generation |
| `cfg_strength` | 4.0 | Classifier-free guidance strength |
| `guidance_method` | `"cfg"` | `"cfg"` for zero-shot TTS, `"apg"` for voice cloning |
| `seed` | 1024 | Random seed for reproducibility |
| `ref_audio` | `None` | Reference audio for voice cloning (24 kHz) |
| `ref_text` | `None` | Transcript of the reference audio |

## CLI

```bash
# Zero-shot TTS
python -m mlx_audio.tts.generate \
  --model mlx-community/LongCat-AudioDiT-1B-8bit \
  --text "Hello, this is a test of AudioDiT." \
  --play

# Voice cloning
python -m mlx_audio.tts.generate \
  --model mlx-community/LongCat-AudioDiT-1B-8bit \
  --text "Today is warm turning to rain." \
  --ref_audio reference.wav \
  --ref_text "Transcript of the reference audio." \
  --play
```

## License

LongCat-AudioDiT weights and code are released under the [MIT License](https://github.com/meituan-longcat/LongCat-AudioDiT/blob/main/LICENSE).
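## Saving Audio to WAV (sketch)

The Python usage examples return `result.audio` as an MLX array sampled at 24 kHz. You may want to write it to disk without adding dependencies. Below is a minimal sketch that uses only Python's standard library. `save_wav` is a hypothetical helper and is not part of mlx-audio. The sketch assumes the audio is mono (1-D) with float samples in [-1, 1]. It also assumes the samples have already been converted to a plain Python sequence.

```python
import array
import sys
import wave
from typing import Iterable


def save_wav(samples: Iterable[float], path: str, sample_rate: int = 24000) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper (not part of mlx-audio); assumes 1-D mono audio.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        # Clip out-of-range values, then scale to the int16 range
        s = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is stored little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

For example, call `save_wav(result.audio.tolist(), "output.wav", sample_rate=24000)`. MLX arrays provide `tolist()`. A dedicated audio library such as `soundfile` would also work, but this sketch avoids the extra install.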
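## Sampling Sketch: `steps` and `cfg_strength`

The parameter table describes `steps` as the number of Euler ODE solver steps and `cfg_strength` as the classifier-free guidance strength. The plain-Python sketch below illustrates that general idea. It integrates a velocity field from t = 0 to t = 1 with fixed-step Euler. At each step, it combines a text-conditioned and an unconditional velocity estimate.

This is not LongCat-AudioDiT's implementation. `toy_velocity` is a hypothetical stand-in for the model. The guidance rule `v_uncond + w * (v_cond - v_uncond)` is one common formulation, and this model may combine the estimates differently. For example, some flow-matching TTS samplers use `v_cond + w * (v_cond - v_uncond)`. The `"apg"` method is not modeled here.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical model interface: (x, t, conditioned) -> velocity estimate
VelocityFn = Callable[[Vector, float, bool], Vector]


def euler_cfg_sample(
    velocity: VelocityFn,
    x0: Vector,
    steps: int = 16,
    cfg_strength: float = 4.0,
) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler.

    At every step the conditional and unconditional velocity estimates are
    combined with classifier-free guidance (one common formulation).
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    dt = 1.0 / steps
    x = list(x0)
    for i in range(steps):
        t = i * dt
        v_cond = velocity(x, t, True)     # text-conditioned estimate
        v_uncond = velocity(x, t, False)  # condition dropped
        # Guidance: start from the unconditional estimate and move
        # cfg_strength times along the (conditional - unconditional) direction
        v = [u + cfg_strength * (c - u) for c, u in zip(v_cond, v_uncond)]
        # Explicit Euler update: x <- x + dt * v
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, conditioned: bool) -> Vector:
    """Hypothetical stand-in for the model: a straight-line velocity field
    that reaches its target at t = 1 (the sampler never evaluates t = 1)."""
    target = [1.0, -1.0] if conditioned else [0.0, 0.0]
    return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]


# Usage: with cfg_strength=1.0 the guided velocity equals the conditional one
sample = euler_cfg_sample(toy_velocity, [0.5, 0.5], steps=16, cfg_strength=1.0)
```

In this sketch, each step makes two velocity evaluations, so runtime grows linearly with `steps`. With the toy field, raising `cfg_strength` from 1.0 to 4.0 moves the endpoint from `[1.0, -1.0]` to `[4.0, -4.0]`. This shows how guidance extrapolates past the conditional estimate.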