---
library_name: mlx-audio
tags:
- mlx
- text-to-speech
- speech
- speech generation
- voice cloning
- tts
- mlx-audio
---

# mlx-community/LongCat-AudioDiT-1B-8bit

This model was converted to MLX format from [`meituan-longcat/LongCat-AudioDiT-1B`](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) using mlx-audio version **0.4.3**.

Refer to the [original model card](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) for more details on the model.

## Use with mlx-audio

```bash
pip install -U mlx-audio
```

## Usage

```python
from mlx_audio.tts.utils import load

model = load("mlx-community/LongCat-AudioDiT-1B-8bit")

result = next(model.generate("Hello, this is a test of AudioDiT."))
audio = result.audio  # mlx array, 24kHz
```
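To keep the output, one option is to write it to a 16-bit PCM WAV file with the Python standard library. This is a minimal sketch. It assumes the samples are mono floats in roughly [-1, 1], and that you first convert the MLX array to a Python list, for example with `result.audio.tolist()`. The helper name `write_wav` is hypothetical and not part of mlx-audio.

```python
import math
import struct
import wave
from typing import Sequence


def write_wav(path: str, samples: Sequence[float], sample_rate: int = 24000) -> None:
    """Write mono float samples (assumed in [-1, 1]) as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of mlx-audio.
    """
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(pcm)}h", *pcm)  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)  # 24 kHz by default, matching the model output
        wf.writeframes(frames)


if __name__ == "__main__":
    # Self-contained demo: one second of a 440 Hz tone at 24 kHz.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
    write_wav("tone.wav", tone)
    # With the model (assumption: result.audio is a 1-D mlx array):
    # write_wav("output.wav", result.audio.tolist())
```

Clipping before scaling avoids integer overflow if the generated waveform slightly exceeds the [-1, 1] range.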

Play the audio directly, reusing the `model` loaded above:

```python
from mlx_audio.tts.audio_player import AudioPlayer

player = AudioPlayer(sample_rate=24000)
result = next(model.generate("The quick brown fox jumps over the lazy dog."))
player.queue_audio(result.audio)  # enqueue the generated samples
player.wait_for_drain()           # block until the queued audio has played
player.stop()                     # stop the player
```

## Voice Cloning

Clone a voice from a reference audio clip and its transcript. For the best cloning quality, set `guidance_method="apg"`:

```python
result = next(model.generate(
    text="Today is warm turning to rain, with good air quality.",
    ref_audio="reference.wav",
    ref_text="Transcript of the reference audio.",
    guidance_method="apg",
    cfg_strength=4.0,
    steps=16,
))
```
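The Generation Parameters table lists 24 kHz for `ref_audio`. Whether mlx-audio resamples references at other rates automatically is not stated here, so a quick check of the file can help before cloning. This sketch assumes the reference is an uncompressed PCM WAV file readable by the standard `wave` module; the helper `describe_reference` is hypothetical.

```python
import wave


def describe_reference(path: str, expected_rate: int = 24000) -> dict:
    """Report basic properties of a PCM WAV file and flag a sample-rate mismatch.

    Hypothetical helper for illustration; not part of mlx-audio.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()  # frames per channel
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_expected_rate": rate == expected_rate,
    }


# Usage (assumes reference.wav exists next to your script):
# info = describe_reference("reference.wav")
# if not info["matches_expected_rate"]:
#     print(f"Reference is {info['sample_rate']} Hz; consider resampling to 24 kHz.")
```

If the check fails, resample the clip with your preferred audio tool before passing it as `ref_audio`.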

## Zero-Shot Generation (Chinese)

```python
result = next(model.generate(
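    # "Today: sunny and warm, turning overcast with rain; air quality excellent to good; relative humidity fairly low."
    text="今天晴暖转阴雨，空气质量优至良，空气相对湿度较低。",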
    steps=16,
    cfg_strength=4.0,
))
```

## Generation Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 16 | Number of Euler ODE solver steps; more steps improve quality but slow generation |
| `cfg_strength` | 4.0 | Classifier-free guidance strength |
| `guidance_method` | `"cfg"` | `"cfg"` for TTS, `"apg"` for voice cloning |
| `seed` | 1024 | Random seed for reproducibility |
| `ref_audio` | `None` | Reference audio for voice cloning (24kHz) |
| `ref_text` | `None` | Transcript of the reference audio |

## CLI

```bash
# Zero-shot TTS
python -m mlx_audio.tts.generate \
  --model mlx-community/LongCat-AudioDiT-1B-8bit \
  --text "Hello, this is a test of AudioDiT." \
  --play

# Voice cloning
python -m mlx_audio.tts.generate \
  --model mlx-community/LongCat-AudioDiT-1B-8bit \
  --text "Today is warm turning to rain." \
  --ref_audio reference.wav \
  --ref_text "Transcript of the reference audio." \
  --play
```
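To script the CLI from Python, for example to render several lines in a loop, you can assemble the same invocation as a list of arguments. This sketch uses only the flags shown in the commands above; the helper `build_generate_cmd` is hypothetical, and actually running the command is left commented out.

```python
import shlex
import sys
from typing import List, Optional

MODEL_ID = "mlx-community/LongCat-AudioDiT-1B-8bit"


def build_generate_cmd(text: str, ref_audio: Optional[str] = None,
                       ref_text: Optional[str] = None, play: bool = True) -> List[str]:
    """Assemble a `python -m mlx_audio.tts.generate` call (hypothetical helper)."""
    # sys.executable runs the CLI with the same interpreter that has mlx-audio installed.
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate",
           "--model", MODEL_ID, "--text", text]
    if ref_audio is not None:
        cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:
        cmd += ["--ref_text", ref_text]
    if play:
        cmd.append("--play")
    return cmd


if __name__ == "__main__":
    cmd = build_generate_cmd("Today is warm turning to rain.",
                             ref_audio="reference.wav",
                             ref_text="Transcript of the reference audio.")
    print(shlex.join(cmd))  # inspect the shell-quoted command first
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to run it
```

Passing a list (rather than a single shell string) avoids quoting problems with text that contains spaces or punctuation.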

## License

LongCat-AudioDiT weights and code are released under the [MIT License](https://github.com/meituan-longcat/LongCat-AudioDiT/blob/main/LICENSE).