Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -34,12 +34,23 @@ Currently optimized for English, French, and German.
 
 ## Using MLX
 
+Real audio streaming:
+
 ```bash
 pip install -U mlx-audio
-python -m mlx_audio.tts.generate --model Marvis-AI/marvis-tts-250m-v0.2 --stream \
+mlx_audio.tts.generate --model Marvis-AI/marvis-tts-250m-v0.2 --stream \
 --text "Marvis TTS is a new text-to-speech model that provides fast streaming on edge devices."
 ```
 
+Voice cloning:
+
+```bash
+mlx_audio.tts.generate --model Marvis-AI/marvis-tts-250m-v0.2 --stream \
+--text "Marvis TTS is a new text-to-speech model that provides fast streaming on edge devices." --ref_audio ./conversational_a.wav
+```
+
+You can pass any audio file to clone the voice from, or download a sample audio file from [here](https://huggingface.co/mlx-community/csm-1b/tree/main/prompts).
+
 # Model Description
 
 Marvis is built on the [Sesame CSM-1B](https://huggingface.co/sesame/csm-1b) (Conversational Speech Model) architecture, a multimodal transformer that operates directly on Residual Vector Quantization (RVQ) tokens and uses [Kyutai's mimi codec](https://huggingface.co/kyutai/mimi). The architecture enables end-to-end training while maintaining low-latency generation and employs a dual-transformer approach:
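
The streaming and voice-cloning commands in this diff differ only in the `--ref_audio` flag, so they could be driven from one small Python helper. The sketch below is illustrative and not part of the PR: `build_generate_command` and `run_generate` are hypothetical names, and it assumes `mlx-audio` is installed so that the `mlx_audio.tts.generate` entry point is on `PATH`. It uses only the flags that appear in the README (`--model`, `--stream`, `--text`, `--ref_audio`).

```python
import shlex
import subprocess
from typing import List, Optional

# Model ID copied from the README commands in the diff.
MODEL_ID = "Marvis-AI/marvis-tts-250m-v0.2"


def build_generate_command(
    text: str,
    ref_audio: Optional[str] = None,
    stream: bool = True,
) -> List[str]:
    """Build the argv list for the mlx_audio.tts.generate CLI.

    Hypothetical helper: only flags shown in the README are used.
    """
    cmd = ["mlx_audio.tts.generate", "--model", MODEL_ID]
    if stream:
        cmd.append("--stream")
    cmd += ["--text", text]
    if ref_audio is not None:
        # Voice cloning: pass a reference recording to clone the voice from.
        cmd += ["--ref_audio", ref_audio]
    return cmd


def run_generate(text: str, ref_audio: Optional[str] = None) -> None:
    """Run the CLI; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_generate_command(text, ref_audio), check=True)


# Dry run: print the voice-cloning command instead of executing it.
example = build_generate_command(
    "Marvis TTS is a new text-to-speech model that provides fast streaming on edge devices.",
    ref_audio="./conversational_a.wav",
)
print(shlex.join(example))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces; `shlex.join` is used here only to print a copy-pasteable form of the command.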