Instructions to use MoYoYoTech/VoiceDialogue with libraries, inference providers, notebooks, and local apps.

Libraries

How to use MoYoYoTech/VoiceDialogue with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="MoYoYoTech/VoiceDialogue")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("MoYoYoTech/VoiceDialogue", dtype="auto")

llama-cpp-python

How to use MoYoYoTech/VoiceDialogue with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

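llm = Llama.from_pretrained(
    repo_id="MoYoYoTech/VoiceDialogue",
    filename="assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf",
)

# create_chat_completion expects a list of role/content messages, not a bare string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "The answer to the universe is 42"},
    ]
)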
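
The chat completion call returns an OpenAI-style dictionary rather than a plain string. As a minimal sketch, the reply text could be pulled out as follows; `extract_reply` is a hypothetical helper (not part of llama-cpp-python), and it assumes the usual `choices[0]["message"]["content"]` layout.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict.

    Hypothetical helper for illustration; returns "" when no reply is present.
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return (message.get("content") or "").strip()
```

For example, `extract_reply(llm.create_chat_completion(messages=[...]))` would give the text of the first choice.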

Local Apps

llama.cpp

How to use MoYoYoTech/VoiceDialogue with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use Docker

docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
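
Once `llama-server` is running, its OpenAI-compatible endpoint can be called from any HTTP client. The sketch below uses only the Python standard library and assumes the server listens on `http://localhost:8080/v1` (the address used in the Pi configuration further down); `build_chat_request` and `ask` are illustrative names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "MoYoYoTech/VoiceDialogue:Q6_K"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Who are you?")` requires the server to be up; `build_chat_request` can be inspected without a network connection.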

Ollama
How to use MoYoYoTech/VoiceDialogue with Ollama:
```
ollama run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Unsloth Studio

How to use MoYoYoTech/VoiceDialogue with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Pi

How to use MoYoYoTech/VoiceDialogue with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MoYoYoTech/VoiceDialogue:Q6_K"
        }
      ]
    }
  }
}
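
If `~/.pi/agent/models.json` already lists other providers, pasting the JSON by hand risks overwriting them. The following is a hedged sketch of merging the `llama-cpp` entry into an existing file; `merge_provider` and `update_models_file` are hypothetical helpers, and replacing an existing `llama-cpp` entry wholesale is an assumption about the desired behavior.

```python
import json
from pathlib import Path

# Path and provider layout are taken from the configuration shown above.
MODELS_PATH = Path.home() / ".pi" / "agent" / "models.json"

LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "MoYoYoTech/VoiceDialogue:Q6_K"}],
}


def merge_provider(config: dict, name: str, provider: dict) -> dict:
    """Return a copy of config with providers[name] set, keeping other providers."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def update_models_file(path: Path = MODELS_PATH) -> None:
    """Read the file if it exists, merge the provider, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = merge_provider(existing, "llama-cpp", LLAMA_CPP_PROVIDER)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Running `update_models_file()` once writes the file; `merge_provider` itself has no side effects and can be checked in isolation.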

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MoYoYoTech/VoiceDialogue with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MoYoYoTech/VoiceDialogue:Q6_K

Run Hermes

hermes

Docker Model Runner
How to use MoYoYoTech/VoiceDialogue with Docker Model Runner:
```
docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Lemonade

How to use MoYoYoTech/VoiceDialogue with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MoYoYoTech/VoiceDialogue:Q6_K

Run and chat with the model

lemonade run user.VoiceDialogue-Q6_K

List all available models

lemonade list

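VoiceDialogue - Intelligent Voice Dialogue System

A real-time voice dialogue system that integrates automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS)

Quick Start • Documentation • Contributing

🎯 Project Overview

VoiceDialogue is a complete Python-based voice dialogue system that delivers end-to-end voice interaction. It uses a modular design and offers real-time, high-accuracy conversation with multiple voice characters.

🖥️ Graphical interface: Built-in web UI that runs in the browser (choose a voice, switch languages, view live subtitles)
🎤 Real-time speech recognition: High-accuracy Chinese and English transcription based on Qwen3-ASR (punctuation included, 52 languages supported)
🤖 Intelligent dialogue generation: Integrates large language models such as Qwen3
🔊 High-quality speech synthesis: Speech output in multiple voice characters and styles
🌐 Web API service: HTTP endpoints for easy integration
⚡ Low-latency processing: Optimized audio streaming pipeline

Want to learn more? See the detailed feature guide.

🚀 Quick Start

The simplest path: clone the repository → install dependencies → launch → open the graphical interface in a browser and start talking. Only macOS (Apple Silicon) is currently supported.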

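1. Clone and install

The models come in two parts:

Downloaded with the repository (about 12GB, Git LFS): the large language model, the speech synthesis models, reference voices, and so on.

Downloaded automatically on first launch (about 4.4GB): the Qwen3-ASR speech recognition engine, which the program pulls from HuggingFace on its first run and caches in ~/.cache/huggingface. It does not need to be downloaded again afterwards.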
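
The two downloads add up to roughly 12 + 4.4 = 16.4 GB. As a rough sketch, free disk space could be checked before cloning; the 5 GB safety margin is an arbitrary assumption, the function names are illustrative, and the repository and the HuggingFace cache may sit on different volumes, in which case each should be checked separately.

```python
import shutil

REPO_MODELS_GB = 12.0  # cloned with the repository via Git LFS (approximate)
ASR_MODEL_GB = 4.4     # Qwen3-ASR, downloaded on first launch (approximate)


def required_gb(margin_gb: float = 0.0) -> float:
    """Total approximate download size plus an optional safety margin."""
    return REPO_MODELS_GB + ASR_MODEL_GB + margin_gb


def has_enough_space(path: str = ".", margin_gb: float = 5.0) -> bool:
    """Compare free space on the volume containing `path` with the requirement."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(margin_gb)
```

`has_enough_space(".")` checks the volume of the current directory, which is where `git clone` would place the repository.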

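⚠️ **Git LFS must be installed first**; otherwise the cloned models are only placeholder pointers a few hundred bytes in size, and the application cannot start.

# 1) Install and initialize Git LFS (only needed once)
brew install git-lfs        # If Homebrew is not installed, see https://git-lfs.com
git lfs install

# 2) Clone the project (includes about 12GB of models, so the download is large; please be patient)
git clone https://huggingface.co/MoYoYoTech/VoiceDialogue
cd VoiceDialogue

# 3) Verify that the models were actually fetched (the size should be in the GB range, not 100+ bytes)
#    If the file is tiny, Git LFS did not take effect; run: git lfs pull
ls -lh assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf

# 4) Install dependencies (uv is recommended)
pip install uv
uv venv
source .venv/bin/activate

WHISPER_COREML=1 CMAKE_ARGS="-DGGML_METAL=on" uv sync

# 5) Install extra dependencies
uv pip install kokoro-onnx        # kokoro-onnx (English TTS)
uv pip install numpy==1.26.4      # pin the numpy version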
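📖 Need more detailed steps? See the installation guide, which covers system requirements and common problems.

2. Launch the graphical interface (recommended)

python main.py --mode api

After launch, open http://localhost:8000/app/ in a browser.

Everything can be done from the interface:

Click ⚙️ Settings in the lower-right corner to choose the microphone, echo cancellation, recognition language, and voice; you can also switch the interface language between Chinese and English.
Click "Start Conversation" (「开始对话」) to talk with the AI by voice in real time; subtitles are displayed live.

The first launch is slow, which is normal: the program automatically downloads the Qwen3-ASR model (about 4.4GB; a network connection is required, and download progress is printed to the terminal) and performs a one-time conversion of the TTS weight format. The system is ready only after both steps finish, which takes a few minutes in total depending on network speed; subsequent launches take only tens of seconds. If the terminal stays on the download step for a long time, check that huggingface.co is reachable from your network.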
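
Because the first launch can take several minutes, a script that depends on the interface may want to wait for it. The sketch below polls the page until it answers; it assumes the page at `http://localhost:8000/app/` only responds once startup has finished, which the project does not state, and the 15-minute timeout is an arbitrary choice. The function names are illustrative.

```python
import time
import urllib.error
import urllib.request

APP_URL = "http://localhost:8000/app/"


def http_ok(url: str) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url=APP_URL, timeout_s=900.0, interval_s=5.0,
                     probe=http_ok, sleep=time.sleep) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.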

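3. Command-line mode (CLI)

If you do not need the graphical interface, you can run the voice dialogue directly in the terminal:

# Start a voice dialogue (Chinese by default)
python main.py

# Specify the language and voice
python main.py --language en --speaker Heart

# List available audio input devices (for example, an external microphone array)
python main.py --list-audio-devices

# Specify the input device
python main.py --input-device <device-index>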
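
When launching VoiceDialogue from another Python program, the flags shown in this guide can be assembled into an argument list. This is a sketch only: `build_command` is a hypothetical helper, it uses just the flags that appear above, and whether every combination of them is accepted by `main.py` is not confirmed here.

```python
import sys
from typing import List, Optional


def build_command(
    mode: Optional[str] = None,
    language: Optional[str] = None,
    speaker: Optional[str] = None,
    input_device: Optional[int] = None,
    list_audio_devices: bool = False,
) -> List[str]:
    """Assemble an argv list for main.py from the flags shown in this guide."""
    cmd = [sys.executable, "main.py"]
    if list_audio_devices:
        # Listing devices is shown above as a standalone invocation.
        return cmd + ["--list-audio-devices"]
    if mode is not None:
        cmd += ["--mode", mode]
    if language is not None:
        cmd += ["--language", language]
    if speaker is not None:
        cmd += ["--speaker", speaker]
    if input_device is not None:
        cmd += ["--input-device", str(input_device)]
    return cmd
```

From the repository root, with the virtual environment active, `subprocess.run(build_command(language="en", speaker="Heart"), check=True)` would start an English session.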

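For detailed usage, see the configuration guide and the API service guide.

📚 Documentation

📖 Installation guide: Detailed installation steps and system requirements.
⚙️ Configuration guide: How to configure system parameters and advanced options.
🎭 Features: An in-depth look at everything the project can do.
🌐 API guide: How to use and integrate the API service.
🏗️ System architecture: How the system works internally.
📁 Project structure: An overview of the code and file organization.
🛠️ Troubleshooting: Common problems and their solutions.
🤝 Contributing guide: How to contribute to the project.

📄 License

This project is open source under the MIT License.

🙏 Acknowledgements

If this project helps you, please give us a ⭐️!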

Downloads last month: 83

GGUF

Model size: 8B params
Architecture: qwen3
Quantization: 6-bit