Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

🤗 Hugging Face | 📖 GitHub | 📑 Technical report

This is a safetensors conversion of gpt-omni/mini-omni.
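
Because the weights ship as safetensors, the tensor names, shapes, and dtypes can be inspected without loading the model. The sketch below is not part of the Mini-Omni code base. It uses only the Python standard library and relies on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header. The function names `read_safetensors_header` and `count_parameters` are illustrative assumptions, and any file path passed in is hypothetical.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The next header_len bytes are a UTF-8 JSON object describing tensors.
        return json.loads(f.read(header_len))


def count_parameters(header):
    """Sum element counts per dtype, skipping the optional __metadata__ entry."""
    counts = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        n = prod(info["shape"])  # an empty shape (scalar) counts as 1 element
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + n
    return counts
```

Calling `count_parameters(read_safetensors_header(path))` on a downloaded weights file should return a mapping from dtype to element count. As a rough size check, a checkpoint of about 694M F32 parameters at 4 bytes each comes to roughly 2.8 GB.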

Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.

Features

✅ Real-time speech-to-speech conversational capabilities. No extra ASR or TTS models required.

✅ Talking while thinking, with the ability to generate text and audio at the same time.

✅ Streaming audio output capabilities.

✅ "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost performance.

NOTE: please refer to https://github.com/gpt-omni/mini-omni for more details.

Safetensors: model size 694M params, tensor type F32

Model tree for leafspark/mini-omni-safetensors

Base model: Qwen/Qwen2-0.5B (this model is a fine-tune)
