NexaAIDev/OmniAudio-2.6B

Audio-Text-to-Text


alanzhuly committed 9 days ago

Commit e3104df • 1 parent: 598a0e1

Update README.md

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -8,6 +8,8 @@ tags:
 - audio
 - GGUF
 ---
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/d7Rzpm0cgCToXjtE7_U2u.png" alt="Example" style="width:400px;"/>
 # OmniAudio-2.6B
 OmniAudio is the world's fastest and most efficient audio-language model for on-device deployment - a 2.6B-parameter multimodal model that processes both text and audio inputs. It integrates three components: **Gemma-2-2b**, **Whisper turbo**, and a custom projector module, enabling secure, responsive audio-text processing directly on edge devices.
 Unlike traditional approaches that chain ASR and LLM models together, OmniAudio-2.6B unifies both capabilities in a single efficient architecture for minimal latency and resource overhead.
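The README lists Whisper turbo, a custom projector, and Gemma-2-2b as parts of one model, in contrast to chaining a separate ASR system and LLM. It does not describe how they are wired. A common pattern, assumed here and not confirmed by the README, is that encoder frames pass through a projector into the language model's embedding space and are placed in the same input sequence as the text tokens. The pure-Python sketch below illustrates only that pattern. It makes several assumptions: the projector is a single linear layer, the dimensions and weights are toy values, and `project` and `build_decoder_input` are hypothetical names.

```python
# Hypothetical sketch, not OmniAudio's implementation: assumes a single
# linear projector and toy sizes; real weights are learned and far larger.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(frames: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Map each audio-encoder frame x to W @ x + b (W is d_model x d_audio)."""
    d_audio = len(weight[0])
    projected = []
    for x in frames:
        if len(x) != d_audio:
            raise ValueError(f"frame width {len(x)} != projector input {d_audio}")
        projected.append(
            [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
        )
    return projected


def build_decoder_input(
    audio_frames: List[Vector],
    text_embeddings: List[Vector],
    weight: Matrix,
    bias: Vector,
) -> List[Vector]:
    """Prepend projected audio embeddings to text-token embeddings so one
    decoder sees a single sequence (assumed layout: audio first, then text)."""
    audio_tokens = project(audio_frames, weight, bias)
    width = len(bias)
    if any(len(t) != width for t in text_embeddings):
        raise ValueError("text embeddings must match the decoder width")
    return audio_tokens + text_embeddings


if __name__ == "__main__":
    # Toy stand-ins: 2 encoder frames of width 3, decoder width 2.
    frames = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
    weight = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
    bias = [0.5, -0.5]
    text = [[0.1, 0.2], [0.3, 0.4]]
    print(build_decoder_input(frames, text, weight, bias))
```

Run as a script, it prints `[[3.5, -0.5], [1.5, 1.5], [0.1, 0.2], [0.3, 0.4]]`, which is two projected audio positions followed by two text positions. The decoder receives the audio as embeddings directly, with no intermediate transcript.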