NexaAIDev/OmniAudio-2.6B

Audio-Text-to-Text

alanzhuly committed 8 days ago

Commit d1499cf • 1 Parent(s): fcdc671

Update README.md

Files changed (1)

README.md +3 -2

@@ -8,9 +8,10 @@ tags:
 - audio
 - GGUF
 ---
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/d7Rzpm0cgCToXjtE7_U2u.png" alt="Example" style="width:200px;"/>
 # OmniAudio-2.6B
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/d7Rzpm0cgCToXjtE7_U2u.png" alt="Example" style="width:100px;"/>
 OmniAudio is the world's fastest and most efficient audio-language model for on-device deployment - a 2.6B-parameter multimodal model that processes both text and audio inputs. It integrates three components: Gemma-2-2b, Whisper turbo, and a custom projector module, enabling secure, responsive audio-text processing directly on edge devices.
 Unlike traditional approaches that chain ASR and LLM models together, OmniAudio-2.6B unifies both capabilities in a single efficient architecture for minimal latency and resource overhead.