---
license: apache-2.0
language:
- en
tags:
- audio-text-to-text
- chat
- audio
- GGUF
---

# Qwen2-Audio

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>

## We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options.

Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that handles audio and text inputs, allowing you to have voice interactions without ASR modules. Qwen2-Audio supports English, Chinese, and major European languages, and provides voice chat and audio analysis capabilities for local use cases like:
- Speaker identification and response
- Speech translation and transcription
- Mixed audio and noise detection
- Music and sound analysis

### Demo

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>

Learn more in our [blogs](https://nexa.ai/blogs)

## How to Run Locally On Device

In the following, we demonstrate how to run Qwen2-Audio locally on your device.

> Nexa-SDK is an open-source, local on-device inference framework supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. It is installable via a Python package or an executable installer.
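
**Step 1** is installing Nexa-SDK itself. A minimal sketch of the Python package route (the `nexaai` PyPI package name is an assumption, not stated in this README; the executable installer mentioned above is an alternative):

```bash
# Install Nexa-SDK from PyPI (package name assumed to be nexaai)
pip install nexaai
```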

**Step 2: Then run the following command in your terminal**

```bash
nexa run qwen2audio
```

This will run the default q4_K_M quantization.

For terminal use:
1. Drag and drop your audio file into the terminal (or enter the file path on Linux); to pre-convert the audio yourself, see the sketch after this list
2. Add a text prompt to guide the analysis, or leave it empty for direct voice input
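
An earlier revision of this README recommended 16kHz `.wav` input as optimal, with other formats and sample rates converted automatically. If you prefer to convert a file up front, a standard ffmpeg one-liner (`input.mp3` and `output.wav` are placeholder names):

```bash
# Resample to 16 kHz WAV; mono (-ac 1) is an assumption here,
# drop it to keep the original channel layout
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```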

**Or, to use the local UI (Streamlit)**:

```bash
nexa run qwen2audio -st
```

## Choose Quantizations for your device

Run [different quantization versions here](https://nexa.ai/Qwen/Qwen2-Audio-7.8B-Instruct/gguf-q4_K_M/readme) and check the RAM requirements in our list.
> The default q4_K_M version requires 4.2GB of RAM.
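
To pick a non-default quantization, a sketch of the CLI form, assuming a `model:tag` suffix selects the variant (the exact tag names come from the list linked above; the syntax is an assumption, not confirmed by this README):

```bash
# Run a higher-precision variant explicitly (tag name assumed)
nexa run qwen2audio:q8_0
```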
## Use Cases

Results demonstrate that Qwen2-Audio significantly outperforms either previous SOTAs or Qwen-Audio.
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>

## Blog

Learn more in our [blogs](https://nexa.ai/blogs)

## Join Community

[Discord](https://discord.gg/nexa-ai) | [X(Twitter)](https://x.com/nexa_ai)