# Qwen2-Audio

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>

Qwen2-Audio is a state-of-the-art (SOTA) small-scale multimodal model that accepts both audio and text inputs, enabling voice interactions without a separate ASR (automatic speech recognition) module. It supports English, Chinese, and major European languages, and provides robust audio analysis for local use cases such as:
- Speaker identification and response
- Speech translation and transcription
- Mixed audio and noise detection
- Music and sound analysis

## We're bringing Qwen2-Audio to edge devices with Nexa SDK, offering various quantization options.
- Voice Chat: Users can freely engage in voice interactions with Qwen2-Audio without text input.
- Audio Analysis: Users can provide both audio and text instructions for analysis during the interaction.

### Demo

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>

## How to Run Locally On-Device

The steps below show how to run Qwen2-Audio locally on your device.

**Step 1: Install Nexa-SDK (local on-device inference framework)**

[Install Nexa-SDK](https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-1-executable-installer)

> Nexa-SDK is an open-source, local on-device inference framework that supports text generation, image generation, vision-language models (VLMs), audio-language models, speech-to-text (ASR), and text-to-speech (TTS). It can be installed as a Python package or with an executable installer.
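Before moving on, you may want to confirm that the `nexa` command used in Step 2 is reachable from your shell. The sketch below relies only on the standard `command -v` builtin; the helper name `require_cmd` is hypothetical and is not part of Nexa-SDK.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Nexa-SDK): report whether a command is on PATH.
# Returns 0 and prints its location if found; returns 1 otherwise.
require_cmd() {
  local path
  # The exit status of an assignment with command substitution is that of `command -v`.
  if path="$(command -v "$1")"; then
    printf '%s found at: %s\n' "$1" "$path"
  else
    printf '%s not found; re-run the installer or check your PATH\n' "$1" >&2
    return 1
  fi
}
```

After installing, run `require_cmd nexa`. A non-zero exit status means your current shell cannot find the CLI yet; opening a new terminal session may be enough if the installer updated your `PATH`.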

**Step 2: Run the following command in your terminal to launch the local Streamlit UI**

```bash
nexa run qwen2audio -st
```

**Or, to use it in the terminal**:

```bash
nexa run qwen2audio
```

### Usage Instructions

For terminal use:
1. Drag and drop your audio file into the terminal (or enter the file path on Linux)
2. Add a text prompt to guide the analysis, or leave it empty for direct voice input

### System Requirements

💻 **RAM Requirements**:
- The default q4_K_M version requires 4.2GB of RAM
- Check the RAM requirements table for other quantization versions
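On Linux, a rough pre-flight check can compare available memory against the 4.2GB figure above. This is a sketch under stated assumptions, not a Nexa tool: it reads `MemAvailable` (reported in kB) from `/proc/meminfo`, so it only works on Linux; it treats 4.2GB as 4200 MiB; and the function names `ram_sufficient`, `mem_available_kb` and `check_ram` are hypothetical.

```bash
#!/usr/bin/env bash
# Rough RAM pre-flight check (Linux only, hypothetical helpers).
# Assumption: 4.2GB for the default q4_K_M build is approximated as 4200 MiB.

# Succeeds if available_kb (in kB) covers required_mib (default 4200).
ram_sufficient() {
  local available_kb="$1"
  local required_mib="${2:-4200}"
  [ $(( available_kb / 1024 )) -ge "$required_mib" ]
}

# Print MemAvailable in kB as reported by the kernel.
mem_available_kb() {
  awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# Usage: check_ram [required_mib]
check_ram() {
  local required="${1:-4200}" kb
  kb="$(mem_available_kb 2>/dev/null)"
  if [ -z "$kb" ]; then
    echo "Could not read MemAvailable from /proc/meminfo (is this Linux?)" >&2
    return 2
  fi
  if ram_sufficient "$kb" "$required"; then
    echo "OK: about $(( kb / 1024 )) MiB available (need ${required} MiB)"
  else
    echo "Warning: about $(( kb / 1024 )) MiB available, below ${required} MiB" >&2
    return 1
  fi
}
```

Run `check_ram` for the default build, or pass a different MiB value taken from the RAM requirements table when using another quantization version. `MemAvailable` is the kernel's own estimate of memory available without swapping, which is usually a better guide than free memory alone.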

🎵 **Audio Format**:
- Optimal: 16kHz `.wav` format
- Other formats and sample rates are supported with automatic conversion
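If you prefer to prepare the optimal format yourself rather than rely on automatic conversion, `ffmpeg` can resample a file to 16kHz `.wav`. A minimal sketch, assuming `ffmpeg` is installed; down-mixing to mono (`-ac 1`) is our assumption rather than a documented Nexa requirement, and the helper names and `_16k.wav` naming scheme are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: convert any ffmpeg-readable audio file to 16 kHz WAV.

# Derive the output name: talk.m4a -> talk_16k.wav (strips the last extension).
wav_output_name() {
  local input="$1"
  printf '%s_16k.wav\n' "${input%.*}"
}

to_16k_wav() {
  local input="$1" output
  output="$(wav_output_name "$input")"
  # -nostdin: never wait for keyboard input; -y: overwrite an existing output file
  # -ar 16000: resample to 16 kHz; -ac 1: down-mix to mono (assumption)
  ffmpeg -nostdin -y -i "$input" -ar 16000 -ac 1 "$output" || return 1
  printf '%s\n' "$output"
}
```

For example, `to_16k_wav interview.m4a` writes `interview_16k.wav` next to the original, which you can then drag into the terminal.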

## Use Cases

### Voice Chat
- Answer daily questions
- Offer suggestions
- Speaker identification and response
- Speech translation
- Detecting background noise and responding accordingly

### Audio Analysis
- Information extraction
- Audio summary
- Speech transcription and expansion
- Mixed audio and noise detection
- Music and sound analysis

## Performance Benchmark

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/lax8bLpR5uK2_Za0G6G3j.png" alt="Example" style="width:700px;"/>

The results show that Qwen2-Audio significantly outperforms both previous SOTA models and Qwen-Audio across all tasks.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>

To learn more about Qwen2-Audio's capabilities, please refer to the Qwen team's [Blog], [GitHub], and [Report].

## Follow Nexa AI to run more models on-device
[Website](https://nexa.ai/)