Commit b695973 by alanzhuly (parent: 9711fab)

Create README.md
# Qwen2-Audio

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>

Qwen2-Audio is a SOTA small-scale multimodal model that handles audio and text inputs, letting you hold voice interactions without a separate ASR module. Qwen2-Audio supports English, Chinese, and major European languages, and also provides robust audio analysis for local use cases such as:
- Speaker identification and response
- Speech translation and transcription
- Mixed audio and noise detection
- Music and sound analysis

## We're bringing Qwen2-Audio to edge devices with Nexa SDK, offering various quantization options
- Voice Chat: engage in free-form voice interactions with Qwen2-Audio, with no text input required.
- Audio Analysis: provide both audio and text instructions for analysis during the interaction.

### Demo

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>

## How to Run Locally On-Device

The following steps show how to run Qwen2-Audio locally on your device.

**Step 1: Install Nexa-SDK (local on-device inference framework)**

[Install Nexa-SDK](https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-1-executable-installer)

> Nexa-SDK is an open-source, local on-device inference framework that supports text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS). It can be installed via a Python package or an executable installer.

**Step 2: Run the following command in your terminal to launch the local Streamlit UI**

```bash
nexa run qwen2audio -st
```

**or run it directly in the terminal**:

```bash
nexa run qwen2audio
```

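For scripted use, the CLI invocation above can be wrapped in a small helper. This is an illustrative sketch, not part of Nexa-SDK's API: the function names are our own, and it assumes the `nexa` executable is on your `PATH`.

```python
import subprocess


def build_nexa_command(model: str = "qwen2audio", streamlit: bool = False) -> list:
    """Construct the argv for `nexa run`, optionally adding the Streamlit UI flag."""
    cmd = ["nexa", "run", model]
    if streamlit:
        cmd.append("-st")  # launch the local Streamlit UI instead of the terminal session
    return cmd


def run_qwen2audio(streamlit: bool = False) -> None:
    # Hypothetical wrapper: hands control to the interactive CLI session.
    subprocess.run(build_nexa_command(streamlit=streamlit), check=True)
```

Calling `run_qwen2audio(streamlit=True)` is then equivalent to typing `nexa run qwen2audio -st` in your terminal.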
### Usage Instructions

For the terminal interface:
1. Drag and drop your audio file into the terminal (or enter the file path on Linux)
2. Add a text prompt to guide the analysis, or leave it empty for direct voice input

### System Requirements

💻 **RAM Requirements**:
- The default q4_K_M version requires 4.2GB of RAM
- Check the RAM requirements table for other quantization versions

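The 4.2GB figure is in the ballpark of what the quantization arithmetic predicts. As a back-of-the-envelope sketch (the ~7B parameter count and ~4.5 effective bits per weight for q4_K_M are illustrative assumptions, and runtime overhead is ignored):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size in GB: parameters times bits per weight, in bytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed figures: ~7B-parameter backbone, q4_K_M at ~4.5 effective bits/weight.
print(round(quantized_size_gb(7e9, 4.5), 1))
```

That lands near 4GB; the remaining headroom in the 4.2GB figure plausibly covers the audio encoder, KV cache, and runtime buffers.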
🎵 **Audio Format**:
- Optimal: 16kHz `.wav` format
- Other formats and sample rates are supported via automatic conversion

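If you want to normalize audio to the optimal format yourself rather than rely on automatic conversion, a minimal resampler can be written with the standard-library `wave` module. This is a sketch for mono 16-bit PCM WAV files only, using simple linear interpolation; the function name is our own, not part of Nexa-SDK.

```python
import struct
import wave

TARGET_RATE = 16000  # Qwen2-Audio's preferred sample rate


def resample_wav(src_path: str, dst_path: str, target_rate: int = TARGET_RATE) -> None:
    """Resample a mono 16-bit PCM WAV file via linear interpolation."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 1 and src.getsampwidth() == 2, "mono 16-bit only"
        src_rate = src.getframerate()
        n = src.getnframes()
        samples = struct.unpack(f"<{n}h", src.readframes(n))

    ratio = src_rate / target_rate
    out_len = int(n / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio          # fractional position in the source signal
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(struct.pack(f"<{out_len}h", *out))
```

For production use, a dedicated tool (e.g. ffmpeg or a resampling library) will give better quality than linear interpolation.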
## Use Cases

### Voice Chat
- Answer everyday questions
- Offer suggestions
- Speaker identification and response
- Speech translation
- Detect background noise and respond accordingly

### Audio Analysis
- Information extraction
- Audio summarization
- Speech transcription and expansion
- Mixed audio and noise detection
- Music and sound analysis

## Performance Benchmark

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/lax8bLpR5uK2_Za0G6G3j.png" alt="Example" style="width:700px;"/>

Results show that Qwen2-Audio significantly outperforms previous SOTA models, including Qwen-Audio, across all tasks.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>

To learn more about Qwen2-Audio's capabilities, please refer to their [Blog], [GitHub], and [Report].

## Follow Nexa AI to run more models on-device
[Website](https://nexa.ai/)