alanzhuly committed on
Commit
578b28b
1 Parent(s): fa645d0

Update README.md

Files changed (1)
  1. README.md +31 -22
README.md CHANGED
@@ -1,21 +1,32 @@
  # Qwen2-Audio
 
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>
 
- Qwen2-Audio is a SOTA small-scale multimodal model that handles audio and text inputs, allowing you to have voice interactions without ASR modules. Qwen2-Audio supports English, Chinese, and major European languages, and also provides robust audio analysis for local use cases like:
  - Speaker identification and response
  - Speech translation and transcription
  - Mixed audio and noise detection
  - Music and sound analysis
 
- ## We're bringing Qwen2-Audio to edge devices with Nexa SDK, offering various quantization options.
- - Voice Chat: Users can freely engage in voice interactions with Qwen2-Audio without text input.
- - Audio Analysis: Users can provide both audio and text instructions for analysis during the interaction.
  ### Demo
 
  <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>
 
- ## How to Run Locally On-Device
 
  In the following, we demonstrate how to run Qwen2-Audio locally on your device.
 
@@ -25,33 +36,28 @@ In the following, we demonstrate how to run Qwen2-Audio locally on your device.
 
  > Nexa-SDK is an open-source, local on-device inference framework, supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. Installable via Python Package or Executable Installer.
 
- **Step 2: Then run the following code in your terminal to run with local streamlit UI**
-
- ```bash
- nexa run qwen2audio -st
- ```
-
- **or to use in terminal**:
 
  ```bash
  nexa run qwen2audio
  ```
 
- ### Usage Instructions
 
  For terminal:
  1. Drag and drop your audio file into the terminal (or enter the file path on Linux)
  2. Add a text prompt to guide analysis or leave empty for direct voice input
 
- ### System Requirements
 
- 💻 **RAM Requirements**:
- - Default q4_K_M version requires 4.2GB of RAM
- - Check the RAM requirements table for different quantization versions
 
- 🎵 **Audio Format**:
- - Optimal: 16kHz `.wav` format
- - Other formats and sample rates are supported with automatic conversion
 
  ## Use Cases
 
@@ -78,5 +84,8 @@ Results demonstrate that Qwen2-Audio significantly outperforms either previous S
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>
 
 
- ## Follow Nexa AI to run more models on-device
- [Website](https://nexa.ai/)
@@ -1,21 +1,32 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - audio-text-to-text
+ - chat
+ - audio
+ - GGUF
+ ---
  # Qwen2-Audio
 
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>
 
+ ## We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options.
+
+ Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that handles audio and text inputs, allowing you to have voice interactions without ASR modules. Qwen2-Audio supports English, Chinese, and major European languages, and provides voice chat and audio analysis capabilities for local use cases like:
  - Speaker identification and response
  - Speech translation and transcription
  - Mixed audio and noise detection
  - Music and sound analysis
 
  ### Demo
 
  <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>
 
+ Learn more in our [blogs](https://nexa.ai/blogs)
+
+ ## How to Run Locally On Device
 
  In the following, we demonstrate how to run Qwen2-Audio locally on your device.
 
@@ -25,33 +36,28 @@ In the following, we demonstrate how to run Qwen2-Audio locally on your device.
 
  > Nexa-SDK is an open-source, local on-device inference framework, supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. Installable via Python Package or Executable Installer.
 
+ **Step 2: Then run the following code in your terminal**
 
  ```bash
  nexa run qwen2audio
  ```
 
+ This runs the default q4_K_M quantization.
 
  For terminal:
  1. Drag and drop your audio file into the terminal (or enter the file path on Linux)
  2. Add a text prompt to guide analysis or leave empty for direct voice input
 
+ **or to use with local UI (streamlit)**:
+
+ ```bash
+ nexa run qwen2audio -st
+ ```
 
+ ## Choose Quantizations for your device
+ Run [different quantization versions here](https://nexa.ai/Qwen/Qwen2-Audio-7.8B-Instruct/gguf-q4_K_M/readme) and check RAM requirements in our list.
 
+ > The default q4_K_M version requires 4.2GB of RAM.
 
  ## Use Cases
 
@@ -78,5 +84,8 @@ Results demonstrate that Qwen2-Audio significantly outperforms either previous S
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>
 
 
+ ## Blog
+ Learn more in our [blogs](https://nexa.ai/blogs)
+
+ ## Join Community
+ [Discord](https://discord.gg/nexa-ai) | [X(Twitter)](https://x.com/nexa_ai)
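
The two invocations shown in this diff differ only by the `-st` flag. As a minimal sketch, a small shell helper could print the matching command so that a wrapper script can choose between terminal mode and the Streamlit UI. The function name `nexa_qwen2audio_cmd` and the mode names `terminal` and `streamlit` are hypothetical and not part of Nexa-SDK; only `nexa run qwen2audio` and `-st` come from the README.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of Nexa-SDK): print the command for a mode.
# Only `nexa run qwen2audio` and the `-st` flag are taken from the README;
# the mode names "terminal" and "streamlit" are made up for this sketch.
nexa_qwen2audio_cmd() {
  local mode="${1:-terminal}"   # default to plain terminal mode
  case "$mode" in
    terminal)  printf '%s\n' "nexa run qwen2audio" ;;
    streamlit) printf '%s\n' "nexa run qwen2audio -st" ;;
    *)
      # Unknown mode: report on stderr and signal failure to the caller.
      printf 'unknown mode: %s\n' "$mode" >&2
      return 1
      ;;
  esac
}
```

Printing the command instead of executing it keeps the helper usable on machines where `nexa` is not installed; a caller that does have it installed could run the result with something like `eval "$(nexa_qwen2audio_cmd streamlit)"`.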