---
license: apache-2.0
language:
- en
tags:
- audio-text-to-text
- chat
- audio
- GGUF
---

# Qwen2-Audio

<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>

## We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options.

Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that handles audio and text inputs, allowing you to have voice interactions without ASR modules. Qwen2-Audio supports English, Chinese, and major European languages, and provides voice chat and audio analysis capabilities for local use cases like:
- Speaker identification and response
- Speech translation and transcription
- Mixed audio and noise detection
- Music and sound analysis

### Demo

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>

Learn more in our [blogs](https://nexa.ai/blogs)

## How to Run Locally On Device

In the following, we demonstrate how to run Qwen2-Audio locally on your device.

> Nexa-SDK is an open-source, local on-device inference framework supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. It is installable via a Python package or an executable installer.
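
**Step 1** is installing Nexa-SDK itself. A minimal sketch of the Python package route (the `nexaai` PyPI package name is an assumption, not stated in this README; the executable installer mentioned above is an alternative):

```bash
# Install Nexa-SDK from PyPI (package name assumed to be nexaai)
pip install nexaai
```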

**Step 2: Then run the following command in your terminal**

```bash
nexa run qwen2audio
```

This will run the default q4_K_M quantization.

For terminal use:
1. Drag and drop your audio file into the terminal (or enter the file path on Linux); to pre-convert the audio yourself, see the sketch after this list
2. Add a text prompt to guide the analysis, or leave it empty for direct voice input
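
An earlier revision of this README recommended 16kHz `.wav` input as optimal, with other formats and sample rates converted automatically. If you prefer to convert a file up front, a standard ffmpeg one-liner (`input.mp3` and `output.wav` are placeholder names):

```bash
# Resample to 16 kHz WAV; mono (-ac 1) is an assumption here,
# drop it to keep the original channel layout
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```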

**Or, to use the local UI (Streamlit)**:

```bash
nexa run qwen2audio -st
```

## Choose Quantizations for your device

Run [different quantization versions here](https://nexa.ai/Qwen/Qwen2-Audio-7.8B-Instruct/gguf-q4_K_M/readme) and check the RAM requirements in our list.
> The default q4_K_M version requires 4.2GB of RAM.
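
To pick a non-default quantization, a sketch of the CLI form, assuming a `model:tag` suffix selects the variant (the exact tag names come from the list linked above; the syntax is an assumption, not confirmed by this README):

```bash
# Run a higher-precision variant explicitly (tag name assumed)
nexa run qwen2audio:q8_0
```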
## Use Cases

Results demonstrate that Qwen2-Audio significantly outperforms either previous SOTAs or Qwen-Audio.
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>

## Blog

Learn more in our [blogs](https://nexa.ai/blogs)

## Join Community

[Discord](https://discord.gg/nexa-ai) | [X(Twitter)](https://x.com/nexa_ai)