	---
	license: apache-2.0
	language:
	- en
	tags:
	- audio-text-to-text
	- chat
	- audio
	- GGUF
	---
	# Qwen2-Audio

	<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>

	## We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options.

	Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that handles audio and text inputs, allowing voice interactions without a separate ASR module. It supports English, Chinese, and major European languages, and provides voice chat and audio analysis capabilities for local use cases such as:
	- Speaker identification and response
	- Speech translation and transcription
	- Mixed audio and noise detection
	- Music and sound analysis

	### Demo

	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>

	Learn more in our [blogs](https://nexa.ai/blogs)

	## How to Run Locally On Device

	In the following, we demonstrate how to run Qwen2-Audio locally on your device.

	Step 1: Install Nexa-SDK (local on-device inference framework)

	[Install Nexa-SDK](https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-1-executable-installer)

	> Nexa-SDK is an open-source, local on-device inference framework that supports text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS). It can be installed via the Python package or the executable installer.

	Step 2: Run the following command in your terminal

	```bash
	nexa run qwen2audio
	```

	This runs the default q4_K_M quantization.

	In the terminal:
	1. Drag and drop your audio file into the terminal (or enter the file path on Linux).
	2. Add a text prompt to guide the analysis, or leave it empty for direct voice input.

	Or, to use the local UI (Streamlit):

	```bash
	nexa run qwen2audio -st
	```
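The two launch modes above can be wrapped in a small helper. This is only a sketch: the function names `nexa_cmd` and `launch_qwen2audio` are hypothetical, and the only `nexa` invocations it uses are the two shown in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper; only the two `nexa run` commands come from this README.

# Print the command line for the requested mode: "terminal" (default) or "ui".
nexa_cmd() {
    case "${1:-terminal}" in
        terminal) echo "nexa run qwen2audio" ;;
        ui)       echo "nexa run qwen2audio -st" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Run the selected command, but only if the `nexa` CLI is on PATH.
launch_qwen2audio() {
    local cmd
    cmd=$(nexa_cmd "${1:-terminal}") || return 1
    if ! command -v nexa >/dev/null 2>&1; then
        echo "nexa not found; install Nexa-SDK first (Step 1)" >&2
        return 127
    fi
    $cmd   # intentionally unquoted so the string splits into words
}
```

After sourcing the script, `launch_qwen2audio` starts the terminal session and `launch_qwen2audio ui` starts the Streamlit UI.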

	## Choose a Quantization for Your Device
	Run [other quantization versions](https://nexa.ai/Qwen/Qwen2-Audio-7.8B-Instruct/gguf-q4_K_M/readme) and check their RAM requirements in our list.

	> The default q4_K_M version requires 4.2GB of RAM.
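A rough pre-flight check against the 4.2GB figure can be sketched as follows. It assumes a Linux host where `/proc/meminfo` exposes `MemAvailable`, treats 1 GB as 1024*1024 kB, and uses hypothetical helper names (`kb_to_gb`, `has_enough`). On other systems it prints nothing.

```bash
#!/usr/bin/env bash
# Assumption: Linux, where /proc/meminfo reports MemAvailable in kB.
required_gb=4.2   # default q4_K_M requirement quoted in this README

# Convert a kB value to GB (1 GB taken as 1024*1024 kB), two decimal places.
kb_to_gb() {
    awk -v kb="$1" 'BEGIN { printf "%.2f", kb / 1048576 }'
}

# Succeed (exit status 0) when the first number is >= the second.
has_enough() {
    awk -v have="$1" -v need="$2" 'BEGIN { exit !(have >= need) }'
}

if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    avail_gb=$(kb_to_gb "$avail_kb")
    if has_enough "$avail_gb" "$required_gb"; then
        echo "OK: ${avail_gb} GB available, ${required_gb} GB needed"
    else
        echo "Warning: only ${avail_gb} GB available, ${required_gb} GB needed"
    fi
fi
```

To check a different quantization, change `required_gb` to the value listed for that version.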

	## Use Cases

	### Voice Chat
	- Answer daily questions
	- Offer suggestions
	- Speaker identification and response
	- Speech translation
	- Detecting background noise and responding accordingly

	### Audio Analysis
	- Information extraction
	- Audio summary
	- Speech transcription and expansion
	- Mixed audio and noise detection
	- Music and sound analysis

	## Performance Benchmark

	<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/lax8bLpR5uK2_Za0G6G3j.png" alt="Example" style="width:700px;"/>

	The results show that Qwen2-Audio significantly outperforms both previous SOTA models and Qwen-Audio across all tasks.

	<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>


	## Blog
	Learn more in our [blogs](https://nexa.ai/blogs)

	## Join Community
	[Discord](https://discord.gg/nexa-ai) \| [X(Twitter)](https://x.com/nexa_ai)