Request for Instructions

#1
by Ailelix - opened

First of all, thanks for your impeccable work.
The Ryzen AI platform is currently quite experimental and limited, so I'm glad to have an LLM to play with on my NPU.

However, llama3.1-8b seems to be too heavy for the NPU I'm using (HX370).
I think a lighter model (e.g. llama3.2-1b or gemma2-2b) would be more suitable for the NPU to run,
so I tried to quantize Llama-3.2-1B on my own.
However, I failed to convert it for the NPU platform with Vitis AI.

So I wonder how you converted the model in this project.
I'd appreciate it if you could reply.

Owner

Hello.

I was not able to convert the model with high accuracy using Vitis AI either.

Here is an overview of what I did:
https://www.hackster.io/gharada2013/running-llm-on-amd-npu-hardware-19322f

Yeah, AMD hasn't built out a rich software environment yet...
I may hold off for now and wait for either llama.cpp support in LM Studio or Ollama support.

Anyways, thanks again for your contribution.

Owner

I tried AWQ-converting and running the 1B and 3B models. The conversion itself succeeded, but unfortunately the output of both was broken.
Models that small might be better off running on the CPU rather than on the NPU anyway.
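
For reference, here is roughly the kind of AWQ conversion I attempted, as a minimal sketch using the AutoAWQ library. The model path, output directory, and quant settings below are illustrative assumptions, not the exact configuration from this project:

```python
# Minimal AWQ quantization sketch (AutoAWQ library).
# NOTE: paths and quant_config values are illustrative assumptions,
# not the exact settings used in this project.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.2-1B-Instruct"  # source model (assumption)
quant_path = "llama-3.2-1b-instruct-awq"         # output dir (assumption)

# Common 4-bit AWQ settings: 4-bit weights, group size 128, zero-point quantization.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate, quantize the weights, and save the quantized checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

And for comparison, a 1B model runs on the CPU with plain transformers and no conversion step at all (model ID again just an assumption):

```python
# CPU-only inference sketch with transformers; the model loads on CPU by default.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumption: any small model works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Hello, how are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```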

Yep, I just wanted to try running an LLM on my NPU without AC power, to see whether a lightweight local AI copilot is feasible for daily productivity.
BTW, I just got verified through GitHub Education to use Copilot, so... :)
I still think using the NPU can be more power-efficient, though.

Anyways, I'm currently using your llama3.1-8b-instruct-npu.
Thanks for the contribution!

Ailelix changed discussion status to closed
