How to use this model with onnxruntime-genai

#1
by hmartinez82 - opened

Is it possible to use this model with onnxruntime-genai (model-qa.py)?

If so, how?

If the model architecture is supported, then there shouldn't be a problem with that.
If the last release is a bit old, you might need to build the latest upstream code from GitHub and try with that.
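A quick way to check which release you're on is to query the installed package metadata (a minimal sketch; the CUDA and DirectML variants ship as separate PyPI packages):

```python
# Check which onnxruntime-genai package and version is installed,
# to decide whether a nightly build or a source build is needed.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("onnxruntime-genai", "onnxruntime-genai-cuda", "onnxruntime-genai-directml"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```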

I build ONNX Runtime specifically for ROCm for my own purposes. ONNX Runtime is the base package; the extensions seem to be additional wrappers on top of it.
Do use a nightly build or build from source.
This might be of help:
https://onnxruntime.ai/docs/genai/tutorials/phi2-python.html
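For reference, the core of that tutorial boils down to something like this (a condensed sketch, not the exact tutorial code; the model folder is a placeholder, and the API surface, e.g. `set_search_options` and `input_ids`, has shifted between onnxruntime-genai releases):

```python
import onnxruntime_genai as og

# Placeholder path: folder containing the exported ONNX model and its genai config
model = og.Model("models/phi2")
tokenizer = og.Tokenizer(model)

tokens = tokenizer.encode("def print_prime(n):")

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
params.input_ids = tokens  # older-style API; newer releases moved token input to the Generator

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```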

That's the tutorial I was following :) But to no avail.
I guess I'll try building it locally or using the nightly.

Thank you.

In my test with just ONNX Runtime, loading the ONNX model through the Hugging Face Optimum API works and generates results. Kindly check whether it's due to a lack of FP16 support on your specific hardware accelerator, or a memory bottleneck when loading a big model.
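Something like this is what I mean by loading through Optimum (a minimal sketch; the model directory is a placeholder):

```python
# Sanity check with plain ONNX Runtime via Hugging Face Optimum,
# independent of onnxruntime-genai.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "path/to/onnx-model"  # placeholder: local folder with the exported ONNX model
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```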

It supports the DirectML and CUDA backends at the moment.
I'll probably add support for ROCm later.
The CPU code is very generic C/C++, since I don't see any execution-provider usage or neural-network framework in it.
You can try some other models and tell me if there's something wrong with it.

https://github.com/microsoft/onnxruntime-genai/blob/main/src/sequences_cuda.cpp
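You can also check what your base ONNX Runtime install actually exposes on your machine (this inspects plain onnxruntime, not onnxruntime-genai itself):

```python
# List the execution providers compiled into the installed ONNX Runtime,
# e.g. CUDAExecutionProvider, DmlExecutionProvider, ROCMExecutionProvider.
import onnxruntime as ort

print(ort.get_available_providers())
print(ort.get_device())  # "GPU" or "CPU"
```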

Yes, the lack of FP16 is getting in the way :(. I hope that the Snapdragon X Elite NPU will support it.
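For anyone else hitting this, a quick way to confirm whether a model's weights really are FP16 is to inspect its initializers with the onnx package (a sketch; the model path is a placeholder):

```python
# Check whether an ONNX model stores its weights as FP16, which would
# explain failures on hardware without FP16 support.
import onnx
from onnx import TensorProto

m = onnx.load("path/to/model.onnx", load_external_data=False)  # placeholder path
dtypes = {init.data_type for init in m.graph.initializer}
print("weights include FP16:", TensorProto.FLOAT16 in dtypes)
```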

hmartinez82 changed discussion status to closed

I hope our great engineers can solve your problem; we believe in them as in the Supreme Leader.
