|
# Bringing SOTA quantization to mobile LLM deployment: A practical Executorch integration guide |
|
|
|
Article: https://blacksamorez.substack.com/p/aqlm-executorch-android |
|
|
|
## Usage |
|
|
|
- Download and install the `.apk` file on your Android phone. |
|
- Download the `.pte` and `.model` files and put them into the `/data/local/tmp/llama` folder on your Android phone. |
|
- Running the app you will see the option to load the `.pte` and `.model` files. After loading them, you'll be able to chat with the model. |
|
|
|
## Requirements |
|
|
|
This app was tested on `Samsung S24 Ultra` running `Android 14`. |
|
|
|
## Limitations |
|
|
|
- Although the app looks like chat, generation requests are independent. |
|
- Llama-3 chat template is hard-coded into the app. |