Nguyễn Thành Đô

nguyenthanhdo

AI & ML interests

I'm building open source LLMs that work for Vietnamese

Organizations

Viettel Cyberspace Center (Viettel AI), Chatbot Vui Vẻ, Việt Nam Tự Cường, Roleplay Model Hub, Continual Pretraining

nguyenthanhdo's activity

Reacted to mlabonne's post with ❤️ 6 months ago
⚡ AutoQuant

AutoQuant is the evolution of my previous AutoGGUF notebook (https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu). It lets you quantize your models into five different formats:

- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
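To make the bit-width trade-off above concrete, here is a toy sketch of symmetric round-to-nearest quantization in plain Python. It is not the actual GGUF/GPTQ/AWQ/HQQ algorithm (those use grouped scales, calibration, and optimized kernels), just an illustration of what "2-bit" or "3-bit" means for a weight vector:

```python
# Toy symmetric round-to-nearest quantization (illustrative only; not the
# real GGUF/GPTQ/AWQ/HQQ schemes, which use per-group scales and calibration).

def quantize(weights, bits):
    """Map floats onto a signed integer grid with `bits` bits per value."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 1 for 2-bit, 3 for 3-bit
    scale = max(abs(w) for w in weights) / qmax    # one scale for the whole vector
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer grid."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize(weights, bits=2)   # only 4 representable levels
approx = dequantize(q, scale)
# Fewer bits -> coarser grid -> larger reconstruction error, which is why
# decent 2-bit models (as HQQ achieves) are hard to get right.
```

At 2 bits every weight snaps to one of four levels, so most of the small weights above collapse to zero; real low-bit methods fight exactly this loss with finer-grained scales.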

Once the model is converted, AutoQuant automatically uploads it to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.

Here's an example of a model I quantized using HQQ and AutoQuant: mlabonne/AlphaMonarch-7B-2bit-HQQ

I hope you'll enjoy it and quantize lots of models! :)

💻 AutoQuant: https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4