AdaptLLM/biomed-Qwen2-VL-2B-Instruct

@@ -26,7 +26,30 @@ We investigate domain adaptation of MLLMs through post-training, focusing on dat
     <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/-Jp7pAsCR2Tj4WwfwsbCo.png" width="600">
 </p>
-## How to use
 1. Set up
 ```bash
 pip install qwen-vl-utils
@@ -57,6 +80,8 @@ processor = AutoProcessor.from_pretrained("AdaptLLM/medicine-Qwen2-VL-2B-Instruc
 # max_pixels = 1280*28*28
 # processor = AutoProcessor.from_pretrained("AdaptLLM/medicine-Qwen2-VL-2B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
 messages = [
     {
         "role": "user",
@@ -94,8 +119,13 @@ output_text = processor.batch_decode(
 )
 print(output_text)
 ```
-Since our model architecture aligns with the base model, you can refer to the official repository of [Qwen-2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/edit/main/README.md) for more advanced usage instructions.
 ## Citation
 If you find our work helpful, please cite us.

     <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/-Jp7pAsCR2Tj4WwfwsbCo.png" width="600">
 </p>
+## Resources
+**🤗 We share our data and models with example usage. Feel free to open an issue or start a discussion! 🤗**
+| Model                                                                       | Repo ID in HF 🤗                           | Domain       | Base Model              | Training Data                                                                                  | Evaluation Benchmark |
+|:----------------------------------------------------------------------------|:--------------------------------------------|:--------------|:-------------------------|:------------------------------------------------------------------------------------------------|-----------------------|
+| [Visual Instruction Synthesizer](https://huggingface.co/AdaptLLM/visual-instruction-synthesizer) | AdaptLLM/visual-instruction-synthesizer     | -  | open-llava-next-llama3-8b    | VisionFLAN and ALLaVA | -                   |
+| [AdaMLLM-med-2B](https://huggingface.co/AdaptLLM/biomed-Qwen2-VL-2B-Instruct) | AdaptLLM/biomed-Qwen2-VL-2B-Instruct     | Biomedicine  | Qwen2-VL-2B-Instruct    | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark)                   |
+| [AdaMLLM-food-2B](https://huggingface.co/AdaptLLM/food-Qwen2-VL-2B-Instruct) | AdaptLLM/food-Qwen2-VL-2B-Instruct     | Food  | Qwen2-VL-2B-Instruct    | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark)                   |
+| [AdaMLLM-med-8B](https://huggingface.co/AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B) | AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B     | Biomedicine  | open-llava-next-llama3-8b    | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark)                   |
+| [AdaMLLM-food-8B](https://huggingface.co/AdaptLLM/food-LLaVA-NeXT-Llama3-8B) | AdaptLLM/food-LLaVA-NeXT-Llama3-8B     | Food  | open-llava-next-llama3-8b    | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark)                   |
+| [AdaMLLM-med-11B](https://huggingface.co/AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct     | Biomedicine  | Llama-3.2-11B-Vision-Instruct    | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark)                   |
+| [AdaMLLM-food-11B](https://huggingface.co/AdaptLLM/food-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/food-Llama-3.2-11B-Vision-Instruct     | Food | Llama-3.2-11B-Vision-Instruct    | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) |  [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark)                   |
+**Code**: [https://github.com/bigai-ai/QA-Synthesizer](https://github.com/bigai-ai/QA-Synthesizer)
+## 1. To Chat with AdaMLLM
+Our model architecture aligns with the base model, Qwen2-VL-2B-Instruct. A usage example is provided below. For more advanced usage instructions, please refer to the official [Qwen2-VL-2B-Instruct repository](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct).
+**Note:** For AdaMLLM, always place the image at the beginning of the input instruction in the messages.
+<details>
+<summary> Click to expand </summary>
 1. Set up
 ```bash
 pip install qwen-vl-utils
 # max_pixels = 1280*28*28
 # processor = AutoProcessor.from_pretrained("AdaptLLM/medicine-Qwen2-VL-2B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
+# NOTE: For AdaMLLM, always place the image at the beginning of the input instruction in the messages.
 messages = [
     {
         "role": "user",
 )
 print(output_text)
 ```
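
The commented `max_pixels = 1280*28*28` line in the example expresses the image budget as a visual-token count multiplied by a patch area. Below is a minimal sketch of that arithmetic, assuming each visual token covers a 28x28 pixel patch (as the `28*28` factor suggests). `pixels_for_tokens` is a hypothetical helper written for illustration and is not part of any library.

```python
# Assumption: one visual token corresponds to a 28x28 pixel patch,
# which is what the factor 28*28 in the commented example implies.
PATCH_SIDE = 28


def pixels_for_tokens(num_tokens: int, patch_side: int = PATCH_SIDE) -> int:
    """Return the pixel budget that corresponds to `num_tokens` visual tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return num_tokens * patch_side * patch_side


# Same value as the commented `max_pixels = 1280*28*28` line.
max_pixels = pixels_for_tokens(1280)
print(max_pixels)  # 1003520
```

A smaller token count lowers memory use at the cost of image detail, so the budget is worth tuning per task.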
+</details>
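
The note on image placement can be sketched as a small helper that always puts the image entry before the text entry. This is an illustration rather than the authors' code: `build_messages` is a hypothetical name, the image path and question are placeholders, and the content layout assumes the typed `image`/`text` entries used by the Qwen2-VL chat format.

```python
from typing import Any


def build_messages(image: str, instruction: str) -> list[dict[str, Any]]:
    """Build a single-turn user message with the image placed first."""
    return [
        {
            "role": "user",
            "content": [
                # The image entry comes first, as the note requires.
                {"type": "image", "image": image},
                # The text instruction follows the image.
                {"type": "text", "text": instruction},
            ],
        }
    ]


# Placeholder inputs; replace them with a real image path or URL and question.
messages = build_messages(
    "file:///path/to/your/image.jpg",
    "What is shown in this image?",
)
print(messages[0]["content"][0]["type"])  # image
```

Building the message through one function keeps the ordering consistent across prompts, so the image cannot accidentally end up after the text.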
+## 2. To Evaluate AdaMLLM on Domain-Specific Benchmarks
+Refer to the [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) to reproduce our results and to evaluate other MLLMs on domain-specific benchmarks.
 ## Citation
 If you find our work helpful, please cite us.