AdaptLLM committed on
Commit fb1b9f8
1 Parent(s): 336cbe1

Update README.md

Files changed (1)
  1. README.md +30 -2
README.md CHANGED
@@ -23,7 +23,29 @@ We investigate domain adaptation of MLLMs through post-training, focusing on dat
  <img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/bRu85CWwP9129bSCRzos2.png" width="1000">
  </p>

- ## How to use
+ ## Resources
+ **🤗 We share our data and models with example usages, feel free to open any issues or discussions! 🤗**
+
+ | Model | Repo ID in HF 🤗 | Domain | Base Model | Training Data | Evaluation Benchmark |
+ |:----------------------------------------------------------------------------|:--------------------------------------------|:--------------|:-------------------------|:------------------------------------------------------------------------------------------------|-----------------------|
+ | [Visual Instruction Synthesizer](https://huggingface.co/AdaptLLM/visual-instruction-synthesizer) | AdaptLLM/visual-instruction-synthesizer | - | open-llava-next-llama3-8b | VisionFLAN and ALLaVA | - |
+ | [AdaMLLM-med-2B](https://huggingface.co/AdaptLLM/biomed-Qwen2-VL-2B-Instruct) | AdaptLLM/biomed-Qwen2-VL-2B-Instruct | Biomedicine | Qwen2-VL-2B-Instruct | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
+ | [AdaMLLM-food-2B](https://huggingface.co/AdaptLLM/food-Qwen2-VL-2B-Instruct) | AdaptLLM/food-Qwen2-VL-2B-Instruct | Food | Qwen2-VL-2B-Instruct | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+ | [AdaMLLM-med-8B](https://huggingface.co/AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B) | AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B | Biomedicine | open-llava-next-llama3-8b | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
+ | [AdaMLLM-food-8B](https://huggingface.co/AdaptLLM/food-LLaVA-NeXT-Llama3-8B) |AdaptLLM/food-LLaVA-NeXT-Llama3-8B | Food | open-llava-next-llama3-8b | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+ | [AdaMLLM-med-11B](https://huggingface.co/AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct | Biomedicine | Llama-3.2-11B-Vision-Instruct | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
+ | [AdaMLLM-food-11B](https://huggingface.co/AdaptLLM/food-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/food-Llama-3.2-11B-Vision-Instruct | Food | Llama-3.2-11B-Vision-Instruct | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+
+ **Code**: [https://github.com/bigai-ai/QA-Synthesizer](https://github.com/bigai-ai/QA-Synthesizer)
+
+ ## 1. To Chat with AdaMLLM
+
+ Our model architecture aligns with the base model: Llama-3.2-Vision-Instruct. We provide a usage example below, and you may refer to the official [Llama-3.2-Vision-Instruct Repository](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) for more advanced usage instructions,
+
+ **Note:** For AdaMLLM, always place the image at the beginning of the input instruction in the messages.
+
+ <details>
+ <summary> Click to expand </summary>

  Starting with transformers >= 4.45.0 onward, you can run inference using conversational messages that may include an image you can query about.

@@ -47,6 +69,7 @@ processor = AutoProcessor.from_pretrained(model_id)
  url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
  image = Image.open(requests.get(url, stream=True).raw)

+ # NOTE: For AdaMLLM, always place the image at the beginning of the input instruction in the messages.
  messages = [
  {"role": "user", "content": [
  {"type": "image"},
@@ -65,7 +88,12 @@ output = model.generate(**inputs, max_new_tokens=30)
  print(processor.decode(output[0]))
  ```

- Since our model architecture aligns with the base model, you can refer to the official repository of [Llama-3.2-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) for more advanced usage instructions.
+ </details>
+
+ ## 2. To Evaluate AdaMLLM on Domain-Specific Benchmarks
+
+ Refer to the [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) to reproduce our results and evaluate many other MLLMs on domain-specific benchmarks.
+

  ## Citation
  If you find our work helpful, please cite us.
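
The `# NOTE` line this commit adds asks that the image come first in the user message. A minimal sketch of what that ordering might look like follows. The helpers `build_messages` and `image_comes_first` are hypothetical names introduced here for illustration and are not part of the README or of `transformers`. The `{"type": "text", "text": ...}` entry is assumed from the chat-message format the base model documents, since the diff elides the lines after `{"type": "image"},`.

```python
from typing import Any


def build_messages(instruction: str) -> list[dict[str, Any]]:
    """Hypothetical helper: one user turn with the image entry placed first."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # image placeholder goes first, per the NOTE
                {"type": "text", "text": instruction},  # assumed text-entry shape
            ],
        }
    ]


def image_comes_first(messages: list[dict[str, Any]]) -> bool:
    """Return True if every message containing an image entry lists it first."""
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            continue  # plain-text turn, no image entry to check
        types = [part["type"] for part in content]
        if "image" in types and types[0] != "image":
            return False
    return True


messages = build_messages("What food is shown in this picture?")
```

A list built this way should be usable wherever the README snippet defines `messages`; the check function is only a guard one might run before generation.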