Instructions to use OpenGVLab/InternVL2_5-2B-MPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use OpenGVLab/InternVL2_5-2B-MPO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenGVLab/InternVL2_5-2B-MPO", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("OpenGVLab/InternVL2_5-2B-MPO", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use OpenGVLab/InternVL2_5-2B-MPO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenGVLab/InternVL2_5-2B-MPO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenGVLab/InternVL2_5-2B-MPO",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/OpenGVLab/InternVL2_5-2B-MPO

SGLang

How to use OpenGVLab/InternVL2_5-2B-MPO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OpenGVLab/InternVL2_5-2B-MPO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenGVLab/InternVL2_5-2B-MPO",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OpenGVLab/InternVL2_5-2B-MPO" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenGVLab/InternVL2_5-2B-MPO",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use OpenGVLab/InternVL2_5-2B-MPO with Docker Model Runner:
```
docker model run hf.co/OpenGVLab/InternVL2_5-2B-MPO
```

czczup committed on Dec 22, 2024

Commit ca6e03d (verified) · 1 Parent(s): 658afcd

Upload folder using huggingface_hub

Files changed (1)

README.md +10 -10

@@ -65,8 +65,8 @@ To construct this dataset, we propose an efficient data construction pipeline. S
 - **For samples with clear ground truths:**
   the model is prompted to first provide the reasoning process and then give the final answer in the format like `Final Answer: ***`.
-  Responses matching the ground truth answer constitute the positive set $\mathcal{Y}_p$, while those that do not match make up the negative set $\mathcal{Y}_n$. Additionally, responses that fail to provide a clear final answer are also merged into $\mathcal{Y}_n$.
-  Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response $y_c$ from $\mathcal{Y}_p$ and a negative response $y_r$ from $\mathcal{Y}_n$.
 - **For samples without clear ground truths:**
   we propose a simple yet effective method: Dropout Next-Token Prediction (Dropout NTP).
@@ -85,16 +85,16 @@ The data construction pipeline is open-sourced, see more details in our [documen
 ### Mixed Preference Optimization
 The key insight behind MPO is that *an effective PO process should enable the model to learn the relative preference between pairs of responses, the absolute quality of individual responses, and the process for generating preferred responses.* We define the training objective as a combination of
-preference loss $\mathcal{L}_{\text{p}}$,
-quality loss $\mathcal{L}_{\text{q}}$,
-and generation loss $\mathcal{L}_{\text{g}}$,
 referred to as Mixed Preference Optimization:
 $$
 \mathcal{L}=w_{p}\cdot\mathcal{L}_{\text{p}} + w_{q}\cdot\mathcal{L}_{\text{q}} + w_{g}\cdot\mathcal{L}_{\text{g}},
 $$
-where $w_{*}$ represents the weight assigned to each loss component.
 In this work, we empirically compare different variants of preference loss.
 Based on the experimental results, we use DPO as our preference loss and BCO as our quality loss.
@@ -106,8 +106,8 @@ $$
 \mathcal{L}_{\text{p}}=-\log \sigma\left(\beta \log \frac{\pi_\theta\left(y_c \mid x\right)}{\pi_0\left(y_c \mid x\right)}-\beta \log \frac{\pi_\theta\left(y_r \mid x\right)}{\pi_0\left(y_r \mid x\right)}\right),
 $$
-where $\beta$ is the KL penalty coefficient, and $x$, $y_c$, and $y_r$ are user query, chosen response, and rejected response, respectively.
-The policy model $\pi_\theta$ is initialized from model $\pi_0$.
 Additionally, the BCO loss is employed as the quality loss, which helps the model to understand the absolute quality of individual responses.
 The loss function is defined as:
@@ -116,7 +116,7 @@ $$
 \mathcal{L}_{\text{q}}=\mathcal{L}_{\text{q}}^+ + \mathcal{L}_{\text{q}}^-,
 $$
-where $\mathcal{L}_{\text{q}}^{+}$ and $\mathcal{L}_{\text{q}}^{-}$ represent the loss for chosen and rejected responses, respectively.
 Each response type's loss is calculated independently, requiring the model to differentiate the absolute quality of individual responses. The loss terms are given by:
 $$
@@ -127,7 +127,7 @@ $$
 \mathcal{L}_{\text{q}}^-=-\log \sigma\left(-\left(\beta \log \frac{\pi_\theta\left(y_r \mid x\right)}{\pi_0\left(y_r \mid x\right)} - \delta\right) \right),
 $$
-where $\delta$ represents the reward shift, calculated as the moving average of previous rewards to stabilize training.
 Finally, the SFT loss is used as the generation loss to help the model learn the generation process of preferred responses.
 The loss function is defined as:

 - **For samples with clear ground truths:**
   the model is prompted to first provide the reasoning process and then give the final answer in the format like `Final Answer: ***`.
+  Responses matching the ground truth answer constitute the positive set \\(\mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
+  Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\(y_c\\) from \\(\mathcal{Y}_p\\) and a negative response \\(y_r\\) from \\(\mathcal{Y}_n\\).
 - **For samples without clear ground truths:**
   we propose a simple yet effective method: Dropout Next-Token Prediction (Dropout NTP).
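
For the ground-truth case in the first bullet above, the pairing step can be sketched roughly as follows. This is a minimal illustration, not the released pipeline: the helper names, the regular expression, the exact-string comparison against the ground truth, and the random sampling of pairs are all assumptions made for the example.

```python
import random
import re

# Hypothetical pattern: assumes answers are reported as "Final Answer: <text>".
FINAL_ANSWER = re.compile(r"Final Answer:\s*(.+)")


def extract_final_answer(response):
    """Return the text after the last 'Final Answer:' marker, or None if absent."""
    matches = FINAL_ANSWER.findall(response)
    return matches[-1].strip() if matches else None


def build_preference_pairs(responses, ground_truth, num_pairs, seed=0):
    """Split responses into positive/negative sets and sample (chosen, rejected) pairs."""
    positives, negatives = [], []
    for response in responses:
        answer = extract_final_answer(response)
        # Responses without a clear final answer are merged into the negative set.
        if answer is not None and answer == ground_truth:
            positives.append(response)
        else:
            negatives.append(response)
    if not positives or not negatives:
        return []  # a pair needs one response from each set
    rng = random.Random(seed)
    return [(rng.choice(positives), rng.choice(negatives)) for _ in range(num_pairs)]
```

Each returned tuple is one (chosen, rejected) pair, with the chosen response drawn from the positive set and the rejected response from the negative set.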
 ### Mixed Preference Optimization
 The key insight behind MPO is that *an effective PO process should enable the model to learn the relative preference between pairs of responses, the absolute quality of individual responses, and the process for generating preferred responses.* We define the training objective as a combination of
+preference loss \\(\mathcal{L}_{\text{p}}\\),
+quality loss \\(\mathcal{L}_{\text{q}}\\),
+and generation loss \\(\mathcal{L}_{\text{g}}\\),
 referred to as Mixed Preference Optimization:
 $$
 \mathcal{L}=w_{p}\cdot\mathcal{L}_{\text{p}} + w_{q}\cdot\mathcal{L}_{\text{q}} + w_{g}\cdot\mathcal{L}_{\text{g}},
 $$
+where \\(w_{*}\\) represents the weight assigned to each loss component.
 In this work, we empirically compare different variants of preference loss.
 Based on the experimental results, we use DPO as our preference loss and BCO as our quality loss.
 \mathcal{L}_{\text{p}}=-\log \sigma\left(\beta \log \frac{\pi_\theta\left(y_c \mid x\right)}{\pi_0\left(y_c \mid x\right)}-\beta \log \frac{\pi_\theta\left(y_r \mid x\right)}{\pi_0\left(y_r \mid x\right)}\right),
 $$
+where \\(\beta\\) is the KL penalty coefficient, and \\(x\\), \\(y_c\\), and \\(y_r\\) are user query, chosen response, and rejected response, respectively.
+The policy model \\(\pi_\theta\\) is initialized from model \\(\pi_0\\).
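
As a rough numeric illustration of the preference loss above, the sketch below evaluates it for a single pair from sequence-level log-probabilities. The function names and the default `beta=0.1` are assumptions for the example; a real training loop would compute this on batched tensors with the model's own log-probabilities.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_logp_chosen, ref_logp_chosen,
             policy_logp_rejected, ref_logp_rejected, beta=0.1):
    """Preference loss for one (x, y_c, y_r) triple.

    Inputs are log pi(y | x) under the policy (pi_theta) and the
    reference model (pi_0). beta=0.1 is a hypothetical default.
    """
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    return -log_sigmoid(chosen_reward - rejected_reward)
```

When the policy equals the reference model, both log-ratios are zero and the loss is log 2. It falls below that value once the policy raises the likelihood of the chosen response relative to the rejected one.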
 Additionally, the BCO loss is employed as the quality loss, which helps the model to understand the absolute quality of individual responses.
 The loss function is defined as:
 \mathcal{L}_{\text{q}}=\mathcal{L}_{\text{q}}^+ + \mathcal{L}_{\text{q}}^-,
 $$
+where \\(\mathcal{L}_{\text{q}}^{+}\\) and \\(\mathcal{L}_{\text{q}}^{-}\\) represent the loss for chosen and rejected responses, respectively.
 Each response type's loss is calculated independently, requiring the model to differentiate the absolute quality of individual responses. The loss terms are given by:
 $$
 \mathcal{L}_{\text{q}}^-=-\log \sigma\left(-\left(\beta \log \frac{\pi_\theta\left(y_r \mid x\right)}{\pi_0\left(y_r \mid x\right)} - \delta\right) \right),
 $$
+where \\(\delta\\) represents the reward shift, calculated as the moving average of previous rewards to stabilize training.
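
The quality loss and the weighted combination can be sketched in the same style. Several details here are assumptions: the positive term is written by symmetry with the negative term shown above, the reward shift is tracked as an exponential moving average with a hypothetical `momentum`, and the weights are left as required arguments because no values are given in this section.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


class RewardShift:
    """Tracks delta as an exponential moving average of past rewards (assumed form)."""

    def __init__(self, momentum=0.9):  # hypothetical smoothing factor
        self.momentum = momentum
        self.value = 0.0

    def update(self, rewards):
        batch_mean = sum(rewards) / len(rewards)
        self.value = self.momentum * self.value + (1.0 - self.momentum) * batch_mean
        return self.value


def bco_loss(chosen_reward, rejected_reward, delta):
    """Quality loss L_q = L_q^+ + L_q^-; a reward is beta * log(pi_theta / pi_0)."""
    loss_pos = -log_sigmoid(chosen_reward - delta)        # assumed by symmetry
    loss_neg = -log_sigmoid(-(rejected_reward - delta))   # matches the formula above
    return loss_pos + loss_neg


def mpo_loss(pref_loss, quality_loss, gen_loss, w_p, w_q, w_g):
    """Weighted sum of the preference, quality, and generation losses."""
    return w_p * pref_loss + w_q * quality_loss + w_g * gen_loss
```

Because the two terms of `bco_loss` are computed independently, a chosen response is pushed above the shift and a rejected response below it even when the other response in the pair is ignored.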
 Finally, the SFT loss is used as the generation loss to help the model learn the generation process of preferred responses.
 The loss function is defined as: