Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -19,7 +19,7 @@ widget:
 ---
 ## Model Summary
 
-Phi-3 Vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
+The Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
 
 Resources and Technical Documentation:
 
@@ -59,7 +59,7 @@ Nothing contained in this Model Card should be interpreted as or deemed a restri
 
 ## How to Use
 
-Phi-3-vision-128K-Instruct has been integrated in the development version (4.40.2) of `transformers`. Until the official version is released through `pip`, ensure that you are doing one of the following:
+Phi-3-Vision-128K-Instruct has been integrated in the development version (4.40.2) of `transformers`. Until the official version is released through `pip`, ensure that you are doing one of the following:
 * When loading the model, ensure that `trust_remote_code=True` is passed as an argument of the `from_pretrained()` function.
 
 * Update your local `transformers` to the development version: `pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers`. The previous command is an alternative to cloning and installing from the source.
@@ -190,7 +190,7 @@ More details can be found in the [Phi-3 Technical Report](https://aka.ms/phi3-te
 
 ## Benchmarks
 
-To understand the capabilities, we compare Phi-3 Vision-128K-Instruct with a set of models over a variety of zero-shot benchmarks using our internal benchmark platform.
+To understand the capabilities, we compare Phi-3-Vision-128K-Instruct with a set of models over a variety of zero-shot benchmarks using our internal benchmark platform.
 
 |Benchmark|Phi-3 Vision-128K-In|LlaVA-1.6 Vicuna-7B|QWEN-VL Chat|Llama3-Llava-Next-8B|Claude-3 Haiku|Gemini 1.0 Pro V|GPT-4V-Turbo|
 |---------|---------------------|------------------|------------|--------------------|--------------|----------------|------------|
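The How to Use hunk says to pass `trust_remote_code=True` to `from_pretrained()` when loading the model. A minimal Python sketch of that step follows. It assumes the Hugging Face Hub repository id `microsoft/Phi-3-vision-128k-instruct` (not stated in this diff) and that the processor is loaded through `AutoProcessor`; the helper name `load_phi3_vision` is hypothetical.

```python
# Minimal sketch: load Phi-3-Vision-128K-Instruct with remote code enabled.
# ASSUMPTION: this Hub repository id is not stated in the diff above.
MODEL_ID = "microsoft/Phi-3-vision-128k-instruct"


def load_phi3_vision(model_id: str = MODEL_ID):
    """Return (model, processor), passing trust_remote_code=True to both loaders.

    Hypothetical helper. transformers is imported lazily, so defining this
    function does not require the library; calling it downloads the checkpoint.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    # trust_remote_code=True allows transformers to run the modeling and
    # processing code that ships inside the model repository.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype="auto",  # pick the dtype from the checkpoint instead of float32
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor
```

Calling `load_phi3_vision()` downloads the weights and needs `transformers` and PyTorch installed. The sketch leaves out device placement and attention-implementation options, which may need adjusting for a given machine.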