michaelryoo committed on
Commit 21a1226 · verified · 1 Parent(s): b072331

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -8,7 +8,9 @@ pipeline_tag: image-text-to-text
  # Model description
  `xGen-MM-Vid (BLIP-3-Video)` is an efficient, compact vision-language model (VLM) with an explicit temporal encoder, specifically designed to understand videos. It is developed by Salesforce AI Research. Its key aspect is the incorporation of learnable temporal encoder modules within the original (image-based) BLIP-3 architecture.

- In this initial release (12/2024), we are sharing the 128-token version trained to take 8-frame video inputs.
+ Here, we are sharing the 128-token version. It was trained on 8-frame videos, though in principle it can take any number of frames.
+
+ The 32-token version of the same model can be found at: [BLIP-3-Video 32 token model](https://huggingface.co/Salesforce/xgen-mm-vid-phi3-mini-r-v1.5-32tokens-8frames/).

  For more details, check out our [tech report](https://arxiv.org/pdf/2410.16267). A more detailed explanation can also be found in the [blog article](https://www.salesforceairesearch.com/opensource/xGen-MM-Vid/index.html).
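
The updated README says the model was trained on 8-frame videos. As a rough illustration of what preparing such an input might involve, here is a minimal sketch that picks 8 evenly spaced frame indices from a longer clip. The README does not state which sampling strategy the authors use, so uniform sampling is an assumption here, and `uniform_frame_indices` is a hypothetical helper, not part of the model's code.

```python
def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick `num_frames` evenly spaced frame indices from a video.

    Assumption: uniform sampling. The README only says the model was
    trained with 8-frame videos, not how those frames were chosen.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if total_frames <= num_frames:
        # Short clip: use every available frame.
        return list(range(total_frames))
    # Split the clip into num_frames equal segments and take each centre.
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


# An 80-frame clip yields one index from the middle of each 10-frame segment.
print(uniform_frame_indices(80))  # [5, 15, 25, 35, 45, 55, 65, 75]
```

The returned indices can then be used to select frames with whatever video reader is at hand before passing them to the model's preprocessing.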