boxin-wbx committed
Commit 4e2ec6f · verified · 1 parent: b925209

Update README.md

Files changed (1): README.md (+21 −3)
@@ -22,8 +22,13 @@ library_name: transformers
 ## Description
 This family of models performs vision-language and text-only tasks including optical character recognition, multimodal reasoning, localization, common sense reasoning, world knowledge utilization, and coding.
 
+This model is ready for non-commercial use.
+
 ## License/Terms of Use
-[Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) <br>
+
+Governing Terms: Deed - [Attribution-NonCommercial 4.0 International - Creative Commons](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+
+Additional Information: [LICENSE · Qwen/Qwen2-72B-Instruct at main](https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE) for Qwen2-72B-Instruct and [The MIT License – Open Source Initiative](https://opensource.org/license/mit) for InternViT-6B-448px-V1-2.
 
 # Model Details
 
@@ -84,7 +89,20 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows
 
 ## Model Architectures
 
-**Network Architecture:** Decoder-Only Transformer
+**Network Architecture:** Decoder-Only Transformer
+
+**Text-only LLM backbone:** [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)
+
+**Vision encoder:** [InternViT-6B](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)
+
+### Robustness
+
+The model trained on this dataset cannot regenerate its training data:
+
+1. The model has no image generation capability since its output is only text. Hence it cannot regenerate any image it would have seen during training.
+
+2. The model cannot regenerate training text data: during training, the model takes text and images as inputs, and the model output (text) is conditioned on both inputs. During inference, without training images as input, the model would not be able to reproduce any part of the training text data.
+
 
 ### Input
 **Input Type(s):** Text, Image <br>
@@ -411,7 +429,7 @@ Wenliang Dai* (wdai@nvidia.com), Nayeon Lee* (nayeonl@nvidia.com), Boxin Wang* (
 
 
 ## Ethical Considerations
-NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
 
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
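
The card's unchanged Input section lists Text and Image as the input types. As a hypothetical sketch only (the key names follow the common Hugging Face multimodal chat convention and are assumptions, not taken from this card; the model's actual processor may expect a different schema), a combined text+image prompt could be structured like this:

```python
# Hypothetical sketch of a text+image user turn in the common Hugging Face
# multimodal chat format. The "type"/"image"/"text" keys are conventions
# assumed here, not documented behavior of this specific model.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/menu.png"},  # placeholder URL
            {"type": "text", "text": "What dishes are listed on this menu?"},
        ],
    }
]

# Since the model's output is text only, a reply would be appended as a
# text-only assistant turn:
messages.append(
    {"role": "assistant", "content": [{"type": "text", "text": "..."}]}
)
```

This mirrors the card's point that generation is conditioned on both text and image inputs while the output modality is text alone.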