poiuy741741 committed
Commit e1b372f
Parent(s): 6f16f00
Rename README.md to 总感觉有点道理
README.md
DELETED
@@ -1,128 +0,0 @@
---
license: other
license_name: deepseek
license_link: LICENSE
pipeline_tag: image-text-to-text
---

## 1. Introduction

Introducing DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities, and can process logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied-intelligence tasks in complex scenarios.

[DeepSeek-VL: Towards Real-World Vision-Language Understanding](https://arxiv.org/abs/2403.05525)

[**Github Repository**](https://github.com/deepseek-ai/DeepSeek-VL)

Haoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (*Equal Contribution, **Project Lead)

![](https://github.com/deepseek-ai/DeepSeek-VL/blob/main/images/sample.jpg)

### 2. Model Summary

DeepSeek-VL-7b-base uses [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) and [SAM-B](https://huggingface.co/facebook/sam-vit-base) as a hybrid vision encoder supporting 1024 x 1024 image input, and is built on DeepSeek-LLM-7b-base, which is trained on an approximate corpus of 2T text tokens. The whole DeepSeek-VL-7b-base model is then trained on around 400B vision-language tokens. DeepSeek-VL-7b-chat is an instruction-tuned version of [DeepSeek-VL-7b-base](https://huggingface.co/deepseek-ai/deepseek-vl-7b-base).
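
The summary above names the two encoder branches and the language backbone; if you want to confirm how a given checkpoint wires them together, you can fetch just the configuration. This is a minimal sketch, assuming only the standard `transformers` `AutoConfig` API; the exact sub-config field names come from the repository's remote code, so treat them as checkpoint-specific.

```python
from transformers import AutoConfig

# Download only the config, not the weights. trust_remote_code is required
# because DeepSeek-VL ships custom model classes alongside the checkpoint.
config = AutoConfig.from_pretrained(
    "deepseek-ai/deepseek-vl-7b-base", trust_remote_code=True
)

# Printing the config shows the vision-encoder and language-model
# sub-configurations described above (field names vary by release).
print(config)
```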

## 3. Quick Start

### Installation

With a `Python >= 3.8` environment, install the necessary dependencies by running the following commands:

```shell
git clone https://github.com/deepseek-ai/DeepSeek-VL
cd DeepSeek-VL

pip install -e .
```
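
Before moving on, a quick sanity check can catch a broken editable install or a CPU-only PyTorch build early. This is a minimal sketch under the assumption that the `pip install -e .` above succeeded; it only checks imports and CUDA visibility.

```python
# Verify the editable install and the GPU runtime before loading any weights.
import torch
import deepseek_vl  # provided by `pip install -e .` above

print("deepseek_vl imported from:", deepseek_vl.__file__)
print("CUDA available:", torch.cuda.is_available())
```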

### Simple Inference Example

```python
import torch
from transformers import AutoModelForCausalLM

from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
from deepseek_vl.utils.io import load_pil_images

# specify the path to the model
model_path = "deepseek-ai/deepseek-vl-7b-chat"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe each stage of this image.",
        "images": ["./images/training_pipelines.png"]
    },
    {
        "role": "Assistant",
        "content": ""
    }
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
```
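
For interactive use you may want tokens printed as they are generated rather than decoded at the end. The official example does not show this, but the stock `transformers` `TextStreamer` should drop in; this sketch reuses `tokenizer`, `inputs_embeds`, and `prepare_inputs` from the snippet above.

```python
from transformers import TextStreamer

# Streams decoded tokens to stdout as generate() produces them.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
    streamer=streamer,
)
```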

### CLI Chat
```bash
python cli_chat.py --model_path "deepseek-ai/deepseek-vl-7b-chat"

# or local path
python cli_chat.py --model_path "local model path"
```

## 4. License

This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of DeepSeek-VL Base/Chat models is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL). The DeepSeek-VL series (including Base and Chat) supports commercial use.

## 5. Citation

```
@misc{lu2024deepseekvl,
      title={DeepSeek-VL: Towards Real-World Vision-Language Understanding},
      author={Haoyu Lu and Wen Liu and Bo Zhang and Bingxuan Wang and Kai Dong and Bo Liu and Jingxiang Sun and Tongzheng Ren and Zhuoshu Li and Yaofeng Sun and Chengqi Deng and Hanwei Xu and Zhenda Xie and Chong Ruan},
      year={2024},
      eprint={2403.05525},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## 6. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
总感觉有点道理
ADDED
@@ -0,0 +1,136 @@
---
license: other
license_name: deepseek
license_link: LICENSE
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- HuggingFaceTB/cosmopedia
language:
- ar
metrics:
- accuracy
library_name: asteroid
---

## 1. Introduction

Introducing DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities, and can process logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied-intelligence tasks in complex scenarios.

[DeepSeek-VL: Towards Real-World Vision-Language Understanding](https://arxiv.org/abs/2403.05525)

[**Github Repository**](https://github.com/deepseek-ai/DeepSeek-VL)

Haoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (*Equal Contribution, **Project Lead)

![](https://github.com/deepseek-ai/DeepSeek-VL/blob/main/images/sample.jpg)

### 2. Model Summary

DeepSeek-VL-7b-base uses [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) and [SAM-B](https://huggingface.co/facebook/sam-vit-base) as a hybrid vision encoder supporting 1024 x 1024 image input, and is built on DeepSeek-LLM-7b-base, which is trained on an approximate corpus of 2T text tokens. The whole DeepSeek-VL-7b-base model is then trained on around 400B vision-language tokens. DeepSeek-VL-7b-chat is an instruction-tuned version of [DeepSeek-VL-7b-base](https://huggingface.co/deepseek-ai/deepseek-vl-7b-base).

## 3. Quick Start

### Installation

With a `Python >= 3.8` environment, install the necessary dependencies by running the following commands:

```shell
git clone https://github.com/deepseek-ai/DeepSeek-VL
cd DeepSeek-VL

pip install -e .
```

### Simple Inference Example

```python
import torch
from transformers import AutoModelForCausalLM

from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
from deepseek_vl.utils.io import load_pil_images

# specify the path to the model
model_path = "deepseek-ai/deepseek-vl-7b-chat"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe each stage of this image.",
        "images": ["./images/training_pipelines.png"]
    },
    {
        "role": "Assistant",
        "content": ""
    }
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
```

### CLI Chat
```bash
python cli_chat.py --model_path "deepseek-ai/deepseek-vl-7b-chat"

# or local path
python cli_chat.py --model_path "local model path"
```

## 4. License

This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of DeepSeek-VL Base/Chat models is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL). The DeepSeek-VL series (including Base and Chat) supports commercial use.

## 5. Citation

```
@misc{lu2024deepseekvl,
      title={DeepSeek-VL: Towards Real-World Vision-Language Understanding},
      author={Haoyu Lu and Wen Liu and Bo Zhang and Bingxuan Wang and Kai Dong and Bo Liu and Jingxiang Sun and Tongzheng Ren and Zhuoshu Li and Yaofeng Sun and Chengqi Deng and Hanwei Xu and Zhenda Xie and Chong Ruan},
      year={2024},
      eprint={2403.05525},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## 6. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).