wzk1015 committed
Commit cd529ff (1 parent: a4c0f7d)

Update README.md

Files changed (1): README.md +3 -2
README.md CHANGED
@@ -12,6 +12,7 @@ tags:
 - vision
 - ocr
 - custom_code
+- moe
 ---
 
 # Mono-InternVL-2B
@@ -30,7 +31,7 @@ tags:
 
 ## Introduction
 
-We release Mono-InternVL, a **monolithic** multimodal large language model (MLLM) that integrates visual encoding and textual decoding into a single LLM. In Mono-InternVL, a set of visual experts is embedded into the pre-trained LLM via a mixture-of-experts mechanism. By freezing the LLM, Mono-InternVL ensures that visual capabilities are optimized without compromising the pre-trained language knowledge. Based on this structure, an innovative Endogenous Visual Pretraining (EViP) is introduced to realize coarse-to-fine visual learning.
+We release Mono-InternVL, a **monolithic** multimodal large language model (MLLM) that integrates visual encoding and textual decoding into a single LLM. In Mono-InternVL, a set of visual experts is embedded into the pre-trained LLM via a mixture-of-experts (MoE) mechanism. By freezing the LLM, Mono-InternVL ensures that visual capabilities are optimized without compromising the pre-trained language knowledge. Based on this structure, an innovative Endogenous Visual Pretraining (EViP) is introduced to realize coarse-to-fine visual learning.
 
 
 
@@ -38,7 +39,7 @@ Mono-InternVL achieves superior performance compared to state-of-the-art MLLM Mi
 
 
 
-This repository contains the instruction-tuned Mono-InternVL-2B model. It is built upon [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b). For more details, please refer to our [paper](https://arxiv.org/abs/2410.08202).
+This repository contains the instruction-tuned Mono-InternVL-2B model, which has 1.8B activated parameters (3B in total). It is built upon [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b). For more details, please refer to our [paper](https://arxiv.org/abs/2410.08202).
 
 
 
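For readers of the edited Introduction above: the newly added `moe` tag refers to a mixture-of-experts design that routes tokens by modality rather than by a learned gate. The sketch below is illustrative only, not the repository's actual modeling code; the names `ModalityRoutedFFN` and `visual_mask` are hypothetical, and PyTorch is assumed.

```python
import torch
import torch.nn as nn

class ModalityRoutedFFN(nn.Module):
    """Hypothetical sketch of a modality-routed MoE layer: text tokens pass
    through the frozen pre-trained FFN, while image tokens are routed to a
    trainable visual expert of the same shape."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.text_ffn = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Linear(intermediate_size, hidden_size),
        )
        # The visual expert mirrors the FFN shape; only these weights would be
        # trained during visual pretraining, keeping language knowledge intact.
        self.visual_ffn = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Linear(intermediate_size, hidden_size),
        )
        for p in self.text_ffn.parameters():
            p.requires_grad = False  # freeze the pre-trained language FFN

    def forward(self, hidden_states: torch.Tensor, visual_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); visual_mask: (batch, seq_len) bool.
        # For clarity both experts run on all tokens; a real implementation
        # would index-select per modality to avoid the wasted compute.
        text_out = self.text_ffn(hidden_states)
        visual_out = self.visual_ffn(hidden_states)
        return torch.where(visual_mask.unsqueeze(-1), visual_out, text_out)

# Toy usage: the first 8 of 16 tokens are image patches, the rest are text.
block = ModalityRoutedFFN(hidden_size=2048, intermediate_size=8192)
x = torch.randn(1, 16, 2048)
mask = torch.zeros(1, 16, dtype=torch.bool)
mask[:, :8] = True
print(block(x, mask).shape)  # torch.Size([1, 16, 2048])
```

Static routing of this kind is what allows the freezing claim in the Introduction: gradients only ever reach the visual expert, so the language weights stay untouched.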
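Separately, the `custom_code` tag in the front matter means the checkpoint ships its own modeling code, so loading it through `transformers` requires `trust_remote_code=True`. Below is a minimal loading sketch; the repo id `OpenGVLab/Mono-InternVL-2B` is an assumption, and the exact chat/inference API should be taken from the model card's quickstart rather than from here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repo id for illustration; check the model card for the canonical
# path and the full image-chat example.
path = "OpenGVLab/Mono-InternVL-2B"

model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # required: the repo defines custom model classes
).eval().cuda()  # assumes a CUDA-capable GPU is available

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```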