chen.zheng1 committed on
Commit
e3d3c97
1 Parent(s): e65434e

update files

Files changed (1)
  1. README.md +14 -7
README.md CHANGED
@@ -6,18 +6,19 @@ license: mit
 
  ## Model Details
 
- Mistral-C2F is a chat assistant trained by Reinforcement Learning from Human Feedback (RLHF) using the Mistral-7B base model as the backbone.
 
- - Mistral-C2F adopts an innovative approach by two-step **Coarse to Fine** Analytical and Reasoning Enhancement LLM approach.
- - The goal of **Coarse to Fine** LLM is to transform a base LLM, which currently lacks integrated analytical and reasoning capabilities, into a RLHF LLM that aligns with human preferences and significantly enhances its analytical and reasoning abilities.
  - First step: **Coarse Actor** Analytical and Reasoning LLM.
- - we introduce the concept of "Continuous Maximization" (CM) in the direct application of RLHF to the base model.
- - the Coarse Actor cannot be used directly for responses or answers because serves as a knowledge-rich ``pool'', designed to enhance analytical and reasoning abilities. Extending the output to its length limits often results in excessive redundant information, as the model continues generating similar text without adequate termination.
  - Second step: **Fine Actor** Knowledge Refining LLM.
- - After the output from the Coarse Actor is generated, it is merged with existing Instruction model through new strategy 'Knowledge Residue Merger'.
- - Knowledge Residue Merger allows the Coarse Actor to integrate its detailed analytical reasoning into the existing LLM model.
  - The Coarse-to-Fine actor approach retains the inherent advantages of the existing model while significantly enhancing its analytical and reasoning capabilities in dialogue.
 
 
  - Mistral-C2F uses the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model as its backbone.
  - License: Mistral-C2F is licensed under the same license as the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model.
@@ -57,6 +58,11 @@ More importantly, Mistral-C2F is publicly available through HuggingFace for prom
 
  ## Case Study
 
  ### Case Study for Coarse Actor Generation.
 
  ![coarse](images/coarse.png)
@@ -65,4 +71,5 @@ More importantly, Mistral-C2F is publicly available through HuggingFace for prom
 
  ![fine1](images/fine1.png)
 
  ![fine2](images/fine2.png)
 
 
  ## Model Details
 
+ **Mistral-C2F** is a chat assistant trained by Reinforcement Learning from Human Feedback (RLHF) using the Mistral-7B base model as the backbone.
 
+ - Mistral-C2F adopts an innovative approach with a two-step **Coarse to Fine** Analytical and Reasoning Enhancement LLM approach.
+ - The goal of the **Coarse to Fine** LLM is to transform a base LLM, which currently lacks integrated analytical and reasoning capabilities, into an RLHF LLM that aligns with human preferences and significantly enhances its analytical and reasoning abilities.
  - First step: **Coarse Actor** Analytical and Reasoning LLM.
+ - We introduce the concept of "Continuous Maximization" (CM) in the direct application of RLHF to the base model.
+ - The Coarse Actor cannot be used directly for responses or answers because it serves as a knowledge-rich "pool," designed to enhance analytical and reasoning abilities. Extending the output to its length limits often results in excessive redundant information, as the model continues generating similar text without adequate termination.
  - Second step: **Fine Actor** Knowledge Refining LLM.
+ - After the output from the Coarse Actor is generated, it is merged with the existing Instruction model through a new strategy called 'Knowledge Residue Merger'.
+ - The Knowledge Residue Merger allows the Coarse Actor to integrate its detailed analytical reasoning into the existing LLM model.
  - The Coarse-to-Fine actor approach retains the inherent advantages of the existing model while significantly enhancing its analytical and reasoning capabilities in dialogue.
 
+
 
  - Mistral-C2F uses the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model as its backbone.
  - License: Mistral-C2F is licensed under the same license as the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model.
 
 
  ## Case Study
 
+ ## Case Study in General Dialogue.
+
+ ![case](images/case.png)
+
+
  ### Case Study for Coarse Actor Generation.
 
  ![coarse](images/coarse.png)
 
 
  ![fine1](images/fine1.png)
 
+
  ![fine2](images/fine2.png)
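
The README text in this commit describes the Knowledge Residue Merger only in prose and gives no formula. As a purely illustrative sketch, one simple way to fold one model's parameters into another is per-parameter linear interpolation. Everything below is an assumption made for illustration: the function name `merge_state_dicts`, the mixing weight `alpha`, and the toy parameter names are hypothetical, and this is not claimed to be the method Mistral-C2F actually uses.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_state_dicts(instruct: StateDict, coarse: StateDict, alpha: float = 0.5) -> StateDict:
    """Hypothetical merge: (1 - alpha) * instruct + alpha * coarse, per parameter.

    This is plain linear interpolation, used here only as an assumed
    example of combining two models that share one architecture.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if instruct.keys() != coarse.keys():
        raise ValueError("both models must have the same parameter names")

    merged: StateDict = {}
    for name, base in instruct.items():
        other = coarse[name]
        if len(base) != len(other):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Blend each weight of the instruction model with the coarse actor's weight.
        merged[name] = [(1.0 - alpha) * b + alpha * c for b, c in zip(base, other)]
    return merged


# Toy weights (made up) for an "instruction" model and a "coarse actor".
instruct_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
coarse_weights = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = merge_state_dicts(instruct_weights, coarse_weights, alpha=0.25)
print(merged)
```

With `alpha=0.25` the merged model stays close to the instruction model while taking a quarter of the difference toward the coarse actor. Real checkpoints would hold tensors rather than lists, but the per-parameter arithmetic would be the same under this assumption.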