Update README.md
README.md
CHANGED
@@ -17,6 +17,13 @@ language:

Infinity-Instruct-3M-0613-Llama3-70B is an open-source supervised instruction-tuned model without reinforcement learning from human feedback (RLHF). The model is finetuned only on [Infinity-Instruct-3M and Infinity-Instruct-0613](https://huggingface.co/datasets/BAAI/Infinity-Instruct) and shows favorable results on AlpacaEval 2.0 compared to GPT4-0613.

+## **News**
+- 🔥🔥🔥[2024/06/28] We release the model weights of [InfInstruct-Llama3-70B 0613](https://huggingface.co/BAAI/Infinity-Instruct-3M-0613-Llama3-70B). It shows favorable results on AlpacaEval 2.0 compared to GPT4-0613 without RLHF.
+
+- 🔥🔥🔥[2024/06/21] We release the model weights of [InfInstruct-Mistral-7B 0613](https://huggingface.co/BAAI/Infinity-Instruct-3M-0613-Mistral-7B). It shows favorable results on AlpacaEval 2.0 compared to Mixtral 8x7B v0.1, Gemini Pro, and GPT-3.5 without RLHF.
+
+- 🔥🔥🔥[2024/06/13] We share the intermediate result of our data construction process (corresponding to [InfInstruct-3M](https://huggingface.co/datasets/BAAI/Infinity-Instruct) in the table below). Our ongoing efforts focus on risk assessment and data generation. The finalized version with 10 million instructions is scheduled for release in late June.
+
## **Training Details**

<p align="center">
|
@@ -53,7 +60,7 @@ Thanks to [FlagScale](https://github.com/FlagOpen/FlagScale), we could concatena

*denotes the model is finetuned without reinforcement learning from human feedback (RLHF).

-We evaluate Infinity-Instruct-3M-0613-Llama3-70B on the two most popular instructions following benchmarks. Mt-Bench is a set of challenging multi-turn questions including code, math and routine dialogue. AlpacaEval2.0 is based on AlpacaFarm evaluation set. Both of these two benchmarks use GPT-4 to judge the model answer. AlpacaEval2.0 displays a high agreement rate with human-annotated benchmark, Chatbot Arena. The result shows that InfInstruct-3M-0613-Llama3-70B achieved 31.2 in AlpacaEval2.0, which is higher than the 30.4 of GPT4-0613 Turbo although it does not yet use RLHF.
+We evaluate Infinity-Instruct-3M-0613-Llama3-70B on the two most popular instruction-following benchmarks. MT-Bench is a set of challenging multi-turn questions covering code, math, and routine dialogue. AlpacaEval 2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge model answers, and AlpacaEval 2.0 shows a high agreement rate with the human-annotated Chatbot Arena benchmark. InfInstruct-3M-0613-Llama3-70B achieves 31.2 on AlpacaEval 2.0, higher than the 30.4 of GPT4-0613 Turbo, even though it does not use RLHF.
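
For context, an AlpacaEval 2.0 score is computed by generating one response per instruction in the evaluation set and letting a GPT-4 judge compare the outputs. Below is a hypothetical sketch of collecting generations in the list-of-records JSON layout the `alpaca_eval` toolkit consumes; the `generate_response` helper and the two example instructions are placeholders, not part of this repository.

```python
# Hypothetical sketch: gather model generations with the
# "instruction"/"output"/"generator" keys that the alpaca_eval toolkit
# expects; GPT-4-based judging is then run over the resulting file.
import json

def generate_response(instruction: str) -> str:
    """Placeholder: swap in a real call to the finetuned model (see "How to use")."""
    return "model answer goes here"

# Placeholder prompts; a real run uses the AlpacaEval 2.0 evaluation set.
instructions = [
    "Explain the difference between SFT and RLHF in two sentences.",
    "Write a Python one-liner that reverses a string.",
]

records = [
    {
        "instruction": inst,
        "output": generate_response(inst),
        "generator": "InfInstruct-3M-0613-Llama3-70B",
    }
    for inst in instructions
]

with open("model_outputs.json", "w") as f:
    json.dump(records, f, indent=2)
```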

## **How to use**

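A minimal usage sketch, assuming the standard `transformers` chat-template API; the prompt and generation settings below are illustrative assumptions, not the README's prescribed configuration.

```python
# Minimal sketch: load the released checkpoint and run one chat turn.
# A 70B model needs multiple GPUs; device_map="auto" shards it across
# the available devices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/Infinity-Instruct-3M-0613-Llama3-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Give three tips for writing clear instructions."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```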