add training recipe #1
Opened by qanthony

README.md (changed)
@@ -5,7 +5,10 @@ license: apache-2.0
 
 # Model Card for Zamba2-1.2B
 
-Zamba2-1.2B-instruct is obtained from Zamba2-1.2B by fine-tuning on instruction-following and chat datasets.
+Zamba2-1.2B-instruct is obtained from Zamba2-1.2B by fine-tuning on instruction-following and chat datasets. Specifically:
+
+1. SFT of the base [Zamba2-1.2B](https://huggingface.co/Zyphra/Zamba2-1.2B) model on [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) and [Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct)
+2. DPO of the SFT checkpoint on [ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs), and [OpenHermesPreferences](https://huggingface.co/datasets/argilla/OpenHermesPreferences)
 
 Zamba2-1.2B-Instruct is a hybrid model composed of state-space ([Mamba2](https://github.com/state-spaces/mamba)) and transformer blocks. It is based on the [Zamba2-1.2B](https://huggingface.co/Zyphra/Zamba2-1.2B) architecture.
 
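The second stage of the recipe is DPO (Direct Preference Optimization) on preference pairs. As a rough illustration only, the per-pair DPO objective can be sketched in plain Python as below. The sketch assumes that summed sequence log-probabilities have already been computed for the chosen and rejected responses, both under the policy being trained and under a frozen reference model (in a recipe like this one, typically the SFT checkpoint). The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions; this PR does not state the DPO hyperparameters or implementation used.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    Hypothetical helper for illustration; beta=0.1 is an assumed default,
    not a value taken from the Zamba2 training recipe.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(beta * margin)), computed stably as
    # softplus(z) with z = -beta * margin.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy raises the chosen response's log-probability: loss drops below log(2).
    print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

When the policy has not moved away from the reference, every pair contributes log(2); training lowers the loss by widening the gap between chosen and rejected responses relative to the reference, with `beta` controlling how sharply that gap is rewarded.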