about data
#1 by timelogger - opened
Can I get information about the data used for fine-tuning? Is that data open source?
For instruction tuning, we used the Open-Orca/SlimOrca dataset after deduplication and sampling. Likewise, for DPO tuning, we used the Intel/orca_dpo_pairs dataset after deduplication and sampling.
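For readers wondering what "dedup and sampling" can look like in practice, here is a minimal sketch. It is not the procedure actually used for this model: the reply above does not say how duplicates were detected or how many examples were kept. The function name `dedup_and_sample`, the choice of exact-match hashing on normalized text, the fixed seed, and the field names in the toy records are all assumptions made for illustration.

```python
import hashlib
import random


def dedup_and_sample(records, key_fn, sample_size, seed=0):
    """Drop records whose key text repeats, then draw a random subset.

    Hypothetical helper: exact-match dedup on lowercased, stripped text
    is an assumption, not the method described in this thread.
    """
    seen = set()
    unique = []
    for rec in records:
        # Normalize the text so trivial case/whitespace variants collide.
        text = key_fn(rec).strip().lower()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)  # keep the first occurrence only

    # If fewer unique records remain than requested, return them all.
    if sample_size >= len(unique):
        return unique

    # Seeded RNG so the same subset is drawn on every run.
    rng = random.Random(seed)
    return rng.sample(unique, sample_size)


# Toy preference-pair records; the field names are illustrative only.
toy = [
    {"question": "What is 2+2?", "chosen": "4", "rejected": "5"},
    {"question": "what is 2+2? ", "chosen": "Four", "rejected": "22"},
    {"question": "Name a prime.", "chosen": "7", "rejected": "8"},
]
subset = dedup_and_sample(toy, key_fn=lambda r: r["question"], sample_size=2)
```

On real data, the same function could be applied to rows loaded from either dataset, with `key_fn` pointing at whichever field should be treated as the identity of an example.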
So you did not use a Korean dataset for LDCC-SOLAR-10.7B?
During instruction tuning, we used a translated version of the data. For DPO tuning, however, we used the data in its original, untranslated form.
Thanks a lot :)
timelogger changed discussion status to closed