Training code
#2 opened by kristaller486
Are you planning on posting the code for pretrain or fine-tuning?
@kristaller486 It's just a Frankenstein merge of Llama 2 and Mistral that was further trained after the merge. You don't need anything special for fine-tuning.
hunkim changed discussion status to closed
@wcde First of all, there is no standard Frankenstein merge. You can combine layers in various ways, optionally use a lower triangular matrix, do some more advanced math, etc.
Second, can you expand on "further trained"? Trained on what data? For how many tokens?
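For illustration only, here is a minimal sketch of one of the "various ways" to combine layers: a passthrough-style stack that concatenates layer ranges taken from two source models. It only builds the resulting layer order. The `stack_layers` helper, the layer ranges, and the `model.layers.i` naming are hypothetical assumptions, not what was used for this model. A real merge would also have to handle embeddings, norms, tokenizer compatibility, and actual weight loading.

```python
from typing import List, Tuple

# Hypothetical slice spec: (source_model, start_layer, end_layer), end exclusive.
Slice = Tuple[str, int, int]


def stack_layers(slices: List[Slice]) -> List[str]:
    """Return the layer order of a passthrough-style merge that
    concatenates the given layer ranges from each source model."""
    plan: List[str] = []
    for model, start, end in slices:
        # Reject empty or negative ranges early.
        if start < 0 or end <= start:
            raise ValueError(f"bad range for {model}: {start}..{end}")
        plan.extend(f"{model}.layers.{i}" for i in range(start, end))
    return plan


# Hypothetical example: first 24 layers of one model, last 24 of the other.
plan = stack_layers([("llama2", 0, 24), ("mistral", 8, 32)])
print(len(plan))           # 48
print(plan[23], plan[24])  # llama2.layers.23 mistral.layers.8
```

Other choices (interleaving slices, overlapping ranges, or blending weights instead of stacking them) produce different models, which is why "frankenstein" alone does not pin down the recipe.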