This model vs PsyOrca2
@jebcarter have you tried royallab/PsyOrca2-13b-DARE? I was wondering if you had any thoughts on it, since it merges the same two models, albeit in a different way (and at 13B instead of 20B like yours).
Howdy there @OrangeApples -
I’ve tried PsyOrca2-13b’s DARE merge - I think it goes in the right direction, but my collaborator and I are working on a different recipe that weights Orca2 a little more heavily. The logic boost that comes with a 20B stack won’t be present, of course, but the attentiveness and writing style should carry over.
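For anyone curious, the core of DARE is just drop-and-rescale applied to each model’s task vector before a weighted merge. A minimal sketch in PyTorch - the weights and drop rate below are purely illustrative, not our actual recipe, and TIES-style sign election is left out:

```python
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor, drop_p: float) -> torch.Tensor:
    # Task vector = finetuned - base; randomly drop a fraction drop_p of its
    # entries, then rescale survivors by 1/(1 - drop_p) to keep the expectation.
    delta = finetuned - base
    keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_p))
    return delta * keep_mask / (1.0 - drop_p)

def dare_merge(base, finetuned_models, weights, drop_p=0.5):
    # Per-tensor merge: base + weighted sum of DARE-processed deltas.
    merged = base.clone()
    for ft, w in zip(finetuned_models, weights):
        merged += w * dare_delta(base, ft, drop_p)
    return merged

# Toy tensors standing in for one weight matrix from each model; skewing
# the second weight toward Orca2 is the "bring it in heavier" idea.
base = torch.randn(4, 4)
psyfighter = base + 0.1 * torch.randn(4, 4)
orca2 = base + 0.1 * torch.randn(4, 4)
merged = dare_merge(base, [psyfighter, orca2], weights=[0.4, 0.6], drop_p=0.5)
```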
I’d rather have a 13B than a 20B since it’s much more accessible to run, of course. :)
Overall, I’m excited to see Orca2 propagate out into the merging space - it’s good fuel, even if its base writing is (expectedly) dry.
Thanks for the thorough answer, @jebcarter! 13Bs are definitely more accessible! I've noticed that 20Bs tend to use an ungodly amount of KV cache in VRAM compared to other sizes. Looking forward to trying out your new recipe for a heavier Orca2 merge once it's out. :)
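For anyone wanting to sanity-check that, KV cache size scales linearly with layer count, and extra layers are exactly what the stacked 20B frankenmerges add. A quick back-of-envelope calc in Python - the Llama-2-13B numbers are from its config, but the 62-layer count for a 20B stack is just my assumption:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys + values, fp16 (2 bytes/element) by default.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

GIB = 1024 ** 3
# Llama-2-13B: 40 layers, 40 KV heads (no GQA), head_dim 128.
print(kv_cache_bytes(40, 40, 128, 4096) / GIB)  # ~3.1 GiB
# A stacked 20B keeps the 13B head layout but has more layers
# (62 is illustrative), so the cache grows proportionally.
print(kv_cache_bytes(62, 40, 128, 4096) / GIB)  # ~4.8 GiB
```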
Edit: Some have noticed that certain DARE-TIES models require a context size of exactly 4096 to work. So my observation above may not be a 20B issue but a DARE one.