Llama 3.1 Instruct, continually pretrained for a full epoch (1,169 steps at a total batch size of 115) on the same 1.5 GB private dataset that underpins Iambe
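
For context, here is a minimal sketch of what this style of continued pretraining looks like with the Hugging Face Trainer. The dataset path, sequence length, and learning rate are placeholder assumptions, not the settings actually used for this run; only the single epoch and the ~115 total batch come from the line above.

```python
# Hypothetical sketch of a full-epoch continued-pretraining (causal LM) run.
# Everything except the epoch count and effective batch size is an assumption.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Placeholder for the private 1.5 GB text corpus.
raw = load_dataset("text", data_files={"train": "private_corpus/*.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="llama31-continued",
    num_train_epochs=1,               # one full epoch, as in the card
    per_device_train_batch_size=1,
    gradient_accumulation_steps=115,  # effective batch ~115, as in the card
    learning_rate=1e-5,               # assumed; not stated in the card
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```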

Instruction following is broken; the model needs to be re-SFT'd


Why do this? I have a niche use case where I can't increase compute beyond 8B, and L3/3.1 are the only models in this size class that meet my needs for logic. However, both L3 and L3.1 have the damn repetition/token-overconfidence problem, and this run is meant to disrupt that certainty without disrupting the model's ability to function.
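
To make the overconfidence point concrete, one hedged way to see it is to watch how much probability the model puts on its top token at each step of a greedy generation. The prompt and the interpretation below are illustrative assumptions, not measurements from this run.

```python
# Illustrative probe (assumed, not taken from this repo): average top-1
# probability over a short greedy generation, as a rough overconfidence signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model.eval()

prompt = "Write a short story about a lighthouse keeper."  # placeholder prompt
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
    )

# out.scores is a tuple of logits, one tensor per generated token.
top1 = [torch.softmax(s[0], dim=-1).max().item() for s in out.scores]
print(f"mean top-1 probability: {sum(top1) / len(top1):.3f}")
# Top-1 probabilities pinned near 1.0 step after step are what "token
# overconfidence" looks like, and it tends to go hand in hand with looping.
```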

By the way, I think it's the lm_head that's causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, though :p
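
For anyone who does want to poke at the lm_head-vs-embeddings question without paying for another run, a cheap sanity check (sketched below, not something actually run for this card) is to compare how far each module drifted from the base Instruct weights, assuming the checkpoint keeps lm_head and embed_tokens untied like the base 8B config does.

```python
# Hypothetical diagnostic (not run for this card): measure how far the lm_head
# and input embeddings drifted from the base Instruct weights.
# Note: loads both 8B models on CPU, so it needs a fair amount of RAM.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
tuned = AutoModelForCausalLM.from_pretrained(
    "athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1", torch_dtype=torch.bfloat16
)

def drift(a: torch.Tensor, b: torch.Tensor) -> float:
    # Relative Frobenius-norm change between two weight matrices.
    return ((a.float() - b.float()).norm() / a.float().norm()).item()

print("embed_tokens drift:", drift(base.get_input_embeddings().weight,
                                   tuned.get_input_embeddings().weight))
print("lm_head drift:     ", drift(base.get_output_embeddings().weight,
                                   tuned.get_output_embeddings().weight))
# If one module moved far more than the other, that at least narrows down
# where the looping behavior is most likely coming from.
```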

