LoRA creator needs help...

#1
by jeiku - opened

Hello! I am the LoRA author, but I basically gave up on it because mergekit could not merge it into the base model. What method did you use to merge the LoRA? If possible, can you provide an F16 or F32 GGUF so I can quant it into a Q4_0_4_8 i8mm ARM-optimized model myself? If you cannot provide the file, please instruct me on the best method to merge the LoRA. Thank you!

Sure, I've used this code on a T4 instance on Colab:

from unsloth import FastLanguageModel

# Load the adapter repo directly; unsloth resolves the base model from the
# adapter config and attaches the LoRA weights.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jeiku/Aura-NeMo-12B",
    max_seq_length = 8192,
    dtype = None,        # auto-detect dtype
    load_in_4bit = True, # fits in a T4's 16 GB VRAM
)

# Merge the LoRA into the base weights and push the 16-bit result to the Hub.
model.push_to_hub_merged("Reiterate3680/jeiku-Aura-NeMo-12B-merged", tokenizer,
                         save_method = "merged_16bit", token="redacted", private=True)

You shouldn't need a GPU to merge, but unsloth doesn't work without one and I didn't really want to dig into why.
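If you want a GPU-free route, a plain transformers + peft merge should also work. This is a minimal sketch, not the method I used above, and the base model name here is an assumption — check the adapter's adapter_config.json ("base_model_name_or_path") for the real one:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Hypothetical base model; the actual one is listed in the adapter's
# adapter_config.json under "base_model_name_or_path".
BASE = "mistralai/Mistral-Nemo-Base-2407"

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Attach the LoRA, fold its weights into the base, and save a plain model.
merged = PeftModel.from_pretrained(base, "jeiku/Aura-NeMo-12B").merge_and_unload()
merged.save_pretrained("Aura-NeMo-12B-merged")
tokenizer.save_pretrained("Aura-NeMo-12B-merged")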

I unprivated the merge repo here - https://huggingface.co/Reiterate3680/jeiku-Aura-NeMo-12B-merged

I can make an F16 GGUF again if you need, but I already deleted them off my drive since they take a lot of space lol

Thank you so much for unprivating the merge repo; I can download, convert, and quant it myself. This is exactly what I've been looking for!
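For anyone following along, the convert-and-quant step with llama.cpp would look roughly like this — a sketch assuming a local llama.cpp checkout and placeholder file paths; note that the ARM-optimized Q4_0_4_8 type was available in llama-quantize builds of that era but was later removed in favor of repacking plain Q4_0 at load time:

import subprocess

# Convert the merged HF checkpoint to an F16 GGUF.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "Aura-NeMo-12B-merged",
     "--outfile", "aura-nemo-12b-f16.gguf", "--outtype", "f16"],
    check=True,
)

# Quantize to the i8mm-optimized ARM type.
subprocess.run(
    ["llama.cpp/llama-quantize", "aura-nemo-12b-f16.gguf",
     "aura-nemo-12b-q4_0_4_8.gguf", "Q4_0_4_8"],
    check=True,
)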

jeiku changed discussion status to closed
