# Model Card

LoRA fine-tuned version of mistralai/Mixtral-8x7B-Instruct-v0.1, with adapters applied to all attention projections, the MoE router gate, and the expert feed-forward projections (see `target_modules` below).
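A minimal sketch of loading the adapter for inference is shown below. It assumes the standard `peft`/`transformers` loading pattern; `adapter_id` is a placeholder for this model's repository id, and the prompt is illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter_id = "<this-adapter-repo>"  # placeholder: replace with this model's repository id

# Load the 4-bit quantized base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "[INST] Summarize what LoRA fine-tuning does. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```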
## Training Hyperparameters
```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
from trl import SFTTrainer

# Load the base model in 4-bit (QLoRA-style) together with a right-padding tokenizer.
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    truncation=True,
    padding=True,
    padding_side="right",
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=quantization_config,
)

# Add a dedicated padding token and resize the embeddings so the new token id is valid.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

# Prepare the quantized model for k-bit training.
model = prepare_model_for_kbit_training(model)

# LoRA adapters on all attention projections, the MoE router gate, and the expert MLP projections.
config = LoraConfig(
    r=4,
    lora_alpha=4,
    target_modules=["gate", "q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.1,
)
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()

dataset = load_dataset("Na0s/sft-ready-Text-Generation-Augmented-Data", split="train")

trainer = SFTTrainer(
    model=lora_model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    packing=True,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        group_by_length=True,
        warmup_steps=5,
        bf16=True,
        max_steps=10000,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        eval_strategy="no",
        do_eval=False,
        output_dir="./outputs",
        push_to_hub=True,
        remove_unused_columns=False,
    ),
)

torch.cuda.empty_cache()
trainer.train()
```
## Metrics and Results

Upcoming.
## Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
## Technical Specifications

### Model Architecture and Objective
The objective of fine-tuning this MoE-based transformer is to implement the expert pruning detailed in the following paper: *A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts*.
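Expert pruning of this kind depends on per-layer statistics of how often the router selects each expert. Below is a minimal sketch of gathering such statistics for a Mixtral checkpoint via `output_router_logits=True`; it is not the paper's pruning criterion, only an illustration, and `calibration_texts` is a placeholder for real calibration data.

```python
import torch
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model.eval()

# Placeholder calibration texts; in practice these come from the target-domain data.
calibration_texts = ["The quick brown fox jumps over the lazy dog."]

top_k = model.config.num_experts_per_tok  # Mixtral routes each token to its top-2 experts
counts_per_layer = [Counter() for _ in range(model.config.num_hidden_layers)]

with torch.no_grad():
    for text in calibration_texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        # router_logits is a tuple with one (num_tokens, num_experts) tensor per layer
        outputs = model(**inputs, output_router_logits=True)
        for layer_idx, logits in enumerate(outputs.router_logits):
            selected = logits.topk(top_k, dim=-1).indices.flatten().tolist()
            counts_per_layer[layer_idx].update(selected)

# Experts that are rarely selected are the natural candidates for pruning.
for layer_idx, counts in enumerate(counts_per_layer):
    least_used = sorted(range(model.config.num_local_experts), key=lambda e: counts[e])[:2]
    print(f"layer {layer_idx}: least-used experts {least_used}")
```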