---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{{ card_data }}
---
# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep
This is a finetune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.
## Model Details
### Model Description
This is a finetune of the Mamba reference model mamba-2.8b-slimpj from the paper https://arxiv.org/abs/2312.00752.
It has been fine-tuned for instruction following by training on the OpenOrca dataset for 1 epoch.
- **Model type:** Mamba State Space Model (mamba_ssm)
- **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj
## Uses
This model is intended to evaluate fine-tuning results on mamba models.
## Training Details
### Training Data
https://huggingface.co/datasets/Open-Orca/OpenOrca
### Training Procedure
Trained using text-generation-webui with code from the mamba_ssm pull request.
#### Training Hyperparameters
- **Training regime:** Trained in bfloat16 with the following parameters:
```
{
"trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
"save_steps": 500000.0,
"micro_batch_size": 4,
"batch_size": 128,
"epochs": 1.0,
"learning_rate": "3e-4",
"lr_scheduler_type": "linear",
"cutoff_len": 256,
"dataset": "OpenOrca",
"eval_dataset": "None",
"format": "openorca-format",
"warmup_steps": 100.0,
"optimizer": "paged_adamw_8bit",
"hard_cut_string": "\\n\\n\\n",
"add_eos_token": false,
"min_chars": 0.0
}
```
Reported train_loss was 0.6762700151924311
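
The batch settings above imply a gradient accumulation factor and an upper bound on tokens per optimizer step. The following is a minimal sketch of that arithmetic, assuming the trainer reaches the effective `batch_size` by accumulating gradients over `micro_batch_size` chunks (the usual convention; the variable names `grad_accum_steps` and `max_tokens_per_step` are introduced here for illustration only).

```python
# Values copied from the hyperparameter dump above.
batch_size = 128        # effective batch size per optimizer step
micro_batch_size = 4    # sequences per forward/backward pass
cutoff_len = 256        # maximum tokens kept per sequence

# Assumption: gradients are accumulated until the effective batch is reached.
grad_accum_steps = batch_size // micro_batch_size

# Upper bound only: sequences shorter than cutoff_len contribute fewer tokens.
max_tokens_per_step = batch_size * cutoff_len

print(grad_accum_steps)     # 32
print(max_tokens_per_step)  # 32768
```

The small micro batch is what lets the run fit on a single 24 GB card, while the accumulated batch of 128 keeps the optimizer steps comparatively smooth.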
### Results
#### lm-evaluation-harness results for final model
mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge | 1|none | 0|acc | 0.2594|± |0.0128|
| | |none | 0|acc_norm | 0.2935|± |0.0133|
|arc_easy | 1|none | 0|acc | 0.4390|± |0.0102|
| | |none | 0|acc_norm | 0.4032|± |0.0101|
|boolq | 2|none | 0|acc | 0.5801|± |0.0086|
|lambada_openai| 1|none | 0|perplexity|27.8582|± |1.1183|
| | |none | 0|acc | 0.3683|± |0.0067|
|openbookqa | 1|none | 0|acc | 0.2500|± |0.0194|
| | |none | 0|acc_norm | 0.3700|± |0.0216|
|piqa | 1|none | 0|acc | 0.6817|± |0.0109|
| | |none | 0|acc_norm | 0.6839|± |0.0108|
|winogrande | 1|none | 0|acc | 0.5770|± |0.0139|
#### lm-evaluation-harness results after half epoch
mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge | 1|none | 0|acc | 0.2602|± |0.0128|
| | |none | 0|acc_norm | 0.2833|± |0.0132|
|arc_easy | 1|none | 0|acc | 0.4533|± |0.0102|
| | |none | 0|acc_norm | 0.4125|± |0.0101|
|boolq | 2|none | 0|acc | 0.4095|± |0.0086|
|lambada_openai| 1|none | 0|perplexity|30.4832|± |1.2403|
| | |none | 0|acc | 0.3551|± |0.0067|
|openbookqa | 1|none | 0|acc | 0.2420|± |0.0192|
| | |none | 0|acc_norm | 0.3640|± |0.0215|
|piqa | 1|none | 0|acc | 0.6812|± |0.0109|
| | |none | 0|acc_norm | 0.6730|± |0.0109|
|winogrande | 1|none | 0|acc | 0.5588|± |0.0140|
#### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning
mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |0.3882|± |0.0142|
| | |none | 0|acc_norm |0.4155|± |0.0144|
|arc_easy | 1|none | 0|acc |0.7264|± |0.0091|
| | |none | 0|acc_norm |0.6814|± |0.0096|
|boolq | 2|none | 0|acc |0.7107|± |0.0079|
|lambada_openai| 1|none | 0|perplexity|5.8770|± |0.1881|
| | |none | 0|acc |0.6427|± |0.0067|
|openbookqa | 1|none | 0|acc |0.2860|± |0.0202|
| | |none | 0|acc_norm |0.3980|± |0.0219|
|piqa | 1|none | 0|acc |0.7709|± |0.0098|
| | |none | 0|acc_norm |0.7813|± |0.0096|
|winogrande | 1|none | 0|acc |0.6614|± |0.0133|
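
To make the three tables easier to compare, the sketch below computes the change in `acc` from the base model to the final fine-tuned model for each task. The numbers are copied from the tables above; the helper name `acc_deltas` is hypothetical and only `acc` rows are included.

```python
# (base_acc, final_acc) pairs copied from the tables above.
results = {
    "arc_challenge": (0.3882, 0.2594),
    "arc_easy": (0.7264, 0.4390),
    "boolq": (0.7107, 0.5801),
    "lambada_openai": (0.6427, 0.3683),
    "openbookqa": (0.2860, 0.2500),
    "piqa": (0.7709, 0.6817),
    "winogrande": (0.6614, 0.5770),
}


def acc_deltas(table):
    """Return final minus base accuracy per task, rounded to 4 decimals."""
    return {task: round(final - base, 4) for task, (base, final) in table.items()}


deltas = acc_deltas(results)

# Print tasks from the largest drop to the smallest.
for task, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{task:15s} {delta:+.4f}")
```

Every task shows a drop relative to the base model, with `arc_easy` and `lambada_openai` losing the most and `openbookqa` the least.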
#### Summary
The model's measured perplexity and accuracy got worse compared to the base model, which is a known possible effect of fine-tuning. Perplexity and most accuracy metrics improved in the second half of training, so it's likely that the initial worsening was caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.
The answer quality as perceived by users is yet to be evaluated.
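
As a rough check on how much of the change between the half-epoch checkpoint and the final model exceeds the reported standard errors, the sketch below computes an approximate z-score per task. It treats the two estimates as independent, which is a simplifying assumption (both runs use the same test items), so the result is only indicative. The helper name `z_score` and the threshold of 2 are choices made for this illustration.

```python
import math

# (half_epoch_acc, final_acc, half_epoch_stderr, final_stderr), copied from the tables above.
rows = {
    "arc_challenge": (0.2602, 0.2594, 0.0128, 0.0128),
    "arc_easy": (0.4533, 0.4390, 0.0102, 0.0102),
    "boolq": (0.4095, 0.5801, 0.0086, 0.0086),
    "lambada_openai": (0.3551, 0.3683, 0.0067, 0.0067),
    "openbookqa": (0.2420, 0.2500, 0.0192, 0.0194),
    "piqa": (0.6812, 0.6817, 0.0109, 0.0109),
    "winogrande": (0.5588, 0.5770, 0.0140, 0.0139),
}


def z_score(before, after, se_before, se_after):
    """Difference divided by the combined standard error (independence assumed)."""
    return (after - before) / math.sqrt(se_before**2 + se_after**2)


z = {task: z_score(*vals) for task, vals in rows.items()}

for task, value in z.items():
    flag = "beyond 2 combined stderr" if abs(value) > 2 else "within noise"
    print(f"{task:15s} z = {value:+.2f} ({flag})")
```

Under this rough check, the `boolq` gain is the only accuracy change larger than two combined standard errors; the other tasks move in the direction described above but stay within the reported uncertainty.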
## Environmental Impact
- **Hardware Type:** RTX 3090
- **Hours used:** 118