# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep

This is a fine-tune of mamba-2.8b-slimpj for instruction following on the OpenOrca dataset.

## Model Details

### Model Description

This is a fine-tune of the Mamba reference model mamba-2.8b-slimpj from the paper [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752).

It has been fine-tuned for instruction following on the OpenOrca dataset for one epoch.

- **Model type:** Mamba state space model (mamba_ssm)
- **Finetuned from model:** [state-spaces/mamba-2.8b-slimpj](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)

## Uses

This model is intended for evaluating fine-tuning results on Mamba models.

## Usage

### Prompt structure

The prompt structure used during fine-tuning is an Alpaca-style format:

`"### Human:\n%question%\n\n### AI response:\n%response%"`
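As a minimal sketch of how to fill this template before generation (the `build_prompt` helper is illustrative, not part of the released code):

```python
# Template used during fine-tuning; the model is expected to
# complete the text after "### AI response:".
PROMPT_TEMPLATE = "### Human:\n{question}\n\n### AI response:\n"

def build_prompt(question: str) -> str:
    """Fill the fine-tuning prompt template with a user question."""
    return PROMPT_TEMPLATE.format(question=question)

prompt = build_prompt("What is a state space model?")
print(prompt)
```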

## Training Details

### Training Data

https://huggingface.co/datasets/Open-Orca/OpenOrca

### Training Procedure

Trained using text-generation-webui with code from the mamba_ssm pull request.

#### Training Hyperparameters

- **Training regime:** Trained in bfloat16 with the following parameters:

```json
{
  "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
  "save_steps": 500000.0,
  "micro_batch_size": 4,
  "batch_size": 128,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "cutoff_len": 256,
  "dataset": "OpenOrca",
  "eval_dataset": "None",
  "format": "openorca-format",
  "warmup_steps": 100.0,
  "optimizer": "paged_adamw_8bit",
  "hard_cut_string": "\\n\\n\\n",
  "add_eos_token": false,
  "min_chars": 0.0
}
```

The reported train_loss was 0.6763.
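The batch settings above imply gradient accumulation; assuming the usual relationship between micro-batch and effective batch size (an inference from the config, not stated in the training logs), the number of accumulation steps works out as:

```python
# Values from the training configuration above.
micro_batch_size = 4    # sequences per forward/backward pass
batch_size = 128        # effective sequences per optimizer step

# Presumed gradient-accumulation steps per optimizer update.
grad_accum_steps = batch_size // micro_batch_size
print(grad_accum_steps)  # → 32
```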

### Results

#### lm-evaluation-harness results for the final model

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge | 1|none | 0|acc | 0.2594|± |0.0128|
| | |none | 0|acc_norm | 0.2935|± |0.0133|
|arc_easy | 1|none | 0|acc | 0.4390|± |0.0102|
| | |none | 0|acc_norm | 0.4032|± |0.0101|
|boolq | 2|none | 0|acc | 0.5801|± |0.0086|
|lambada_openai| 1|none | 0|perplexity|27.8582|± |1.1183|
| | |none | 0|acc | 0.3683|± |0.0067|
|openbookqa | 1|none | 0|acc | 0.2500|± |0.0194|
| | |none | 0|acc_norm | 0.3700|± |0.0216|
|piqa | 1|none | 0|acc | 0.6817|± |0.0109|
| | |none | 0|acc_norm | 0.6839|± |0.0108|
|winogrande | 1|none | 0|acc | 0.5770|± |0.0139|

#### lm-evaluation-harness results after half an epoch

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge | 1|none | 0|acc | 0.2602|± |0.0128|
| | |none | 0|acc_norm | 0.2833|± |0.0132|
|arc_easy | 1|none | 0|acc | 0.4533|± |0.0102|
| | |none | 0|acc_norm | 0.4125|± |0.0101|
|boolq | 2|none | 0|acc | 0.4095|± |0.0086|
|lambada_openai| 1|none | 0|perplexity|30.4832|± |1.2403|
| | |none | 0|acc | 0.3551|± |0.0067|
|openbookqa | 1|none | 0|acc | 0.2420|± |0.0192|
| | |none | 0|acc_norm | 0.3640|± |0.0215|
|piqa | 1|none | 0|acc | 0.6812|± |0.0109|
| | |none | 0|acc_norm | 0.6730|± |0.0109|
|winogrande | 1|none | 0|acc | 0.5588|± |0.0140|

#### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning

mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |0.3882|± |0.0142|
| | |none | 0|acc_norm |0.4155|± |0.0144|
|arc_easy | 1|none | 0|acc |0.7264|± |0.0091|
| | |none | 0|acc_norm |0.6814|± |0.0096|
|boolq | 2|none | 0|acc |0.7107|± |0.0079|
|lambada_openai| 1|none | 0|perplexity|5.8770|± |0.1881|
| | |none | 0|acc |0.6427|± |0.0067|
|openbookqa | 1|none | 0|acc |0.2860|± |0.0202|
| | |none | 0|acc_norm |0.3980|± |0.0219|
|piqa | 1|none | 0|acc |0.7709|± |0.0098|
| | |none | 0|acc_norm |0.7813|± |0.0096|
|winogrande | 1|none | 0|acc |0.6614|± |0.0133|

#### Summary

The model's measured perplexity and accuracy got worse relative to the base model, but this is a known possible effect of fine-tuning. Perplexity and accuracy improved in the second half of training, so the initial worsening was likely caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.
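That improvement can be checked directly against the tables above; a small sketch comparing the half-epoch checkpoint with the final model (values copied from the lm-evaluation-harness results):

```python
# lambada_openai perplexity and boolq accuracy, from the tables above.
half_epoch = {"lambada_ppl": 30.4832, "boolq_acc": 0.4095}
final      = {"lambada_ppl": 27.8582, "boolq_acc": 0.5801}

# Lower perplexity and higher accuracy both indicate improvement.
ppl_improved = final["lambada_ppl"] < half_epoch["lambada_ppl"]
acc_improved = final["boolq_acc"] > half_epoch["boolq_acc"]
print(ppl_improved, acc_improved)  # → True True
```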

The answer quality as perceived by users has not yet been evaluated.

## Environmental Impact

- **Hardware Type:** RTX 3090
- **Hours used:** 118