---
{{ card_data }}
---
# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep
This is a fine-tune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.
## Model Details
### Model Description
This is a fine-tune of the Mamba reference model mamba-2.8b-slimpj from the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" (https://arxiv.org/abs/2312.00752).
It has been fine-tuned for instruction following on the OpenOrca dataset, training for 1 epoch.
- **Model type:** Mamba State Space Model (mamba_ssm)
- **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj
## Uses
This model is primarily intended for evaluating fine-tuning results on Mamba models.
## Usage
### Prompt structure
The prompt structure used during fine-tuning follows the Alpaca format:
```
"### Human:\n%question%\n\n### AI response:\n%response%"
```
## Training Details
### Training Data
https://huggingface.co/datasets/Open-Orca/OpenOrca
### Training Procedure
Trained using text-generation-webui with code from the pull request that adds mamba_ssm support.
#### Training Hyperparameters
- **Training regime:** Trained in bfloat16 with the following parameters:
```
{
"trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
"save_steps": 500000.0,
"micro_batch_size": 4,
"batch_size": 128,
"epochs": 1.0,
"learning_rate": "3e-4",
"lr_scheduler_type": "linear",
"cutoff_len": 256,
"dataset": "OpenOrca",
"eval_dataset": "None",
"format": "openorca-format",
"warmup_steps": 100.0,
"optimizer": "paged_adamw_8bit",
"hard_cut_string": "\\n\\n\\n",
"add_eos_token": false,
"min_chars": 0.0,
}
```
The reported train_loss was 0.6762700151924311.
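For orientation, with micro_batch_size 4 and an effective batch_size of 128, each optimizer step corresponds to 32 gradient-accumulation steps. Below is a rough sketch of the equivalent optimizer/scheduler setup outside the webui, assuming bitsandbytes' PagedAdamW8bit (which "paged_adamw_8bit" refers to) and the transformers linear warmup schedule; the model object and total step count are placeholders.
```python
import torch
import bitsandbytes as bnb
from transformers import get_linear_schedule_with_warmup

micro_batch_size = 4
batch_size = 128
grad_accum_steps = batch_size // micro_batch_size  # 128 / 4 = 32

# Stand-in module; in the actual run this is the MambaLMHeadModel being fine-tuned.
model = torch.nn.Linear(16, 16)

# "paged_adamw_8bit" maps to bitsandbytes' paged 8-bit AdamW.
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=3e-4)

total_optimizer_steps = 1000  # placeholder; depends on dataset size and batch_size
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=total_optimizer_steps
)
```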
### Results
#### lm-evaluation-harness results for final model
mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge | 1|none | 0|acc | 0.2594|± |0.0128|
| | |none | 0|acc_norm | 0.2935|± |0.0133|
|arc_easy | 1|none | 0|acc | 0.4390|± |0.0102|
| | |none | 0|acc_norm | 0.4032|± |0.0101|
|boolq | 2|none | 0|acc | 0.5801|± |0.0086|
|lambada_openai| 1|none | 0|perplexity|27.8582|± |1.1183|
| | |none | 0|acc | 0.3683|± |0.0067|
|openbookqa | 1|none | 0|acc | 0.2500|± |0.0194|
| | |none | 0|acc_norm | 0.3700|± |0.0216|
|piqa | 1|none | 0|acc | 0.6817|± |0.0109|
| | |none | 0|acc_norm | 0.6839|± |0.0108|
|winogrande | 1|none | 0|acc | 0.5770|± |0.0139|
#### lm-evaluation-harness results after half an epoch
mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge | 1|none | 0|acc | 0.2602|± |0.0128|
| | |none | 0|acc_norm | 0.2833|± |0.0132|
|arc_easy | 1|none | 0|acc | 0.4533|± |0.0102|
| | |none | 0|acc_norm | 0.4125|± |0.0101|
|boolq | 2|none | 0|acc | 0.4095|± |0.0086|
|lambada_openai| 1|none | 0|perplexity|30.4832|± |1.2403|
| | |none | 0|acc | 0.3551|± |0.0067|
|openbookqa | 1|none | 0|acc | 0.2420|± |0.0192|
| | |none | 0|acc_norm | 0.3640|± |0.0215|
|piqa | 1|none | 0|acc | 0.6812|± |0.0109|
| | |none | 0|acc_norm | 0.6730|± |0.0109|
|winogrande | 1|none | 0|acc | 0.5588|± |0.0140|
#### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning
mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |0.3882|± |0.0142|
| | |none | 0|acc_norm |0.4155|± |0.0144|
|arc_easy | 1|none | 0|acc |0.7264|± |0.0091|
| | |none | 0|acc_norm |0.6814|± |0.0096|
|boolq | 2|none | 0|acc |0.7107|± |0.0079|
|lambada_openai| 1|none | 0|perplexity|5.8770|± |0.1881|
| | |none | 0|acc |0.6427|± |0.0067|
|openbookqa | 1|none | 0|acc |0.2860|± |0.0202|
| | |none | 0|acc_norm |0.3980|± |0.0219|
|piqa | 1|none | 0|acc |0.7709|± |0.0098|
| | |none | 0|acc_norm |0.7813|± |0.0096|
|winogrande | 1|none | 0|acc |0.6614|± |0.0133|
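The tables above were produced with lm-evaluation-harness. A sketch of an equivalent run through its Python API is shown below, assuming an lm-eval version that ships the `mamba_ssm` model type; the pretrained path is an assumed repo id and should point at the checkpoint being evaluated.
```python
import lm_eval

# Assumes lm-evaluation-harness >= 0.4 with the "mamba_ssm" model type available
# (requires the mamba_ssm and causal-conv1d packages).
results = lm_eval.simple_evaluate(
    model="mamba_ssm",
    model_args="pretrained=IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep",  # assumed repo id
    tasks=[
        "arc_challenge", "arc_easy", "boolq",
        "lambada_openai", "openbookqa", "piqa", "winogrande",
    ],
    batch_size="auto",
)
print(results["results"])
```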
#### Summary
The model's measured perplexity and accuracy are worse than the base model's, but it is known that this can be a side effect of fine-tuning. Perplexity and accuracy improved over the second half of training, so the initial worsening was likely caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.
The answer quality as perceived by users has yet to be evaluated.
## Environmental Impact
- **Hardware Type:** RTX 3090
- **Hours used:** 118