---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{{ card_data }}
---

# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep

<!-- Provide a quick summary of what the model is/does. -->

This is a finetune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
This is a fine-tune of the Mamba reference model mamba-2.8b-slimpj from the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" (https://arxiv.org/abs/2312.00752).

It was fine-tuned for instruction following on the OpenOrca dataset for 1 epoch.

- **Model type:** Mamba State Space Model (mamba_ssm)
- **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj


## Uses

This model is intended for evaluating the results of fine-tuning Mamba models.

## Training Details

### Training Data

https://huggingface.co/datasets/Open-Orca/OpenOrca

### Training Procedure

Trained using text-generation-webui with code from the mamba_ssm pull request.


#### Training Hyperparameters

- **Training regime:** Trained in bfloat16 with the following parameters:

```json
{
  "trained_model_name": "mamba-2.8b-slimpj-OpenOrca_1ep",
  "save_steps": 500000.0,
  "micro_batch_size": 4,
  "batch_size": 128,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "cutoff_len": 256,
  "dataset": "OpenOrca",
  "eval_dataset": "None",
  "format": "openorca-format",
  "warmup_steps": 100.0,
  "optimizer": "paged_adamw_8bit",
  "hard_cut_string": "\\n\\n\\n",
  "add_eos_token": false,
  "min_chars": 0.0
}
```
The reported `train_loss` was 0.6762700151924311.
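The batch and learning-rate settings above can be read as follows. This is a minimal sketch that makes two assumptions. First, `batch_size` / `micro_batch_size` gives the number of gradient-accumulation steps. Second, `"linear"` means the usual Hugging Face-style schedule: linear warmup to the peak rate over `warmup_steps`, then linear decay to zero. The helper names `grad_accum_steps` and `linear_lr` and the 1,000-step total are hypothetical and only for illustration. The actual number of optimizer steps for this run is not stated in this card.

```python
# Sketch of how the batch and learning-rate settings above are usually interpreted.
# Assumptions: HF Trainer-style gradient accumulation and a linear schedule with warmup.

MICRO_BATCH_SIZE = 4           # "micro_batch_size": sequences per forward/backward pass
BATCH_SIZE = 128               # "batch_size": sequences per optimizer step
LEARNING_RATE = float("3e-4")  # stored as a string in the config above
WARMUP_STEPS = 100             # "warmup_steps"
CUTOFF_LEN = 256               # "cutoff_len": maximum tokens per sequence (assumed tokens)


def grad_accum_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches whose gradients are accumulated before each optimizer step."""
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


def linear_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print(grad_accum_steps(BATCH_SIZE, MICRO_BATCH_SIZE))  # 32
    print(BATCH_SIZE * CUTOFF_LEN)  # 32768, an upper bound on tokens per optimizer step
    total_steps = 1000  # hypothetical; the real step count is not given in this card
    for step in (0, 50, 100, 550, 1000):
        print(step, linear_lr(step, total_steps, LEARNING_RATE, WARMUP_STEPS))
```

Under these assumptions, each optimizer step accumulates 32 micro-batches of 4 sequences. The learning rate reaches 3e-4 after 100 warmup steps and then falls linearly to zero by the end of the epoch.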

### Results

#### lm-evaluation-harness results for final model

`mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)`

|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.2594|±  |0.0128|
|              |       |none  |     0|acc_norm  | 0.2935|±  |0.0133|
|arc_easy      |      1|none  |     0|acc       | 0.4390|±  |0.0102|
|              |       |none  |     0|acc_norm  | 0.4032|±  |0.0101|
|boolq         |      2|none  |     0|acc       | 0.5801|±  |0.0086|
|lambada_openai|      1|none  |     0|perplexity|27.8582|±  |1.1183|
|              |       |none  |     0|acc       | 0.3683|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2500|±  |0.0194|
|              |       |none  |     0|acc_norm  | 0.3700|±  |0.0216|
|piqa          |      1|none  |     0|acc       | 0.6817|±  |0.0109|
|              |       |none  |     0|acc_norm  | 0.6839|±  |0.0108|
|winogrande    |      1|none  |     0|acc       | 0.5770|±  |0.0139|

#### lm-evaluation-harness results after half an epoch

`mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)`

|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.2602|±  |0.0128|
|              |       |none  |     0|acc_norm  | 0.2833|±  |0.0132|
|arc_easy      |      1|none  |     0|acc       | 0.4533|±  |0.0102|
|              |       |none  |     0|acc_norm  | 0.4125|±  |0.0101|
|boolq         |      2|none  |     0|acc       | 0.4095|±  |0.0086|
|lambada_openai|      1|none  |     0|perplexity|30.4832|±  |1.2403|
|              |       |none  |     0|acc       | 0.3551|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2420|±  |0.0192|
|              |       |none  |     0|acc_norm  | 0.3640|±  |0.0215|
|piqa          |      1|none  |     0|acc       | 0.6812|±  |0.0109|
|              |       |none  |     0|acc_norm  | 0.6730|±  |0.0109|
|winogrande    |      1|none  |     0|acc       | 0.5588|±  |0.0140|

#### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning

`mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)`

|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_challenge |      1|none  |     0|acc       |0.3882|±  |0.0142|
|              |       |none  |     0|acc_norm  |0.4155|±  |0.0144|
|arc_easy      |      1|none  |     0|acc       |0.7264|±  |0.0091|
|              |       |none  |     0|acc_norm  |0.6814|±  |0.0096|
|boolq         |      2|none  |     0|acc       |0.7107|±  |0.0079|
|lambada_openai|      1|none  |     0|perplexity|5.8770|±  |0.1881|
|              |       |none  |     0|acc       |0.6427|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       |0.2860|±  |0.0202|
|              |       |none  |     0|acc_norm  |0.3980|±  |0.0219|
|piqa          |      1|none  |     0|acc       |0.7709|±  |0.0098|
|              |       |none  |     0|acc_norm  |0.7813|±  |0.0096|
|winogrande    |      1|none  |     0|acc       |0.6614|±  |0.0133|



#### Summary

The model's measured perplexity and accuracy got worse compared with the base model, which is a known possible side effect of fine-tuning. Between the half-epoch checkpoint and the final model, perplexity and most accuracy metrics improved, with some ARC scores as the exception. The initial degradation was therefore likely caused by forcing a prompt structure onto the base model, which had been trained only on unstructured text.
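One way to check the second-half claim against the tables is to compare the half-epoch and final checkpoints metric by metric. The sketch below copies the values and standard errors from those two tables and expresses each change in units of the combined standard error.

Treating the two runs as independent is a simplifying assumption. Both checkpoints are scored on the same items, so the result is only a rough guide. `improvement_z` is a hypothetical helper, not part of lm-evaluation-harness.

```python
import math

# (half-epoch value, stderr, final value, stderr), copied from the tables above.
RESULTS = {
    ("arc_challenge", "acc"):         (0.2602, 0.0128, 0.2594, 0.0128),
    ("arc_challenge", "acc_norm"):    (0.2833, 0.0132, 0.2935, 0.0133),
    ("arc_easy", "acc"):              (0.4533, 0.0102, 0.4390, 0.0102),
    ("arc_easy", "acc_norm"):         (0.4125, 0.0101, 0.4032, 0.0101),
    ("boolq", "acc"):                 (0.4095, 0.0086, 0.5801, 0.0086),
    ("lambada_openai", "perplexity"): (30.4832, 1.2403, 27.8582, 1.1183),
    ("lambada_openai", "acc"):        (0.3551, 0.0067, 0.3683, 0.0067),
    ("openbookqa", "acc"):            (0.2420, 0.0192, 0.2500, 0.0194),
    ("openbookqa", "acc_norm"):       (0.3640, 0.0215, 0.3700, 0.0216),
    ("piqa", "acc"):                  (0.6812, 0.0109, 0.6817, 0.0109),
    ("piqa", "acc_norm"):             (0.6730, 0.0109, 0.6839, 0.0108),
    ("winogrande", "acc"):            (0.5588, 0.0140, 0.5770, 0.0139),
}

# For perplexity a lower value is better, so its sign is flipped below.
LOWER_IS_BETTER = {"perplexity"}


def improvement_z(half, half_se, final, final_se, metric):
    """Signed change in units of the combined stderr; positive means the final model is better.

    Assumes independent errors for the two runs, which is only an approximation
    because both checkpoints are evaluated on the same items.
    """
    delta = final - half
    if metric in LOWER_IS_BETTER:
        delta = -delta
    return delta / math.sqrt(half_se ** 2 + final_se ** 2)


if __name__ == "__main__":
    for (task, metric), (h, hse, f, fse) in RESULTS.items():
        z = improvement_z(h, hse, f, fse, metric)
        print(f"{task:15s} {metric:11s} {h:8.4f} -> {f:8.4f}  z={z:+.2f}")
    improved = sum(improvement_z(*vals, metric) > 0 for (_, metric), vals in RESULTS.items())
    print(f"{improved} of {len(RESULTS)} metrics improved")  # 9 of 12 metrics improved
```

With these numbers, 9 of the 12 metrics move in the better direction. The exceptions are ARC-Challenge `acc` and both ARC-Easy metrics. The BoolQ change is by far the largest relative to its standard error.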

Answer quality as perceived by users has not yet been evaluated.

## Environmental Impact

- **Hardware Type:** RTX 3090
- **Hours used:** 118
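For a rough sense of scale, the GPU energy for this run can be bounded from the figures above. This sketch assumes the card drew its full 350 W rated board power for all 118 hours. Actual draw was probably lower, and CPU, memory, and cooling are not counted. The result is therefore an order-of-magnitude estimate, not a measured value.

```python
# Rough upper bound on GPU energy for the training run.
# Assumption: the RTX 3090 ran at its 350 W rated board power for the whole run.
GPU_POWER_W = 350
HOURS_USED = 118


def energy_kwh(power_w: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant power draw."""
    return power_w * hours / 1000


if __name__ == "__main__":
    print(f"{energy_kwh(GPU_POWER_W, HOURS_USED):.1f} kWh")  # 41.3 kWh
```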