IggoOnCode committed on
Commit b44e736
1 parent: 221e30a

First version of the mamba-2.8b-slimpj-OpenOrca_1ep model and tokenizer (copy of EleutherAI/gpt-neox-20b).

README.md CHANGED
@@ -1,3 +1,136 @@
  ---
- license: apache-2.0
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ {{ card_data }}
  ---
+
+ # Model Card for mamba-2.8b-slimpj-OpenOrca_1ep
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This is a fine-tune of mamba-2.8b-slimpj for instruction following, using the OpenOrca dataset.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+ This is a fine-tune of the Mamba reference model mamba-2.8b-slimpj from the paper [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752).
+
+ It was fine-tuned for instruction following on the OpenOrca dataset for 1 epoch.
+
+ - **Model type:** Mamba State Space Model (mamba_ssm)
+ - **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj
+
+ ## Uses
+
+ This model is intended for evaluating fine-tuning results on Mamba models.
+
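Example usage, as a minimal sketch only: the repo id, device, and generation settings below are assumptions not documented in this commit; the prompt follows template_3 from training_prompt.json, and loading uses the reference `mamba_ssm` API with the bundled GPT-NeoX tokenizer.

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

repo = "IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo)  # copy of EleutherAI/gpt-neox-20b
model = MambaLMHeadModel.from_pretrained(repo, device="cuda", dtype=torch.bfloat16)

# Prompt format used during training (template_3 in training_prompt.json).
prompt = "### Human:\nWhat is a state space model?\n\n### AI response:\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

out = model.generate(input_ids=input_ids, max_length=256)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
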
+ ## Training Details
+
+ ### Training Data
+
+ https://huggingface.co/datasets/Open-Orca/OpenOrca
+
+ ### Training Procedure
+
+ Trained using text-generation-webui with code from the mamba_ssm pull request.
+
+ #### Training Hyperparameters
+
+ - **Training regime:** Trained in bfloat16 with the following parameters:
+
+ ```
+ {
+   "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
+   "save_steps": 500000.0,
+   "micro_batch_size": 4,
+   "batch_size": 128,
+   "epochs": 1.0,
+   "learning_rate": "3e-4",
+   "lr_scheduler_type": "linear",
+   "cutoff_len": 256,
+   "dataset": "OpenOrca",
+   "eval_dataset": "None",
+   "format": "openorca-format",
+   "warmup_steps": 100.0,
+   "optimizer": "paged_adamw_8bit",
+   "hard_cut_string": "\\n\\n\\n",
+   "add_eos_token": false,
+   "min_chars": 0.0
+ }
+ ```
+
+ The reported train_loss was 0.6762700151924311.
+
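The `batch_size`/`micro_batch_size` pair implies gradient accumulation; a quick derivation, assuming the usual meaning of these fields in text-generation-webui's training tab:

```python
micro_batch_size = 4                 # examples per forward/backward pass
batch_size = 128                     # effective examples per optimizer update
grad_accum_steps = batch_size // micro_batch_size
print(grad_accum_steps)              # 32 micro-batches accumulated per update
```
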
+ ### Results
+
+ #### lm-evaluation-harness results for the final model
+
+ mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
+
+ | Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
+ |--------------|------:|------|-----:|----------|------:|---|-----:|
+ |arc_challenge | 1|none | 0|acc | 0.2594|± |0.0128|
+ | | |none | 0|acc_norm | 0.2935|± |0.0133|
+ |arc_easy | 1|none | 0|acc | 0.4390|± |0.0102|
+ | | |none | 0|acc_norm | 0.4032|± |0.0101|
+ |boolq | 2|none | 0|acc | 0.5801|± |0.0086|
+ |lambada_openai| 1|none | 0|perplexity|27.8582|± |1.1183|
+ | | |none | 0|acc | 0.3683|± |0.0067|
+ |openbookqa | 1|none | 0|acc | 0.2500|± |0.0194|
+ | | |none | 0|acc_norm | 0.3700|± |0.0216|
+ |piqa | 1|none | 0|acc | 0.6817|± |0.0109|
+ | | |none | 0|acc_norm | 0.6839|± |0.0108|
+ |winogrande | 1|none | 0|acc | 0.5770|± |0.0139|
+
+ #### lm-evaluation-harness results after half an epoch
+
+ mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
+
+ | Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
+ |--------------|------:|------|-----:|----------|------:|---|-----:|
+ |arc_challenge | 1|none | 0|acc | 0.2602|± |0.0128|
+ | | |none | 0|acc_norm | 0.2833|± |0.0132|
+ |arc_easy | 1|none | 0|acc | 0.4533|± |0.0102|
+ | | |none | 0|acc_norm | 0.4125|± |0.0101|
+ |boolq | 2|none | 0|acc | 0.4095|± |0.0086|
+ |lambada_openai| 1|none | 0|perplexity|30.4832|± |1.2403|
+ | | |none | 0|acc | 0.3551|± |0.0067|
+ |openbookqa | 1|none | 0|acc | 0.2420|± |0.0192|
+ | | |none | 0|acc_norm | 0.3640|± |0.0215|
+ |piqa | 1|none | 0|acc | 0.6812|± |0.0109|
+ | | |none | 0|acc_norm | 0.6730|± |0.0109|
+ |winogrande | 1|none | 0|acc | 0.5588|± |0.0140|
+
+ #### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning
+
+ mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
+
+ | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+ |--------------|------:|------|-----:|----------|-----:|---|-----:|
+ |arc_challenge | 1|none | 0|acc |0.3882|± |0.0142|
+ | | |none | 0|acc_norm |0.4155|± |0.0144|
+ |arc_easy | 1|none | 0|acc |0.7264|± |0.0091|
+ | | |none | 0|acc_norm |0.6814|± |0.0096|
+ |boolq | 2|none | 0|acc |0.7107|± |0.0079|
+ |lambada_openai| 1|none | 0|perplexity|5.8770|± |0.1881|
+ | | |none | 0|acc |0.6427|± |0.0067|
+ |openbookqa | 1|none | 0|acc |0.2860|± |0.0202|
+ | | |none | 0|acc_norm |0.3980|± |0.0219|
+ |piqa | 1|none | 0|acc |0.7709|± |0.0098|
+ | | |none | 0|acc_norm |0.7813|± |0.0096|
+ |winogrande | 1|none | 0|acc |0.6614|± |0.0133|
+
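The tables above are in lm-evaluation-harness's markdown output format. A sketch of how such a run can be reproduced (harness version, exact invocation, and repo id are assumptions; only the task list and batch_size: auto come from the headers above):

```python
import lm_eval  # lm-evaluation-harness >= 0.4 with mamba_ssm installed

results = lm_eval.simple_evaluate(
    model="mamba_ssm",
    model_args="pretrained=IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep",  # assumed repo id
    tasks=["arc_challenge", "arc_easy", "boolq", "lambada_openai",
           "openbookqa", "piqa", "winogrande"],
    batch_size="auto",
)
print(results["results"])
```
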
+ #### Summary
+
+ Compared with the base model, the fine-tuned model's measured perplexity and accuracy got worse, but this is a known possible side effect of fine-tuning. Perplexity and accuracy improved between the half-epoch checkpoint and the final model, so the initial degradation was likely caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.
+
+ The answer quality as perceived by users has yet to be evaluated.
+
+ ## Environmental Impact
+
+ - **Hardware Type:** RTX 3090
+ - **Hours used:** 118
+
config.json ADDED
@@ -0,0 +1 @@
+ {"d_model": 2560, "n_layer": 64, "vocab_size": 50277, "ssm_cfg": {}, "rms_norm": true, "residual_in_fp32": true, "fused_add_norm": true, "pad_vocab_size_multiple": 8}
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:630951f04627b75b525ca5fc90d189154f8d971d504cedd140c52de096cbc6c8
+ size 5548078554
special_tokens_map.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,212 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
+     "1": {"content": "<|padding|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
+     "50254": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50255": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50256": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50257": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50258": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50259": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50260": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50261": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50262": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50263": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50264": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50265": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50266": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50267": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50268": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50269": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50270": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50271": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50272": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50273": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50274": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50275": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50276": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false}
+   },
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
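In this configuration `bos_token`, `eos_token`, `pad_token`, and `unk_token` all resolve to `<|endoftext|>` (token id 0), which matters when padding batches for further fine-tuning. A quick check (repo id assumed):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep")  # assumed repo id
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)  # all "<|endoftext|>"
print(tok.eos_token_id)                                            # 0
```
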
training_log.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "base_model_name": "UNTRAINED/mamba-2.8b-slimpj",
+   "base_model_class": "MambaSsmModel",
+   "loss": 0.4871,
+   "learning_rate": 1.814168657212832e-08,
+   "epoch": 1.0,
+   "current_steps": 1058463,
+   "train_runtime": 423405.7021,
+   "train_samples_per_second": 10.0,
+   "train_steps_per_second": 0.078,
+   "total_flos": 0.0,
+   "train_loss": 0.6762700151924311
+ }
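These logged values are consistent with the hyperparameters and the "Hours used: 118" figure in the model card; a quick check:

```python
train_runtime = 423405.7021   # seconds, from training_log.json
current_steps = 1058463       # micro-batch steps, from training_log.json
micro_batch_size = 4          # from training_parameters.json

print(train_runtime / 3600)                               # ~117.6 h -> "Hours used: 118"
print(current_steps * micro_batch_size / train_runtime)   # ~10.0, matches train_samples_per_second
```
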
training_parameters.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
+   "save_steps": 500000.0,
+   "micro_batch_size": 4,
+   "batch_size": 128,
+   "epochs": 1.0,
+   "learning_rate": "3e-4",
+   "lr_scheduler_type": "linear",
+   "cutoff_len": 256,
+   "dataset": "OpenOrca",
+   "eval_dataset": "None",
+   "format": "openorca-format",
+   "warmup_steps": 100.0,
+   "optimizer": "paged_adamw_8bit",
+   "hard_cut_string": "\\n\\n\\n",
+   "add_eos_token": false,
+   "min_chars": 0.0
+ }
training_prompt.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "template_type": "dataset",
+   "template_1": "### Human:\n%question%\n\n### AI response:\n%response%",
+   "template_2": "### System instructions:\n%system_prompt%\n\n### Human:\n%question%\n\n### AI response:\n%response%",
+   "template_3": "### Human:\n%question%\n\n### AI response:\n",
+   "template_4": "### System instructions:\n%system_prompt%\n\n### Human:\n%question%\n\n### AI response:\n",
+   "template_5": "### AI response:\n%response%",
+   "template_6": "### System instructions:\n%system_prompt%\n\n### AI response:\n%response%"
+ }
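These are the prompt templates used to format OpenOrca records during training. A hedged sketch of the %placeholder% substitution they imply (the helper below is hypothetical, not code from this repo):

```python
def format_prompt(template: str, **fields: str) -> str:
    """Hypothetical helper: fill the %name% placeholders of a training_prompt.json template."""
    for name, value in fields.items():
        template = template.replace(f"%{name}%", value)
    return template

template_2 = ("### System instructions:\n%system_prompt%\n\n"
              "### Human:\n%question%\n\n### AI response:\n%response%")

print(format_prompt(
    template_2,
    system_prompt="You are a helpful assistant.",
    question="Summarise the Mamba architecture in one sentence.",
    response="Mamba is a selective state space model for sequence modelling.",
))
```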