upload model

Files changed (13)

README.md +175 -0
all_results.json +33 -0
config.json +35 -0
eval_results.json +15 -0
merges.txt +0 -0
pytorch_model.bin +3 -0
special_tokens_map.json +23 -0
tokenizer.json +0 -0
tokenizer_config.json +33 -0
train_results.json +21 -0
trainer_state.json +0 -0
training_args.bin +3 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,175 @@

+---
+tags:
+- generated_from_trainer
+datasets:
+- roneneldan/TinyStories
+metrics:
+- accuracy
+model-index:
+- name: output_main
+  results:
+  - task:
+      name: Causal Language Modeling
+      type: text-generation
+    dataset:
+      name: roneneldan/TinyStories
+      type: roneneldan/TinyStories
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.5791389432485323
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# output_main
+This model is a fine-tuned version of [roneneldan/TinyStories-1Layer-21M](https://huggingface.co/roneneldan/TinyStories-1Layer-21M) on the roneneldan/TinyStories dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.6604
+- Accuracy: 0.5791
+- Multicode K: 1
+- Dead Code Fraction/layer0: 0.1982
+- Mse/layer0: 6073.8637
+- Input Norm/layer0: 0.7182
+- Output Norm/layer0: 76.7891
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 96
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.05
+- training_steps: 100000
+### Training results
+| Training Loss | Epoch | Step   | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 |
+|:-------------:|:-----:|:------:|:---------------:|:--------:|:-----------:|:-------------------------:|:----------:|:-----------------:|:------------------:|
+| 2.2319        | 0.1   | 1000   | 1.9134          | 0.5317   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.8521        | 0.21  | 2000   | 1.7990          | 0.5495   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7879        | 0.31  | 3000   | 1.7739          | 0.5557   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7728        | 0.42  | 4000   | 1.7666          | 0.5564   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7686        | 0.52  | 5000   | 1.7609          | 0.5595   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7635        | 0.63  | 6000   | 1.7555          | 0.5598   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7523        | 0.73  | 7000   | 1.7383          | 0.5632   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7471        | 0.83  | 8000   | 1.7368          | 0.5643   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7404        | 0.94  | 9000   | 1.7277          | 0.5659   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.728         | 1.04  | 10000  | 1.7290          | 0.5647   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7195        | 1.15  | 11000  | 1.7244          | 0.5667   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7198        | 1.25  | 12000  | 1.7230          | 0.5671   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7171        | 1.36  | 13000  | 1.7177          | 0.5689   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7185        | 1.46  | 14000  | 1.7150          | 0.5688   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7149        | 1.56  | 15000  | 1.7125          | 0.5695   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7105        | 1.67  | 16000  | 1.7097          | 0.5695   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7107        | 1.77  | 17000  | 1.7073          | 0.5689   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7113        | 1.88  | 18000  | 1.7025          | 0.5712   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.7078        | 1.98  | 19000  | 1.7048          | 0.5702   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.693         | 2.09  | 20000  | 1.7045          | 0.5696   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6935        | 2.19  | 21000  | 1.7068          | 0.5695   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6962        | 2.29  | 22000  | 1.7046          | 0.5687   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6954        | 2.4   | 23000  | 1.7019          | 0.5706   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6933        | 2.5   | 24000  | 1.7002          | 0.5725   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6942        | 2.61  | 25000  | 1.6983          | 0.5717   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6935        | 2.71  | 26000  | 1.6938          | 0.5730   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6928        | 2.82  | 27000  | 1.6978          | 0.5719   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6927        | 2.92  | 28000  | 1.6935          | 0.5715   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6855        | 3.02  | 29000  | 1.6978          | 0.5726   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6773        | 3.13  | 30000  | 1.6951          | 0.5732   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6788        | 3.23  | 31000  | 1.6926          | 0.5728   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6813        | 3.34  | 32000  | 1.6920          | 0.5726   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6782        | 3.44  | 33000  | 1.6926          | 0.5733   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6801        | 3.55  | 34000  | 1.6894          | 0.5719   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6796        | 3.65  | 35000  | 1.6890          | 0.5728   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6768        | 3.75  | 36000  | 1.6882          | 0.5722   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6802        | 3.86  | 37000  | 1.6872          | 0.5732   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6809        | 3.96  | 38000  | 1.6855          | 0.5750   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6701        | 4.07  | 39000  | 1.6886          | 0.5742   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6646        | 4.17  | 40000  | 1.6890          | 0.5734   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.669         | 4.28  | 41000  | 1.6859          | 0.5747   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6713        | 4.38  | 42000  | 1.6867          | 0.5740   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6693        | 4.48  | 43000  | 1.6821          | 0.5750   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6693        | 4.59  | 44000  | 1.6822          | 0.5747   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6692        | 4.69  | 45000  | 1.6801          | 0.5745   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6703        | 4.8   | 46000  | 1.6834          | 0.5761   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6677        | 4.9   | 47000  | 1.6819          | 0.5756   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6682        | 5.01  | 48000  | 1.6778          | 0.5752   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6547        | 5.11  | 49000  | 1.6825          | 0.5751   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6566        | 5.21  | 50000  | 1.6825          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6605        | 5.32  | 51000  | 1.6814          | 0.5746   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6603        | 5.42  | 52000  | 1.6768          | 0.5755   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6595        | 5.53  | 53000  | 1.6757          | 0.5753   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6603        | 5.63  | 54000  | 1.6769          | 0.5738   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.662         | 5.74  | 55000  | 1.6758          | 0.5759   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6602        | 5.84  | 56000  | 1.6771          | 0.5757   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6624        | 5.94  | 57000  | 1.6749          | 0.5770   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6527        | 6.05  | 58000  | 1.6791          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6474        | 6.15  | 59000  | 1.6763          | 0.5773   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6494        | 6.26  | 60000  | 1.6765          | 0.5761   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6539        | 6.36  | 61000  | 1.6741          | 0.5764   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6539        | 6.47  | 62000  | 1.6752          | 0.5768   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6529        | 6.57  | 63000  | 1.6737          | 0.5775   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6533        | 6.67  | 64000  | 1.6725          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.653         | 6.78  | 65000  | 1.6722          | 0.5774   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6522        | 6.88  | 66000  | 1.6726          | 0.5762   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6528        | 6.99  | 67000  | 1.6726          | 0.5768   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6439        | 7.09  | 68000  | 1.6728          | 0.5771   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6403        | 7.19  | 69000  | 1.6703          | 0.5758   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6447        | 7.3   | 70000  | 1.6697          | 0.5772   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6458        | 7.4   | 71000  | 1.6694          | 0.5777   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6447        | 7.51  | 72000  | 1.6716          | 0.5771   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6449        | 7.61  | 73000  | 1.6680          | 0.5779   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6458        | 7.72  | 74000  | 1.6683          | 0.5779   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6447        | 7.82  | 75000  | 1.6681          | 0.5778   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6451        | 7.92  | 76000  | 1.6677          | 0.5781   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6418        | 8.03  | 77000  | 1.6665          | 0.5789   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6361        | 8.13  | 78000  | 1.6684          | 0.5779   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.636         | 8.24  | 79000  | 1.6687          | 0.5786   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6357        | 8.34  | 80000  | 1.6670          | 0.5790   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6379        | 8.45  | 81000  | 1.6658          | 0.5788   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6405        | 8.55  | 82000  | 1.6661          | 0.5788   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6378        | 8.65  | 83000  | 1.6650          | 0.5789   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6386        | 8.76  | 84000  | 1.6650          | 0.5784   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.638         | 8.86  | 85000  | 1.6644          | 0.5785   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6374        | 8.97  | 86000  | 1.6635          | 0.5777   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6298        | 9.07  | 87000  | 1.6647          | 0.5785   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6302        | 9.18  | 88000  | 1.6649          | 0.5787   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6315        | 9.28  | 89000  | 1.6651          | 0.5782   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.631         | 9.38  | 90000  | 1.6636          | 0.5788   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6316        | 9.49  | 91000  | 1.6627          | 0.5782   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6286        | 9.59  | 92000  | 1.6646          | 0.5783   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6304        | 9.7   | 93000  | 1.6632          | 0.5801   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6298        | 9.8   | 94000  | 1.6623          | 0.5800   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6309        | 9.91  | 95000  | 1.6620          | 0.5800   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6302        | 10.01 | 96000  | 1.6602          | 0.5801   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6242        | 10.11 | 97000  | 1.6610          | 0.5786   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6258        | 10.22 | 98000  | 1.6605          | 0.5795   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6234        | 10.32 | 99000  | 1.6605          | 0.5791   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+| 1.6245        | 10.43 | 100000 | 1.6604          | 0.5791   | 1           | 1.0                       | 0.0        | 0.0               | 0.0                |
+### Framework versions
+- Transformers 4.29.2
+- Pytorch 2.0.1+cu117
+- Datasets 2.12.0
+- Tokenizers 0.13.3
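
The hyperparameter list in the README names a linear scheduler with a warmup ratio of 0.05 over 100000 steps. As a rough sketch of what that usually means (linear ramp from zero to the peak rate, then linear decay back to zero), the learning rate at a given step could be computed as below. `linear_lr` is a hypothetical helper written for illustration, not a Trainer API, and the exact rounding of the warmup step count is an assumption.

```python
# Sketch of linear warmup followed by linear decay, using the README values.
# linear_lr is a hypothetical helper; the Trainer's own scheduler may differ
# in small details such as how the warmup step count is rounded.
BASE_LR = 0.0005
TOTAL_STEPS = 100_000
WARMUP_RATIO = 0.05
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 5000 steps of warmup


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay from BASE_LR at the end of warmup to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 1_000, 5_000, 52_500, 100_000):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate of 0.0005 is reached at step 5000, which falls between the fifth and sixth rows of the training results table.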

all_results.json ADDED

	@@ -0,0 +1,33 @@

+{
+    "MSE": 0.0,
+    "MSE/layer0": 0.0,
+    "dead_code_fraction": 1.0,
+    "dead_code_fraction/layer0": 1.0,
+    "epoch": 10.43,
+    "eval_MSE/layer0": 6073.8636798095695,
+    "eval_accuracy": 0.5791389432485323,
+    "eval_dead_code_fraction/layer0": 0.1981725,
+    "eval_input_norm/layer0": 0.7182212994247673,
+    "eval_loss": 1.6604058742523193,
+    "eval_multicode_k": 1,
+    "eval_output_norm/layer0": 76.78913438796998,
+    "eval_runtime": 6.7146,
+    "eval_samples": 100,
+    "eval_samples_per_second": 14.893,
+    "eval_steps_per_second": 0.298,
+    "input_norm": 0.0,
+    "input_norm/layer0": 0.0,
+    "loss": 1.6774777018260956,
+    "max_norm": 153.29054260253906,
+    "max_norm/layer0": 153.29054260253906,
+    "mean_norm": 75.17323780059814,
+    "mean_norm/layer0": 75.17323780059814,
+    "multicode_k": 1,
+    "output_norm": 0.0,
+    "output_norm/layer0": 0.0,
+    "perplexity": 5.261445896555633,
+    "runtime": 132212.7109,
+    "samples_per_second": 72.61,
+    "steps_per_second": 0.756,
+    "train_samples": 920563
+}
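
Two of the reported numbers can be cross-checked against the others. The sketch below recomputes perplexity as the exponential of `eval_loss`, and recomputes the epoch count from the step count and batch size in the README. It assumes that `train_batch_size: 96` is the effective batch size per optimizer step (a single device with no gradient accumulation), which the diff does not state.

```python
import math

# Values copied from all_results.json and the README hyperparameters.
eval_loss = 1.6604058742523193
reported_perplexity = 5.261445896555633
train_samples = 920_563
training_steps = 100_000
train_batch_size = 96  # assumed to be the effective batch size per step

# Perplexity of a causal LM is exp(mean cross-entropy loss in nats).
perplexity = math.exp(eval_loss)

# Epochs = samples consumed / samples per epoch.
epochs = training_steps * train_batch_size / train_samples

print(f"perplexity = {perplexity:.6f} (reported {reported_perplexity:.6f})")
print(f"epochs     = {epochs:.2f} (reported 10.43)")
```

Both values agree with the reported ones, which suggests the batch-size assumption holds for this run.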

config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "architectures": [
+    "GPTNeoCodebookModel"
+  ],
+  "codebook_at": [
+    "preproj_attention"
+  ],
+  "codebook_kwargs": {},
+  "codebook_type": [
+    "compositional"
+  ],
+  "k_codebook": [
+    8
+  ],
+  "kmeans_init": false,
+  "kmeans_init_examples": 1000,
+  "kmeans_kwargs": {
+    "batch_size": 24576,
+    "n_init": "auto"
+  },
+  "kmeans_path": "/.cache/cb_volume/huggingface/kmeans_embeddings.pt",
+  "layers_to_snap": [
+    0
+  ],
+  "loss": "aeloss",
+  "model_type": "codebook",
+  "num_codebooks": [
+    16
+  ],
+  "num_codes": 25000,
+  "replace_codes": false,
+  "similarity_metric": "inner_product",
+  "torch_dtype": "float32",
+  "transformers_version": "4.29.2"
+}
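
One possible reading of this config is that `num_codes` is the size of each codebook and `num_codebooks` is the number of codebooks attached at layer 0. The config does not say so, so treat this as an assumption. The sketch below works out the total code count under that reading and relates it to the `eval_dead_code_fraction/layer0` value reported in the results files.

```python
# Values copied from config.json and eval_results.json.
num_codebooks = 16   # "num_codebooks": [16]
num_codes = 25_000   # "num_codes": 25000, assumed to be per codebook
eval_dead_fraction = 0.1981725  # "eval_dead_code_fraction/layer0"

# Assumption: the dead-code fraction is measured over every code at the layer.
total_codes = num_codebooks * num_codes
dead_codes = eval_dead_fraction * total_codes

print(f"total codes at layer 0: {total_codes}")
print(f"implied dead codes:     {dead_codes:.4f}")

# For comparison, the same fraction over a single codebook.
print(f"over one codebook:      {eval_dead_fraction * num_codes:.4f}")
```

The fraction corresponds to a whole number of codes over the combined 16 codebooks but not over a single 25000-entry codebook, which is consistent with the reading above without proving it.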

eval_results.json ADDED

	@@ -0,0 +1,15 @@

+{
+    "epoch": 10.43,
+    "eval_MSE/layer0": 6073.8636798095695,
+    "eval_accuracy": 0.5791389432485323,
+    "eval_dead_code_fraction/layer0": 0.1981725,
+    "eval_input_norm/layer0": 0.7182212994247673,
+    "eval_loss": 1.6604058742523193,
+    "eval_multicode_k": 1,
+    "eval_output_norm/layer0": 76.78913438796998,
+    "eval_runtime": 6.7146,
+    "eval_samples": 100,
+    "eval_samples_per_second": 14.893,
+    "eval_steps_per_second": 0.298,
+    "perplexity": 5.261445896555633
+}

merges.txt ADDED

The diff for this file is too large to render.

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17bb9795f1e6739c83f2bbabcb103948ef59f25f577abc1ab5ccb094f65bad95
+size 371248378
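
The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object ID, and the size in bytes. As a minimal sketch, a downloaded file could be checked against such a pointer as follows. `parse_lfs_pointer` and `file_matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib

# Pointer text copied from the pytorch_model.bin diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:17bb9795f1e6739c83f2bbabcb103948ef59f25f577abc1ab5ccb094f65bad95
size 371248378
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines into a dict with algo, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare its size and hash to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"], f"{pointer['size'] / 2**20:.1f} MiB")
```

Streaming in chunks keeps memory use flat, which matters for a checkpoint of roughly 354 MiB.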

special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 2048,
+  "pad_token": null,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

train_results.json ADDED

	@@ -0,0 +1,21 @@

+{
+    "MSE": 0.0,
+    "MSE/layer0": 0.0,
+    "dead_code_fraction": 1.0,
+    "dead_code_fraction/layer0": 1.0,
+    "epoch": 10.43,
+    "input_norm": 0.0,
+    "input_norm/layer0": 0.0,
+    "loss": 1.6774777018260956,
+    "max_norm": 153.29054260253906,
+    "max_norm/layer0": 153.29054260253906,
+    "mean_norm": 75.17323780059814,
+    "mean_norm/layer0": 75.17323780059814,
+    "multicode_k": 1,
+    "output_norm": 0.0,
+    "output_norm/layer0": 0.0,
+    "runtime": 132212.7109,
+    "samples_per_second": 72.61,
+    "steps_per_second": 0.756,
+    "train_samples": 920563
+}

trainer_state.json ADDED

The diff for this file is too large to render.

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:694d424d45513c05a84b6e17c1f77f79366a09ef35a8b910241185856f3ede97
+size 4155

vocab.json ADDED

The diff for this file is too large to render.