metadata
tags:
  - generated_from_trainer
datasets:
  - toy_graph
metrics:
  - accuracy
model-index:
  - name: output_toy
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: toy_graph
          type: toy_graph
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.4525254617525837

output_toy

This model is a fine-tuned version of toy/model on the toy_graph dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2691
  • Accuracy: 0.4525
  • Transition Accuracy: 0.5634
  • First Transition Accuracy: 0.88
  • Multicode K: 1
  • Dead Code Fraction/layer0: 0.9969
  • Mse/layer0: 220380.4595
  • Input Norm/layer0: 333.7717
  • Output Norm/layer0: 12.9360
  • Dead Code Fraction/layer1: 0.9535
  • Mse/layer1: 132.7843
  • Input Norm/layer1: 6.5450
  • Output Norm/layer1: 13.1449
  • Dead Code Fraction/layer2: 0.9349
  • Mse/layer2: 365.9396
  • Input Norm/layer2: 6.1370
  • Output Norm/layer2: 18.3248
  • Dead Code Fraction/layer3: 0.9819
  • Mse/layer3: 415.9804
  • Input Norm/layer3: 7.4097
  • Output Norm/layer3: 18.4665
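The evaluation loss above can also be read as a perplexity. The following is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the card does not state this explicitly); the function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Evaluation loss copied from the results list in this card.
eval_loss = 1.2691

# ASSUMPTION: the loss is a mean token-level cross-entropy in nats.
eval_perplexity = perplexity(eval_loss)
print(f"perplexity: {eval_perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 3.56, meaning the model is about as uncertain as a uniform choice among three to four next tokens.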

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 1024
  • eval_batch_size: 512
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • training_steps: 20000
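For anyone trying to reproduce a comparable run, the list above can be written down as keyword arguments. This is a sketch, not the author's actual training script: the mapping onto `transformers.TrainingArguments` field names is an assumption, as is treating the batch sizes as per-device values on a single device.

```python
def training_kwargs() -> dict:
    """Hyperparameters from this card, keyed by TrainingArguments field names.

    ASSUMPTION: single device, no gradient accumulation, so the reported
    batch sizes equal the per-device batch sizes.
    """
    return {
        "learning_rate": 1e-3,
        "per_device_train_batch_size": 1024,
        "per_device_eval_batch_size": 512,
        "seed": 42,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "constant",
        "max_steps": 20000,
    }


kwargs = training_kwargs()

# Sequences processed over the whole run under the single-device assumption.
sequences_seen = kwargs["max_steps"] * kwargs["per_device_train_batch_size"]
print(f"sequences seen: {sequences_seen:,}")

# With transformers installed, the dict could be passed on like this
# (the output_dir value is a placeholder, not taken from the card):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="output_toy", **kwargs)
```

Under the same assumption the run processes 20,480,000 training sequences in total.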

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Transition Accuracy | First Transition Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 | Dead Code Fraction/layer1 | Mse/layer1 | Input Norm/layer1 | Output Norm/layer1 | Dead Code Fraction/layer2 | Mse/layer2 | Input Norm/layer2 | Output Norm/layer2 | Dead Code Fraction/layer3 | Mse/layer3 | Input Norm/layer3 | Output Norm/layer3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2465 | 0.03 | 500 | 1.8386 | 0.3565 | 0.3555 | 0.31 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.5981 | 0.05 | 1000 | 1.4652 | 0.4204 | 0.5015 | 0.58 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.3928 | 0.07 | 1500 | 1.3541 | 0.4378 | 0.555 | 0.79 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.3405 | 0.1 | 2000 | 1.3264 | 0.4427 | 0.5756 | 0.82 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.3189 | 0.12 | 2500 | 1.3187 | 0.4446 | 0.5576 | 0.86 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.308 | 0.15 | 3000 | 1.3064 | 0.4468 | 0.5573 | 0.82 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.3009 | 0.17 | 3500 | 1.2963 | 0.4493 | 0.5763 | 0.87 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2965 | 0.2 | 4000 | 1.2922 | 0.4494 | 0.5677 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2919 | 0.23 | 4500 | 1.2880 | 0.4499 | 0.5821 | 0.91 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2889 | 0.25 | 5000 | 1.2856 | 0.4501 | 0.56 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2855 | 0.28 | 5500 | 1.2816 | 0.4503 | 0.6016 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2828 | 0.3 | 6000 | 1.2844 | 0.4502 | 0.5734 | 0.87 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2805 | 0.33 | 6500 | 1.2777 | 0.4516 | 0.6084 | 0.95 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2793 | 0.35 | 7000 | 1.2796 | 0.4511 | 0.5681 | 0.93 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2785 | 0.38 | 7500 | 1.2748 | 0.4519 | 0.5919 | 0.95 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2764 | 0.4 | 8000 | 1.2767 | 0.4518 | 0.5760 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2763 | 0.42 | 8500 | 1.2801 | 0.4507 | 0.5827 | 0.94 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2755 | 0.45 | 9000 | 1.2755 | 0.4516 | 0.5765 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2746 | 0.47 | 9500 | 1.2736 | 0.4523 | 0.5865 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2734 | 0.5 | 10000 | 1.2740 | 0.4519 | 0.5779 | 0.91 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2732 | 0.53 | 10500 | 1.2744 | 0.4516 | 0.5879 | 0.89 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2723 | 0.55 | 11000 | 1.2690 | 0.4525 | 0.5811 | 0.89 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2712 | 0.57 | 11500 | 1.2705 | 0.4526 | 0.5779 | 0.93 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2716 | 0.6 | 12000 | 1.2701 | 0.4527 | 0.5760 | 0.89 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2708 | 0.62 | 12500 | 1.2716 | 0.4522 | 0.5485 | 0.95 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2705 | 0.65 | 13000 | 1.2676 | 0.4529 | 0.5734 | 0.93 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2696 | 0.68 | 13500 | 1.2717 | 0.4519 | 0.5994 | 0.91 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2687 | 0.7 | 14000 | 1.2687 | 0.4524 | 0.5756 | 0.9 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2685 | 0.72 | 14500 | 1.2709 | 0.4521 | 0.6127 | 0.89 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2685 | 0.75 | 15000 | 1.2706 | 0.4519 | 0.5873 | 0.91 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2675 | 0.78 | 15500 | 1.2691 | 0.4527 | 0.6365 | 0.96 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2677 | 0.8 | 16000 | 1.2686 | 0.4526 | 0.5589 | 0.93 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2676 | 0.82 | 16500 | 1.2639 | 0.4529 | 0.5940 | 0.89 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2662 | 0.85 | 17000 | 1.2655 | 0.4530 | 0.5955 | 0.94 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2666 | 0.88 | 17500 | 1.2636 | 0.4526 | 0.6013 | 0.96 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2664 | 0.9 | 18000 | 1.2681 | 0.4526 | 0.6034 | 0.96 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.266 | 0.93 | 18500 | 1.2624 | 0.4527 | 0.5839 | 0.88 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2653 | 0.95 | 19000 | 1.2688 | 0.4519 | 0.5837 | 0.92 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2654 | 0.97 | 19500 | 1.2619 | 0.4534 | 0.5973 | 0.92 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.2649 | 1.0 | 20000 | 1.2647 | 0.4525 | 0.59 | 0.93 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
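One common use of a log like this is to pick the evaluation step with the lowest validation loss. The sketch below assumes that selection criterion (the card does not say which checkpoint was published) and, for brevity, copies only the last five evaluations from the results above; the names `EvalRow` and `best_by_validation_loss` are illustrative.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    validation_loss: float
    accuracy: float


# Last five evaluation rows, copied from the training results in this card.
rows = [
    EvalRow(18000, 1.2681, 0.4526),
    EvalRow(18500, 1.2624, 0.4527),
    EvalRow(19000, 1.2688, 0.4519),
    EvalRow(19500, 1.2619, 0.4534),
    EvalRow(20000, 1.2647, 0.4525),
]


def best_by_validation_loss(history: list) -> EvalRow:
    """Return the row with the lowest validation loss (first one on ties)."""
    return min(history, key=lambda row: row.validation_loss)


best = best_by_validation_loss(rows)
print(f"best step: {best.step}, validation loss: {best.validation_loss}")
```

Among these rows the lowest validation loss is at step 19500, which is also the row with the highest accuracy.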

Framework versions

  • Transformers 4.28.1
  • PyTorch 2.0.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3
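To compare a local environment against the versions listed above, something like the following could be used. It is a minimal sketch using only the standard library; the mapping from the names in this card to installable distribution names (for example PyTorch to `torch`) is an assumption, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions from this card, keyed by assumed distribution names.
EXPECTED_VERSIONS = {
    "transformers": "4.28.1",
    "torch": "2.0.1+cu117",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(expected: dict) -> dict:
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


version_report = check_versions(EXPECTED_VERSIONS)
for name, (wanted, installed, matches) in version_report.items():
    status = "ok" if matches else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```

An exact string match is deliberately strict; a different CUDA build suffix such as `+cu118` would be reported as a difference even though the base version is the same.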