---
license: other
datasets:
  - JeanKaddour/minipile
language:
  - en
tags:
  - axolotl
  - mergekit
  - llama
---

Meta's Llama 3 70B pruned to 42B parameters using the methodology described in *The Unreasonable Ineffectiveness of the Deeper Layers*. After pruning, the model was further trained with QLoRA on ~100M tokens from JeanKaddour/minipile.
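
For the post-pruning training step, a QLoRA run in Axolotl can be configured roughly as sketched below. This is a minimal, illustrative config, not the one used for this model: the base model path, adapter rank, learning rate, and every other hyperparameter are placeholders.

```yaml
# Illustrative Axolotl QLoRA config for healing a pruned checkpoint.
# All values are placeholders, not the actual settings used for this model.
base_model: ./llama3-42b-pruned   # hypothetical path to the pruned checkpoint
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: JeanKaddour/minipile
    type: completion            # raw-text continuation training

sequence_len: 4096
sample_packing: true
micro_batch_size: 1
gradient_accumulation_steps: 8
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
bf16: true
output_dir: ./llama3-42b-healed
```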

The layers to prune were selected using PruneMe.
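
Once a contiguous block of layers has been chosen, the pruned checkpoint can be assembled with a mergekit passthrough merge along these lines. A minimal sketch: the layer split below is illustrative and is not the range actually removed from this model.

```yaml
# Illustrative mergekit passthrough config that drops a contiguous block of layers.
# The layer ranges are hypothetical, not the ones used for this model.
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-70B
        layer_range: [0, 24]    # keep the first 24 layers
  - sources:
      - model: meta-llama/Meta-Llama-3-70B
        layer_range: [57, 80]   # skip the pruned block, keep the rest
merge_method: passthrough
dtype: bfloat16
```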

Still evaluating, so don't get too excited! It might be incredibly dumb. Check out these zero-shot MMLU numbers, though:

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | N/A     | none   | 0      | acc    | 0.7319 | ± 0.0034 |
| - humanities      | N/A     | none   | 0      | acc    | 0.6582 | ± 0.0063 |
| - other           | N/A     | none   | 0      | acc    | 0.7927 | ± 0.0069 |
| - social_sciences | N/A     | none   | 0      | acc    | 0.8466 | ± 0.0064 |
| - stem            | N/A     | none   | 0      | acc    | 0.6702 | ± 0.0079 |

5-shot:

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | N/A     | none   | 0      | acc    | 0.7669 | ± 0.0034 |
| - humanities      | N/A     | none   | 5      | acc    | 0.7296 | ± 0.0062 |
| - other           | N/A     | none   | 5      | acc    | 0.8101 | ± 0.0067 |
| - social_sciences | N/A     | none   | 5      | acc    | 0.8668 | ± 0.0060 |
| - stem            | N/A     | none   | 5      | acc    | 0.6825 | ± 0.0079 |

Built with Axolotl