llama-161M

Trained on 100B tokens.

  • Peak learning rate: 1e-3
  • Weight decay: 0.1
  • WSD (warmup-stable-decay) scheduler with 10% decay (see the sketch after this list)
  • Data mix: 80% code, 10% natural language, 10% instruction data
  • Dataset decontaminated against popular benchmarks, following the BigCode approach
  • Hardware: 8x RTX 3090s, ~110 hours
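
A minimal sketch of the schedule implied by the bullets above: warmup, a stable phase at the 1e-3 peak, then decay over the final 10% of steps. The warmup fraction and the linear shape of the warmup and decay are assumptions; the card only states the peak LR and the 10% decay phase.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 1e-3,
           warmup_frac: float = 0.01, decay_frac: float = 0.10) -> float:
    """Warmup-stable-decay LR; warmup_frac is an assumption, not from the card."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < warmup_steps:
        # linear warmup to the peak LR
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # stable phase at the peak LR
        return peak_lr
    # decay phase: anneal linearly to zero over the last 10% of training
    remaining = total_steps - decay_start
    return peak_lr * max(0.0, (total_steps - step) / remaining)
```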

This is a base pretrained model and requires further fine-tuning to be useful.
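
Since the checkpoint is only pretrained, a typical next step is supervised fine-tuning. Below is a minimal sketch using the Hugging Face Trainer; the dataset name, sequence length, and training hyperparameters are illustrative assumptions, not settings used for this model.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "abacaj/llama-161M-100B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical fine-tuning dataset with a "text" column; substitute your own.
dataset = load_dataset("your/finetuning-dataset", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-161M-sft",
                           per_device_train_batch_size=8,
                           learning_rate=2e-5, num_train_epochs=1, bf16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```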

Model Details

Benchmark                  Score (greedy)
openai/openai_humaneval    9.2%
mbpp                       9.8%
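
For reference, a greedy evaluation like the HumanEval number above can be approximated as follows. This is only a sketch: the stop-sequence handling and the sandboxed execution step that actually score the completions are not specified in the card and are omitted here.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacaj/llama-161M-100B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

tasks = load_dataset("openai/openai_humaneval", split="test")
completions = []
for task in tasks:
    inputs = tokenizer(task["prompt"], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
    # keep only the newly generated tokens after the prompt
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    completions.append({"task_id": task["task_id"], "completion": completion})
# completions would then be run against each task's unit tests to compute the score
```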
162M parameters, BF16 weights (Safetensors)
