Jellywibble/dalio-pretrain-cleaned-v4

Text Generation

text-generation-inference

Inference Endpoints

Model Description

Pre-training on a cleaned version of Principles:

Removing numeric references to footnotes
Removing numeric counts, e.g. 1) ... 2) ... 3) ...
Correcting grammar, e.g. full stops must be followed by a space
Fine-tuning the OPT-30B model on the dataset above
Dataset location: Jellywibble/dalio-principles-cleaned-v3
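
The cleaning steps listed above could look roughly like the sketch below. The actual cleaning script is not part of this card, so the function name `clean_text` and all three regular expressions are assumptions made for illustration; they encode one plausible reading of "footnote reference", "numeric count" and "full stop followed by a space".

```python
import re

# Hypothetical patterns: the real cleaning script is not shown in the card.
# Assumed footnote reference: 1-3 digits glued to the end of a word, optionally
# after closing punctuation, e.g. "reality.12 Next" or "reality3 and".
FOOTNOTE_REF = re.compile(r"(?<=[A-Za-z])([.,;:!?]?)\d{1,3}(?=\s|$)")
# Assumed numeric count: "1) ", "2) ", ... at the start of the text or after whitespace.
NUMERIC_COUNT = re.compile(r"(?<!\S)\d{1,2}\)\s*")
# Assumed grammar fix: a full stop directly followed by an uppercase letter.
MISSING_SPACE = re.compile(r"\.(?=[A-Z])")


def clean_text(text: str) -> str:
    """Apply the three assumed cleaning rules in order."""
    text = FOOTNOTE_REF.sub(r"\1", text)   # drop the digits, keep the punctuation
    text = NUMERIC_COUNT.sub("", text)     # drop "1) ", "2) ", ...
    text = MISSING_SPACE.sub(". ", text)   # insert the missing space
    text = re.sub(r"[ \t]{2,}", " ", text) # collapse runs of spaces left behind
    return text.strip()


if __name__ == "__main__":
    print(clean_text("Embrace reality.3 Deal with it.Truth matters."))
    print(clean_text("1) Be open 2) Be honest"))
```

These rules are deliberately simple and will misfire on some inputs: tokens such as "A40" followed by a space lose their digits, and abbreviations such as "U.S.A" gain extra spaces. A production cleaner would need exceptions for such cases.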

Metrics

Checkpoint 8 served
Hellaswag perplexity: 30.65
Eval loss: 2.289
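
As a rough aid for reading the eval loss, the sketch below converts it to a perplexity on the evaluation set. It assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state. The result is not comparable to the Hellaswag perplexity, which is measured on a different dataset.

```python
import math

# Value copied from the metrics above.
eval_loss = 2.289

# Assumption: eval_loss is the mean per-token cross-entropy in nats,
# so perplexity on the same evaluation set is exp(loss).
eval_perplexity = math.exp(eval_loss)

if __name__ == "__main__":
    print(f"eval perplexity: {eval_perplexity:.2f}")
```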

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble

Model Parameters

Trained on 4x A40 GPUs, effective batch size = 8

base_model_name: facebook/opt-30b
dataset_name: Jellywibble/dalio-principles-cleaned-v3
block_size: 1024
gradient_accumulation_steps: 2
per_device_train_batch_size: 1
seed: 2
num_train_epochs: 1
learning_rate: 3e-6
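
The effective batch size of 8 follows from the parameters above, assuming plain data-parallel training in which each of the 4 GPUs processes its own micro-batch. The helper name `effective_batch_size` is hypothetical and only illustrates the arithmetic.

```python
def effective_batch_size(num_gpus: int,
                         per_device_train_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return num_gpus * per_device_train_batch_size * gradient_accumulation_steps


# Values taken from the parameter list; 4 GPUs comes from the "4x A40" note.
size = effective_batch_size(
    num_gpus=4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
)

if __name__ == "__main__":
    print(size)  # 4 * 1 * 2 = 8
```

On a different GPU count, `gradient_accumulation_steps` can be adjusted so that the product stays at 8 or above.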

Notes

It is important for the effective batch size to be at least 8.
A learning rate higher than 3e-6 results in massive overfitting, i.e. much worse Hellaswag metrics.
