deep-haiku-gpt-j-6b-8bit

This model is a fine-tuned version of gpt-j-6B-8bit on the haiku dataset.

Model description

The model is a fine-tuned version of GPT-J-6B-8Bit for generating haikus. The model, data and training procedure are inspired by a blog post by Robert A. Gonsalves.

We used the same multitask training approach as in the post, but significantly extended the dataset to almost double the size of the original one. A prepared version of the dataset can be found here.

Intended uses & limitations

The model is intended to generate haikus. It was trained with a multitask learning approach (see Caruana 1997) on the following four tasks:

topic2graphemes (keywords = text)
topic2phonemes <keyword_phonemes = text_phonemes>
graphemes2phonemes [text = text_phonemes]
phonemes2graphemes {text_phonemes = text}

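To use the model, start the prompt with the opening delimiter of the desired task, followed by the input and an equals sign. For example, the prompt "(dog rain =" lets the model generate a haiku for the keywords "dog" and "rain".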
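The prompt format can be sketched with a small helper. This is only an illustration: the delimiter pairs are taken from the task list above, while the function names `build_prompt` and `extract_completion` are hypothetical, and the assumption that a generated haiku ends at the task's closing delimiter is inferred from that list rather than confirmed by the original training code.

```python
# Delimiter pairs per task, as shown in the task list of this model card.
TASK_DELIMITERS = {
    "topic2graphemes": ("(", ")"),
    "topic2phonemes": ("<", ">"),
    "graphemes2phonemes": ("[", "]"),
    "phonemes2graphemes": ("{", "}"),
}


def build_prompt(task: str, source: str) -> str:
    """Hypothetical helper: opening delimiter, input text, then ' ='."""
    opening, _ = TASK_DELIMITERS[task]
    return f"{opening}{source} ="


def extract_completion(task: str, generated: str) -> str:
    """Hypothetical helper: return the text between the first '=' and the
    task's closing delimiter (assumed to mark the end of one example)."""
    _, closing = TASK_DELIMITERS[task]
    _, _, tail = generated.partition("=")
    completion, _, _ = tail.partition(closing)
    return completion.strip()


prompt = build_prompt("topic2graphemes", "dog rain")
print(prompt)
```

The resulting string can be passed to the model's text generation call, and `extract_completion` can then trim anything the model produces after the closing delimiter.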

Training and evaluation data

We used a collection of existing haikus for training. Each haiku was used both in its grapheme version and in a phoneme version. In addition, we extracted keywords for all haikus using KeyBERT and filtered out haikus with low text quality according to the GRUEN score.
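The quality filtering step can be pictured roughly as follows. This is a minimal sketch, not the original pipeline: `filter_by_quality`, `toy_score` and the threshold value are hypothetical, and the toy scorer merely stands in for a real GRUEN implementation, whose actual cutoff is not stated in this card.

```python
from typing import Callable, Iterable, List


def filter_by_quality(
    haikus: Iterable[str],
    score_fn: Callable[[str], float],
    min_score: float,
) -> List[str]:
    """Keep only haikus whose quality score reaches min_score."""
    return [haiku for haiku in haikus if score_fn(haiku) >= min_score]


def toy_score(text: str) -> float:
    """Placeholder scorer (NOT GRUEN): fraction of alphabetic or space characters."""
    if not text:
        return 0.0
    clean = sum(ch.isalpha() or ch.isspace() for ch in text)
    return clean / len(text)


candidates = ["old pond frog jumps in", "#### 1234 ????"]
kept = filter_by_quality(candidates, toy_score, min_score=0.8)
print(kept)
```

In practice `score_fn` would wrap the real quality metric, and `min_score` would be chosen by inspecting scores on a sample of the data.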

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 10
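To make the scheduler settings concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup, using the values listed above (peak rate 5e-05, 100 warmup steps). It follows the usual shape of such a schedule: a linear ramp from 0 to the peak rate, then a linear decay to 0. The function name `linear_schedule_lr` is hypothetical, and `total_steps` is an assumption, since the total number of optimizer steps is not given in this card.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-5,
    warmup_steps: int = 100,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=1000 is an illustrative value, not the real training length.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```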

Training results

Framework versions

Transformers 4.19.2
Pytorch 1.11.0+cu102
Datasets 2.2.1
Tokenizers 0.12.1

fabianmmueller/deep-haiku-gpt-j-6b-8bit