deep-haiku-gpt-j-6b-8bit

This model is a fine-tuned version of gpt-j-6B-8bit on the haiku dataset.

Model description

The model is a fine-tuned version of GPT-J-6B-8Bit for generating haikus. The model, data, and training procedure are inspired by a blog post by Robert A. Gonsalves.

We used the same multitask training approach as in that post, but significantly extended the dataset (to almost double the size of the original). A prepared version of the dataset can be found here.

Intended uses & limitations

The model is intended to generate haikus. To do so, it was trained using a multitask learning approach (see Caruana 1997) with the following four tasks:

  • topic2graphemes (keywords = text)
  • topic2phonemes <keyword_phonemes = text_phonemes>
  • graphemes2phonemes [text = text_phonemes]
  • phonemes2graphemes {text_phonemes = text}
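Each task is marked by its own bracket pair, so a training line can be assembled mechanically. The following is a minimal sketch of that formatting, not the authors' preprocessing code: the function name `format_example`, the exact whitespace around `=`, the " / " line separator, and the sample haiku are all assumptions made for illustration.

```python
# Bracket pairs that mark each of the four tasks listed above.
TASK_BRACKETS = {
    "topic2graphemes": ("(", ")"),
    "topic2phonemes": ("<", ">"),
    "graphemes2phonemes": ("[", "]"),
    "phonemes2graphemes": ("{", "}"),
}


def format_example(task: str, source: str, target: str) -> str:
    """Wrap 'source = target' in the bracket pair of the given task.

    Hypothetical helper: spacing and separators in the real training
    data may differ from what is produced here.
    """
    open_bracket, close_bracket = TASK_BRACKETS[task]
    return f"{open_bracket}{source} = {target}{close_bracket}"


if __name__ == "__main__":
    # Made-up haiku, used only to show the shape of one training line.
    haiku = "wet dog at the door / rain drums on the tin roof / we both wait it out"
    print(format_example("topic2graphemes", "dog rain", haiku))
```

Because the opening bracket alone identifies the task, the same model can be steered toward text or phoneme output simply by choosing which bracket starts the prompt.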

To use the model, supply a prompt in the format of the desired task, for example "(dog rain =", and let the model generate a haiku for the given keywords.
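As a rough sketch of the prompt handling, the helpers below build a topic2graphemes prompt from keywords and cut the generated text at the first closing parenthesis. The names `build_prompt` and `extract_haiku` are hypothetical, and the assumption that the model closes the haiku with ")" follows from the task format rather than from documented behavior. Loading the 8-bit checkpoint and calling generation are left out because they depend on the local setup.

```python
def build_prompt(keywords: list[str]) -> str:
    """Build a topic2graphemes prompt such as '(dog rain ='."""
    return "(" + " ".join(keywords) + " ="


def extract_haiku(generated: str, prompt: str) -> str:
    """Return the text between the prompt and the first ')'.

    Assumes the model ends the haiku with the closing bracket of the
    task; anything generated after that bracket is discarded.
    """
    if generated.startswith(prompt):
        completion = generated[len(prompt):]
    else:
        completion = generated
    return completion.split(")", 1)[0].strip()


if __name__ == "__main__":
    prompt = build_prompt(["dog", "rain"])
    # Stand-in for model output; a real run would pass the prompt to the model.
    fake_output = prompt + " wet dog at the door / rain on the roof) (cat"
    print(prompt)
    print(extract_haiku(fake_output, prompt))
```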

Training and evaluation data

We used a collection of existing haikus for training. Each haiku was used in its grapheme form as well as in a phoneme form. In addition, we extracted keywords for all haikus using KeyBERT and filtered out haikus with low text quality according to the GRUEN score.
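The quality filter can be pictured as a simple threshold on a precomputed score. This sketch assumes each haiku already carries its keywords and a GRUEN score; the `Haiku` record, the `filter_by_quality` helper, the threshold of 0.5, and the sample scores are all hypothetical, since the cut-off actually used is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Haiku:
    text: str
    keywords: list[str]
    gruen: float  # precomputed text-quality score (illustrative values below)


def filter_by_quality(haikus: list[Haiku], min_gruen: float) -> list[Haiku]:
    """Keep only haikus whose score reaches the given threshold."""
    return [h for h in haikus if h.gruen >= min_gruen]


if __name__ == "__main__":
    # Made-up records; real scores would come from a GRUEN implementation.
    corpus = [
        Haiku("wet dog at the door / rain on the roof", ["dog", "rain"], 0.71),
        Haiku("the the rain dog dog", ["dog", "rain"], 0.18),
    ]
    kept = filter_by_quality(corpus, min_gruen=0.5)  # 0.5 is an assumed cut-off
    for h in kept:
        print(h.text)
```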

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 10
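For readers who want to reproduce a similar setup, the values above can be written as keyword arguments in the naming used by `transformers.TrainingArguments`. This is a sketch rather than the original training script: it assumes a single device (so the per-device batch size equals the listed batch size), and the total number of optimizer steps in `linear_lr` is a free parameter because the dataset size is not given here. The `linear_lr` function is a hypothetical helper that illustrates linear warmup followed by linear decay.

```python
# Hyperparameters from the list above, keyed by TrainingArguments-style names.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 2,  # assumes training on a single device
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 100,
    "num_train_epochs": 10,
}


def linear_lr(step: int, total_steps: int,
              base_lr: float = 5e-5, warmup_steps: int = 100) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # total_steps=1000 is an arbitrary value chosen for the demonstration.
    for step in (0, 50, 100, 550, 1000):
        print(step, linear_lr(step, total_steps=1000))
```

The dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`, where `output_dir` is whatever path suits the local environment.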

Training results

Framework versions

  • Transformers 4.19.2
  • Pytorch 1.11.0+cu102
  • Datasets 2.2.1
  • Tokenizers 0.12.1