license: mit
tags:
- generated_from_trainer
model-index:
- name: deep-haiku-gpt-j-6b-8bit
results: []
deep-haiku-gpt-j-6b-8bit
This model is a fine-tuned version of gpt-j-6B-8bit on the haiku dataset.
Model description
The model is a fine-tuned version of GPT-2 for generation of Haikus. The model, data and training procedure are inspired by a blog post by Robert A. Gonsalves. Instead of using an 8-bit version of GPT-J 6B, we used vanilla GPT-2. From what we saw, the model performance is comparable, but it is much easier to fine-tune.
We used the same multitask training approach as in the post, but significantly extended the dataset (almost double the size of the original one). A prepared version of the dataset can be found here.
Intended uses & limitations
The model is intended to generate Haikus. To do so, it was trained using a multitask learning approach (see Caruana 1997) with the following four tasks:
- topic2graphemes
(keywords = text)
- topic2phonemes
<keyword_phonemes = text_phonemes>
- graphemes2phonemes
[text = text_phonemes]
- phonemes2graphemes
{text_phonemes = text}
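The four task formats listed above can be sketched as a small formatter. This is only an illustration: the function name, the argument names, and the placeholder phoneme strings are hypothetical, and the exact whitespace and example separators used in the real training data are not stated here.

```python
# Each task wraps "source = target" in its own bracket pair, as listed above.
TASK_BRACKETS = {
    "topic2graphemes": ("(", ")"),
    "topic2phonemes": ("<", ">"),
    "graphemes2phonemes": ("[", "]"),
    "phonemes2graphemes": ("{", "}"),
}


def format_examples(keywords, keyword_phonemes, text, text_phonemes):
    """Build one training string per task for a single haiku (hypothetical helper)."""
    pairs = {
        "topic2graphemes": (keywords, text),
        "topic2phonemes": (keyword_phonemes, text_phonemes),
        "graphemes2phonemes": (text, text_phonemes),
        "phonemes2graphemes": (text_phonemes, text),
    }
    examples = {}
    for task, (source, target) in pairs.items():
        opening, closing = TASK_BRACKETS[task]
        examples[task] = f"{opening}{source} = {target}{closing}"
    return examples


# Placeholder strings stand in for real haiku text and phoneme transcriptions.
examples = format_examples("dog rain", "kw-phonemes", "haiku text", "text-phonemes")
for task, line in examples.items():
    print(task, "->", line)
```

Because every task has a distinct opening bracket, the first character of a prompt is enough to select which task the model should continue.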
To use the model, provide a prompt that opens one of these tasks, such as "(dog rain =",
and let the model generate a Haiku for the given keywords.
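A minimal sketch of building such a prompt and cutting the haiku out of the generated text might look like the following. Both helper names are hypothetical, and the sketch assumes the model closes the example with ")" as in the topic2graphemes format; the generation call itself is left out because loading the 8-bit checkpoint depends on the setup used.

```python
def build_prompt(keywords):
    """Open a topic2graphemes example and stop after '=' so the model continues it."""
    return "(" + " ".join(keywords) + " ="


def extract_haiku(generated):
    """Return the text between the first '=' and the next ')' (or the end of the string)."""
    _, _, rest = generated.partition("=")
    haiku, _, _ = rest.partition(")")
    return haiku.strip()


prompt = build_prompt(["dog", "rain"])
print(prompt)

# 'generated' stands in for model output; the haiku text is a placeholder.
generated = prompt + " placeholder haiku text) (next"
print(extract_haiku(generated))
```

Note that a haiku containing a literal ")" would be truncated by this simple parser.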
Training and evaluation data
We used a collection of existing haikus for training. All haikus were used in their grapheme version as well as in a phoneme version. In addition, we extracted keywords for all haikus using KeyBERT and filtered out haikus with low text quality according to the GRUEN score.
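The quality filtering step can be sketched as a threshold filter over a scoring function. This is an illustration only: `score_fn` stands in for a GRUEN scorer, and both the function name and the threshold value are assumptions, since the cutoff actually used is not given here.

```python
def filter_by_quality(haikus, score_fn, min_score):
    """Keep only haikus whose quality score reaches min_score (hypothetical helper)."""
    return [haiku for haiku in haikus if score_fn(haiku) >= min_score]


# Toy scores standing in for GRUEN output; real scores would come from the metric.
toy_scores = {"haiku a": 0.8, "haiku b": 0.3, "haiku c": 0.6}
kept = filter_by_quality(list(toy_scores), toy_scores.get, min_score=0.5)
print(kept)
```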
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 10
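To make the learning-rate settings concrete, here is a plain-Python sketch of a linear schedule with warmup: the rate rises linearly from 0 to 5e-05 over the first 100 steps, then decays linearly to 0 at the end of training. The total number of steps depends on the dataset size, which is not stated here, so `total_steps=1000` below is purely an assumed value for illustration.

```python
PEAK_LR = 5e-05       # learning_rate from the list above
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps from the list above


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from the peak rate down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=1000 is an assumption, not the real training length.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000))
```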
Training results
Framework versions
- Transformers 4.19.2
- Pytorch 1.11.0+cu102
- Datasets 2.2.1
- Tokenizers 0.12.1