sadia72 committed on
Commit
d1d8aa0
1 Parent(s): 95c34bb

Update README.md

Files changed (1)
README.md +46 -5
README.md CHANGED
@@ -5,6 +5,7 @@ tags:
 model-index:
 - name: gpt2-shakespeare
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -12,24 +13,64 @@ should probably proofread and complete it, then remove this comment. -->
 
 # gpt2-shakespeare
 
- This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 2.5738
 
 ## Model description
 
- More information needed
 
 ## Intended uses & limitations
 
- More information needed
 
 ## Training and evaluation data
 
- More information needed
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -57,4 +98,4 @@ The following hyperparameters were used during training:
 - Transformers 4.26.1
 - Pytorch 1.13.1+cu116
 - Datasets 2.10.0
- - Tokenizers 0.13.2
 
 model-index:
 - name: gpt2-shakespeare
   results: []
+ pipeline_tag: text-generation
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
 
 # gpt2-shakespeare
 
+ This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on a [dataset](https://github.com/sadia-sust/dataset-finetune-gpt2) of Shakespeare's books.
 It achieves the following results on the evaluation set:
 - Loss: 2.5738
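For context: assuming this is the standard causal-LM cross-entropy loss, it corresponds to an evaluation perplexity of exp(2.5738) ≈ 13.1.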
 
 ## Model description
 
+ GPT-2 fine-tuned on a corpus of Shakespeare's works.
 
 ## Intended uses & limitations
 
+ The intended use of this model is to write novels in the style of Shakespeare. It is not suited to imitating other writers' styles.
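A minimal generation sketch (the repo id `sadia72/gpt2-shakespeare` is assumed from this repository's name):

```python
# Minimal sketch: load this checkpoint with the text-generation pipeline.
# The repo id below is assumed from this repository's name.
from transformers import pipeline

generator = pipeline("text-generation", model="sadia72/gpt2-shakespeare")
out = generator("Shall I compare thee", max_new_tokens=50, do_sample=True)
print(out[0]["generated_text"])
```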
+
+ ## Dataset Description
+
+ A text corpus was developed for fine-tuning the GPT-2 model. Books were downloaded from [Project Gutenberg](http://www.gutenberg.org/) as plain text files.
+ A large corpus was needed for the model to learn to write in Shakespeare's style.
+
+ The following books were used to build the corpus:
+
+ - Macbeth, word count: 38,197
+ - THE TRAGEDY OF TITUS ANDRONICUS, word count: 40,413
+ - King Richard II, word count: 48,423
+ - Shakespeare's Tragedy of Romeo and Juliet, word count: 144,935
+ - A MIDSUMMER NIGHT’S DREAM, word count: 36,597
+ - ALL’S WELL THAT ENDS WELL, word count: 49,363
+ - THE TRAGEDY OF HAMLET, PRINCE OF DENMARK, word count: 57,471
+ - THE TRAGEDY OF JULIUS CAESAR, word count: 37,391
+ - THE TRAGEDY OF KING LEAR, word count: 54,101
+ - THE LIFE AND DEATH OF KING RICHARD III, word count: 55,985
+ - Romeo and Juliet, word count: 51,417
+ - Measure for Measure, word count: 62,703
+ - Much Ado about Nothing, word count: 45,577
+ - Othello, the Moor of Venice, word count: 53,967
+ - THE WINTER’S TALE, word count: 52,911
+ - The Comedy of Errors, word count: 43,179
+ - The Merchant of Venice, word count: 45,903
+ - The Taming of the Shrew, word count: 44,777
+ - The Tempest, word count: 32,323
+ - TWELFTH NIGHT: OR, WHAT YOU WILL, word count: 42,907
+ - The Sonnets, word count: 39,849
+
+ The corpus contains 1,078,389 word tokens in total (the per-book counts above sum exactly to this figure).
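A sketch of how such counts can be reproduced (the card does not state the exact counting method; whitespace tokenization is assumed here):

```python
# Hypothetical word-count helper; assumes whitespace tokenization,
# which may differ slightly from the method used for the card's figures.
def word_count(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        return len(f.read().split())
```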
+
+ ## Dataset Preprocessing
+
+ - Header text was removed manually.
+ - Extra spaces and newlines were removed programmatically, using the sent_tokenize() function from the NLTK Python library (see the sketch after this list).
+
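A sketch of that cleanup step (the exact script is not included with the card; this is one plausible implementation):

```python
# Illustrative preprocessing, not the author's exact script.
# Collapses runs of whitespace/newlines, then writes one sentence per line.
import re

from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt")

def clean_text(raw: str) -> str:
    flattened = re.sub(r"\s+", " ", raw).strip()
    return "\n".join(sent_tokenize(flattened))
```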
 
 ## Training and evaluation data
 
+ The training set has 880,447 word tokens and the test set has 197,913 word tokens.
 
 ## Training procedure
 
+ The model was trained with the Trainer API from the Transformers library, roughly as sketched below.
+
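A minimal sketch of that setup (file names, block size, and argument values here are illustrative placeholders, not the card's exact ones; the hyperparameters actually used are listed below):

```python
# Minimal fine-tuning sketch with the Trainer API (Transformers 4.26-era classes).
# File paths, block_size, and TrainingArguments values are placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# TextDataset chunks a plain-text corpus into fixed-size token blocks.
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)
eval_dataset = TextDataset(tokenizer=tokenizer, file_path="test.txt", block_size=128)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-shakespeare", evaluation_strategy="epoch"),
    # mlm=False selects the causal-LM objective GPT-2 is trained with.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```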
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
 
 - Transformers 4.26.1
 - Pytorch 1.13.1+cu116
 - Datasets 2.10.0
+ - Tokenizers 0.13.2