Update README.md
README.md
@@ -57,7 +57,7 @@ The following books are used to develop text corpus:
 
 Corpus has total 1078389 word tokens.
 
-## Datasets
+## Datasets Preprocessing
 
 - Header text are removed manually.
 - Using sent_tokenize() function from NLTK python library, extra spaces and new-lines were removed programmatically.
@@ -93,6 +93,17 @@ The following hyperparameters were used during training:
 | 2.3842 | 2.51 | 1000 | 2.5738 |
 
 
+## Sample Code Using Transformers Pipeline
+
+```
+from transformers import pipeline
+
+story = pipeline('text-generation',model='./gpt2-shakespeare', tokenizer='gpt2', max_length = 300)
+story("how art thou")
+
+```
+
+
 ### Framework versions
 
 - Transformers 4.26.1
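The README reports 1078389 word tokens but does not say which tokenizer produced that figure. The sketch below shows one possible way to count word tokens over a corpus. The regex definition of a "word token" (`WORD_RE`) is a hypothetical choice, not the author's method, so it will not necessarily reproduce the README's number. NLTK's `word_tokenize`, for example, also counts punctuation marks as tokens.

```python
import re
from typing import Iterable

# Hypothetical word-token definition: a run of word characters with one
# optional apostrophe suffix, so "thou'rt" counts as a single token.
# Punctuation is ignored entirely.
WORD_RE = re.compile(r"\w+(?:'\w+)?")


def count_word_tokens(lines: Iterable[str]) -> int:
    """Total number of regex-defined word tokens across all lines."""
    return sum(len(WORD_RE.findall(line)) for line in lines)


sample = ["How art thou?", "Thou'rt welcome, friend."]
print(count_word_tokens(sample))  # 6
```

Because `count_word_tokens` accepts any iterable of strings, an open text file (e.g. a hypothetical `corpus.txt`) can be passed directly and is read line by line.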
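The preprocessing note says extra spaces and newlines were removed with the help of NLTK's `sent_tokenize()`, but it does not show the code. Below is a minimal sketch of one way such a step could work, assuming the text is split into sentences first and each sentence's whitespace is then collapsed. To keep it self-contained, it uses a naive regex splitter (`naive_sent_split`, hypothetical) in place of NLTK. This is a swapped-in technique, not the one the README used.

```python
import re
from typing import Callable, List


def naive_sent_split(text: str) -> List[str]:
    """Hypothetical stand-in for nltk.tokenize.sent_tokenize: split after
    '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def clean_corpus(raw: str,
                 splitter: Callable[[str], List[str]] = naive_sent_split) -> List[str]:
    """Split raw book text into sentences and collapse internal whitespace."""
    cleaned = []
    for sentence in splitter(raw):
        # str.split() with no argument splits on any run of spaces, tabs or
        # newlines, so re-joining with " " leaves one space between words.
        normalized = " ".join(sentence.split())
        if normalized:  # drop fragments that were only whitespace
            cleaned.append(normalized)
    return cleaned


raw = "To be,  or not\nto be.   That is\n\nthe question!  Ay."
print("\n".join(clean_corpus(raw)))
```

To follow the README more closely, pass NLTK's splitter instead: `clean_corpus(raw, sent_tokenize)` with `sent_tokenize` imported from `nltk.tokenize`. This requires NLTK's Punkt sentence models to be downloaded first via `nltk.download`. Writing the cleaned sentences one per line is an assumption about the output format.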
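The sample code calls the `story` pipeline but leaves its return value unused. A Transformers text-generation pipeline returns a list with one dict per generated sequence, and the text is stored under the `"generated_text"` key. By default, that text includes the prompt. The helper below (`generate_texts`, a hypothetical name) is a small sketch that unwraps that structure. It is typed against any callable with the same shape, so it can be exercised without loading the model.

```python
from typing import Any, Callable, Dict, List

# Any callable shaped like a Transformers text-generation pipeline: it takes
# a prompt plus generation keyword arguments and returns one dict per
# generated sequence.
TextGenerator = Callable[..., List[Dict[str, Any]]]


def generate_texts(generator: TextGenerator, prompt: str, **gen_kwargs: Any) -> List[str]:
    """Run the generator once and return only the generated strings."""
    outputs = generator(prompt, **gen_kwargs)
    # Each dict holds the prompt plus its continuation under "generated_text".
    return [out["generated_text"] for out in outputs]
```

With the `story` pipeline from the README, a call such as `generate_texts(story, "how art thou", do_sample=True, num_return_sequences=3)` should return three sampled continuations as plain strings. Note that `max_length = 300` in the README bounds the total sequence length, prompt tokens included. It does not bound only the newly generated text.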