sayanbanerjee32
committed on
Commit
•
bb1e3f9
1
Parent(s):
481d4de
Update README.md
README.md
CHANGED
@@ -1,3 +1,24 @@
|
|
1 |
-
---
|
2 |
-
license: mit
|
3 |
-
---
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
---
|
4 |
+
## Dataset - tiny shakespeare, character-level
|
5 |
+
Tiny Shakespeare, of the good old char-rnn fame :) Treated at the character level.
|
6 |
+
|
7 |
+
- Tokenization is performed at the character level
|
8 |
+
- Vocabulary size: 65 unique tokens (newline, space, and the 63 characters below)
|
9 |
+
- `!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`
|
10 |
+
- Total number of tokens: 1,115,394
|
11 |
+
- Training uses 1,003,854 tokens (90%)
|
12 |
+
- Validation uses 111,540 tokens (10%)
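
The character-level tokenization and 90/10 split described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the sample `text` is a short stand-in for the real corpus, and the names `stoi`, `itos`, `encode`, and `decode` are assumptions chosen for the example.

```python
# Hypothetical stand-in text; the real corpus is the full tiny shakespeare file.
text = "First Citizen:\nBefore we proceed any further, hear me speak.\n"

# Vocabulary: the sorted set of unique characters in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character


def encode(s):
    """Map a string to a list of integer token ids, one per character."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer token ids back to a string."""
    return "".join(itos[i] for i in ids)


# 90/10 split: the first 90% of tokens for training, the rest for validation.
ids = encode(text)
n_train = int(0.9 * len(ids))
train_ids, val_ids = ids[:n_train], ids[n_train:]
```

Applied to 1,115,394 tokens, `int(0.9 * n)` gives 1,003,854 training tokens and leaves 111,540 for validation, which matches the counts listed above.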
|
13 |
+
|
14 |
+
## The Hugging Face Spaces Gradio App
|
15 |
+
|
16 |
+
This model powers the following Hugging Face Spaces Gradio app.
|
17 |
+
|
18 |
+
The app is available [here](https://huggingface.co/spaces/sayanbanerjee32/nano_text_generator)
|
19 |
+
|
20 |
+
The app takes the following inputs:
|
21 |
+
1. Seed Text (Prompt) - This is passed as input text to the GPT model, which generates further content based on it. If no text is provided, a single space (" ") is used as the input.
|
22 |
+
2. Max tokens to generate - This controls the number of character tokens the model generates. The default value is 100.
|
23 |
+
3. Temperature - This accepts a value between 0 and 1. A higher value introduces more randomness into next-token generation. The default value is 0.7.
|
24 |
+
4. Select Top N in each step - This is an optional field. If no value is provided (or the value is <= 0), all available tokens are considered for next-token prediction, weighted by their softmax probabilities. If a number is set, only that many of the most probable characters are considered at each step.
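
How the Temperature and Top N inputs typically shape a single sampling step can be sketched as below. This is an illustration under stated assumptions, not the app's actual code: `sample_next_token` and its parameters are hypothetical names, the logits are made up, and it uses only the Python standard library rather than whatever framework the app runs on.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_n=None, rng=random):
    """Sample a token index from raw logits (hypothetical helper)."""
    if temperature <= 0:
        # Assumption: a temperature of exactly 0 is rejected in this sketch,
        # since dividing by it is undefined.
        raise ValueError("temperature must be positive")

    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [x / temperature for x in logits]

    # Optional top-N filter: mask everything below the N-th largest logit.
    # Ties with the cutoff value are kept, so slightly more than N may survive.
    if top_n is not None and top_n > 0:
        k = min(top_n, len(scaled))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]

    # Numerically stable softmax over the remaining logits.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Draw one index according to the resulting probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up logits for a 4-token vocabulary.
example_logits = [2.0, 1.0, 0.5, -1.0]
next_id = sample_next_token(example_logits, temperature=0.7, top_n=2)
```

With `top_n=1` the step becomes greedy and always picks the highest-scoring token; leaving `top_n` unset (or <= 0) samples from the full vocabulary.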
|