sayanbanerjee32 committed on
Commit bb1e3f9
1 Parent(s): 481d4de

Update README.md

Files changed (1): README.md +24 -3
README.md CHANGED
@@ -1,3 +1,24 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+ ## Dataset - tiny shakespeare, character-level
+ Tiny Shakespeare, of good old char-rnn fame :). Treated at the character level.
+
+ - Tokenization is performed at the character level
+ - Vocab size is 65; the following are the unique tokens
+ - `!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`
+ - Total number of tokens: 1,115,394
+ - Training is performed on 1,003,854 tokens (90%)
+ - Validation is performed on 111,540 tokens (10%)
+
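The character-level scheme above can be sketched in a few lines. This is a minimal illustration of how such a tokenizer is typically built (the README does not show the actual preprocessing code, so the names `stoi`, `itos`, `encode`, and `decode` are assumptions); applied to the full Tiny Shakespeare corpus, the same procedure yields the 65-token vocabulary listed above.

```python
# Sketch of character-level tokenization (assumed approach, not the
# repository's actual code). A short excerpt stands in for the corpus.
text = "First Citizen: Before we proceed any further, hear me speak."

# The vocabulary is simply the sorted set of unique characters.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(s):
    """Map a string to a list of token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of token ids back to a string."""
    return "".join(itos[i] for i in ids)

# Round trip: encoding then decoding recovers the original text.
assert decode(encode("hear me")) == "hear me"
```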
+ ## The Huggingface Spaces Gradio App
+
+ This model is used by the following Huggingface Spaces Gradio app.
+
+ The app is available [here](https://huggingface.co/spaces/sayanbanerjee32/nano_text_generator).
+
+ The app takes the following as input:
+ 1. Seed Text (Prompt) - provided as the input text to the GPT model, based on which it generates further content. If nothing is provided, a single space (" ") is used as the input.
+ 2. Max tokens to generate - controls the number of character tokens the model will generate. The default value is 100.
+ 3. Temperature - accepts a value between 0 and 1. Higher values introduce more randomness into next-token generation. The default value is 0.7.
+ 4. Select Top N in each step - an optional field. If no value is provided (or the value is <= 0), all tokens are considered for next-token prediction based on their softmax probabilities; if a number is set, only that many top-probability tokens are considered.