---
license: mit
---

## Dataset - tiny shakespeare, character-level

Tiny shakespeare, of the good old char-rnn fame :) Treated on the character level.

- Tokenization is performed at the character level.
- Vocab size is 65. The unique tokens are newline, space, and the following 63 characters:
  - `!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`
- Total number of tokens: 1,115,394
  - Training uses 1,003,854 tokens (90%)
  - Validation uses 111,540 tokens (10%)
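
The character-level tokenization and the 90/10 split described above can be sketched roughly as follows. This is a minimal illustration, not the code used to train this model: the names `stoi`, `itos`, `encode`, `decode`, and `split_index` are hypothetical, and it assumes the two tokens not visible in the list above are newline and space (which brings the total to 65).

```python
import string

# Assumed vocabulary: newline, space, 11 punctuation/digit symbols, and 52 letters.
chars = sorted("\n !$&',-.3:;?" + string.ascii_uppercase + string.ascii_lowercase)
vocab_size = len(chars)

# Each character maps to its position in the sorted vocabulary, and back.
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}


def encode(text):
    """Turn a string into a list of integer token ids, one per character."""
    return [stoi[ch] for ch in text]


def decode(ids):
    """Turn a list of integer token ids back into a string."""
    return "".join(itos[i] for i in ids)


def split_index(n_tokens, train_fraction=0.9):
    """Index where the training part ends and the validation part begins."""
    return int(n_tokens * train_fraction)


# A short sample stands in for the full corpus here.
sample = "First Citizen:\nBefore we proceed any further, hear me speak."
ids = encode(sample)
n = split_index(len(ids))
train_ids, val_ids = ids[:n], ids[n:]
```

With the full corpus of 1,115,394 tokens, `split_index` gives 1,003,854, which matches the training and validation counts listed above.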

## The Hugging Face Spaces Gradio App

This model powers a Hugging Face Spaces Gradio app, available [here](https://huggingface.co/spaces/sayanbanerjee32/nano_text_generator).

The app takes the following inputs:

1. Seed Text (Prompt) - The input text given to the GPT model, from which it generates further content. If no text is provided, a single space (" ") is used as the input.
2. Max tokens to generate - Controls the number of character tokens the model generates. The default value is 100.
3. Temperature - Accepts a value between 0 and 1. Higher values introduce more randomness into next-token generation. The default value is 0.7.
4. Select Top N in each step - An optional field. If no value is provided (or the value is <= 0), all available tokens are considered when sampling the next token from the softmax probabilities. If a number is set, only that many of the most probable characters are considered for the next token.
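
To make the Temperature and Top N inputs more concrete, here is a minimal pure-Python sketch of one sampling step. It is an illustration under stated assumptions, not the app's actual code: `sample_next` and its parameters are hypothetical names, the logits are plain Python floats, and the clamp on very small temperatures is an assumption added to avoid division by zero.

```python
import math
import random


def sample_next(logits, temperature=0.7, top_n=None, rng=None):
    """Sample one token index from raw logits using temperature and optional top-N."""
    rng = rng or random.Random()

    # Assumption: clamp the temperature so that a value of 0 does not divide by zero.
    temperature = max(temperature, 1e-6)

    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = [x / temperature for x in logits]

    # Top-N: keep only the N largest logits; None or <= 0 keeps every token.
    # Ties at the cutoff are all kept, so slightly more than N may survive.
    if top_n is not None and top_n > 0:
        k = min(top_n, len(scaled))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]

    # Softmax weights, shifted by the maximum for numerical stability.
    # Masked entries become exp(-inf) == 0.0 and can never be drawn.
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]

    # random.choices normalizes the weights internally.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Example with three tokens; with top_n=1 the most likely token is always chosen.
next_id = sample_next([0.1, 2.0, -1.0], temperature=0.7, top_n=1)
```

Generating text then amounts to repeating this step up to "Max tokens to generate" times, appending each sampled character to the context before computing the next logits.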