Update README.md
README.md
CHANGED
@@ -10,4 +10,20 @@ pinned: false
 license: mit
 ---
 
-
+## Dataset
+A collection of William Shakespeare's plays
+- Text is tokenized with the GPT-2 tokenizer from tiktoken
+- Total number of tokens: 338025
+
+## Model
+
+The model is available [here](https://huggingface.co/sayanbanerjee32/nanogpt2_test)
+
+## The HuggingFace Spaces Gradio App
+
+The app takes the following inputs:
+1. Seed Text (Prompt) - This text is passed to the GPT model as the prompt from which it generates a continuation. If no text is provided, a single space (" ") is used as the input.
+2. Max tokens to generate - This controls the number of tokens the model generates. The default value is 100.
+3. Temperature - This accepts values between 0 and 1. Higher values introduce more randomness into next-token generation. The default value is 0.7.
+4. Select Top N in each step - This is an optional field. If no value is provided (or the value is <= 0), all available tokens are considered for the next-token prediction, according to their softmax probabilities. If a number is set, only that many top tokens are considered for the next-token prediction.
+
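The Temperature and Top N inputs described in the README above can be illustrated with a minimal, standard-library-only sketch. This is not the app's actual code: the function name `sample_next_token`, its parameters, and the example logits are hypothetical, and the choice to reject a temperature of 0 is an assumption, since the README does not say how the app handles that value.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_n=None, rng=None):
    """Sample one token id from raw logits (hypothetical helper, for illustration only)."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Assumption: a temperature of 0 is rejected to avoid division by zero.
        raise ValueError("temperature must be positive")

    # Dividing by the temperature sharpens (low values) or flattens (high values)
    # the resulting distribution.
    scaled = [x / temperature for x in logits]

    # Start with every token id; if top_n is set and positive, keep only the
    # top_n highest-scoring ids.
    candidates = list(range(len(scaled)))
    if top_n is not None and top_n > 0:
        candidates = sorted(candidates, key=lambda i: scaled[i], reverse=True)[:top_n]

    # Unnormalized softmax weights over the surviving candidates; the maximum
    # is subtracted for numerical stability.
    peak = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - peak) for i in candidates]

    # random.choices normalizes the weights internally.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example with made-up logits for a four-token vocabulary.
example_logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(example_logits, temperature=0.7, top_n=2, rng=random.Random(0)))
```

With `top_n=1` the sketch always returns the highest-scoring token, and with `top_n` unset or `<= 0` every token stays eligible, which matches the behaviour the README describes for the "Select Top N" field.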