Update README.md
### Training Data

The models have been trained on only 7.8B tokens from [The Pile](https://huggingface.co/datasets/the_pile) dataset. Such a small training budget may result in repetitive text when generating longer outputs (32+ tokens).
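
If that happens, standard decoding options in the `transformers` `generate()` call usually help. The snippet below is a minimal illustrative sketch rather than part of the original card, and the model identifier is a placeholder for this checkpoint's name on the Hub.

```python
# Illustrative sketch only: decoding options that typically reduce verbatim
# repetition. The model identifier is a placeholder, not a real checkpoint id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "<this-checkpoint>"  # replace with the checkpoint shown above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Machine learning is", return_tensors="pt")
generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,             # sampling tends to repeat less than greedy decoding
    top_p=0.9,
    repetition_penalty=1.2,     # penalize tokens that were already generated
    no_repeat_ngram_size=3,     # block exact 3-gram repeats
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

The longer the requested output, the more these penalties matter, since the repetition described above tends to grow with `max_new_tokens`.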
### Training Procedure
Please refer to the [training script](https://github.com/microsoft/archai/blob/main/tasks/text_generation/train.py).
## Limitations
Comparing smaller-sized transformers to large language models can be misleading and inaccurate, as they are fundamentally different in terms of number of parameters, computational requirements, and capabilities. While smaller models may perform well on certain tasks, they lack the capacity and depth of larger models, which can lead to significant differences in overall performance.

It is important to note that smaller models have their advantages: they require fewer computational resources, have a smaller memory footprint, and offer lower inference latency, which can be beneficial for real-time applications and for devices with limited computing power. Additionally, research on smaller-sized transformers may lead to the discovery of more efficient architectures that make better use of computational resources and provide insights for training and deploying larger models.
## Bias and Risks
Pre-training a language model using The Pile dataset may have several limitations and potential biases that need to be considered. The following are some of the technical and sociotechnical limitations associated with using this dataset for pre-training: