gugarosa committed on
Commit df8e23b
1 Parent(s): 8d5bdf1

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -72,13 +72,19 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 
  ### Training Data
 
- The models have been trained for 7.8B tokens from [The Pile](https://huggingface.co/datasets/the_pile) dataset.
+ The models have been trained on only 7.8B tokens from the [The Pile](https://huggingface.co/datasets/the_pile) dataset. Such a small training budget may result in repetitive text when generating longer outputs (32+ tokens).
 
  ### Training Procedure
 
  Please refer to the [training script](https://github.com/microsoft/archai/blob/main/tasks/text_generation/train.py).
 
- ## Bias, Risks, and Limitations
+ ## Limitations
+
+ Comparing smaller-sized transformers to large language models can be misleading and inaccurate, as they differ fundamentally in number of parameters, computational power, and capabilities. While smaller models may perform well on certain tasks, they lack the complexity and depth of larger models, which can lead to significant differences in overall performance.
+
+ It is important to note that smaller models have their advantages: they require fewer computational resources, have a smaller memory footprint, and offer lower inference latency, which can be beneficial for real-time applications and devices with limited computing power. Additionally, research on smaller-sized transformers may lead to the discovery of more efficient architectures that better utilize computational resources and provide insights for training and deploying larger models.
+
+ ## Bias and Risks
 
  Pre-training a language model using The Pile dataset may have several limitations and potential biases that need to be considered. The following are some of the technical and sociotechnical limitations associated with using this dataset for pre-training:
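
The repetition caveat added to the Training Data section can usually be softened at decode time. The sketch below is illustrative only: it assumes the checkpoint is loaded through the standard `transformers` auto classes, uses `"gpt2"` as a placeholder model name (substitute the checkpoint from this repository), and picks arbitrary values for `repetition_penalty` and `no_repeat_ngram_size`.

```python
# Minimal sketch: constraining repeats when sampling longer outputs (32+ tokens)
# from a model trained on a small token budget.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; replace with the checkpoint from this repository
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Machine learning is", return_tensors="pt")
generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,        # longer generations are where repetition tends to appear
    do_sample=True,
    top_p=0.9,
    repetition_penalty=1.2,   # down-weight tokens that were already generated
    no_repeat_ngram_size=3,   # never repeat the same 3-gram verbatim
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```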