I believe that the readme says 175B model when it should be 175M.
#41
by noobmaster29 - opened

README.md CHANGED
@@ -77,8 +77,8 @@ unfiltered content from the internet, which is far from neutral the model is str
 
 > Like other large language models for which the diversity (or lack thereof) of training
 > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
-> of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
-> hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
+> of bias and safety. OPT-175M can also have quality issues in terms of generation diversity and
+> hallucination. In general, OPT-175M is not immune from the plethora of issues that plague modern
 > large language models.
 
 This bias will also affect all fine-tuned versions of this model.
@@ -118,7 +118,7 @@ re-formatting practices, including removing repetitive/non-informative text like
 The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
 vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
 
-The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
+The 175M model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
 
 ### BibTeX entry and citation info
 
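For reference, the tokenizer and context-length details quoted in the diff (GPT2-style byte-level BPE, a 50272-token vocabulary, 2048-token sequences) can be checked directly with the `transformers` library. This is a minimal sketch and not part of the PR; the repository id `facebook/opt-125m` is only an assumed placeholder for whichever OPT checkpoint this model card belongs to.

```python
from transformers import AutoConfig, AutoTokenizer

# Assumed placeholder repo id -- substitute the checkpoint this README belongs to.
repo_id = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
config = AutoConfig.from_pretrained(repo_id)

# OPT reuses the GPT-2 byte-level BPE tokenizer class, as the card states.
print(type(tokenizer).__name__)

# Vocabulary size and maximum sequence length from the model config
# (expected to be 50272 and 2048 for OPT checkpoints).
print(config.vocab_size)
print(config.max_position_embeddings)
```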