I believe the README says 175B model where it should say 175M.

#41
Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -77,8 +77,8 @@ unfiltered content from the internet, which is far from neutral the model is str
 
 > Like other large language models for which the diversity (or lack thereof) of training
 > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
-> of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
-> hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
+> of bias and safety. OPT-175M can also have quality issues in terms of generation diversity and
+> hallucination. In general, OPT-175M is not immune from the plethora of issues that plague modern
 > large language models.
 
 This bias will also affect all fine-tuned versions of this model.
@@ -118,7 +118,7 @@ re-formatting practices, including removing repetitive/non-informative text like
 The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
 vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
 
-The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
+The 175M model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
 
 ### BibTeX entry and citation info
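
For reviewers who want to double-check the figures quoted in the model card (GPT-2 byte-level BPE, a 50272-token vocabulary, 2048-token input sequences), a minimal sketch along these lines can compare them against the checkpoint's configuration. The repository id below is an assumed placeholder, not part of this PR; substitute the repository this README belongs to.

```python
# Minimal sketch (not part of this PR): compare the model-card figures with the
# checkpoint's configuration and tokenizer.
# "facebook/opt-125m" is an assumed placeholder id; use the actual repository.
from transformers import AutoConfig, AutoTokenizer

repo_id = "facebook/opt-125m"  # placeholder; OPT checkpoints share the same tokenizer

config = AutoConfig.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

print(config.vocab_size)               # model card states 50272
print(config.max_position_embeddings)  # model card states 2048-token sequences
print(type(tokenizer).__name__)        # GPT-2 byte-level BPE tokenizer class
```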