robertmyers and nazneen committed on
Commit 107d556
1 parent: 940df10

model documentation (#2)


- model documentation (b3f381c28e7b08721a232864f8facba9fd7252af)


Co-authored-by: Nazneen Rajani <nazneen@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +199 -0

README.md ADDED
---

tags:
- text-generation

---
# Model Card for bt-opt-1.3b


# Model Details

## Model Description

- **Developed by:** Opentensor
- **Shared by [Optional]:** Hugging Face and Meta
- **Model type:** Text Generation
- **Language(s) (NLP):** More information needed
- **License:** More information needed
- **Related Models:**
  - **Parent Model:** OPT
- **Resources for more information:**
  - [Associated Paper](https://arxiv.org/abs/2205.01068)

# Uses

## Direct Use

This model can be used for the task of text generation.
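
As a minimal sketch, the model can be queried through the `pipeline` API (the prompt below is purely illustrative):

```python
from transformers import pipeline

# Load the model from this repository into a text-generation pipeline.
generator = pipeline("text-generation", model="opentensor/bt-opt-1.3b")

# Generate a short continuation for an arbitrary prompt.
print(generator("Decentralized machine learning is", max_new_tokens=30)[0]["generated_text"])
```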

## Downstream Use [Optional]

In addition, the model can be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling); a rough sketch follows.
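
The snippet below is only an illustrative `Trainer`-based sketch of such fine-tuning, not an official recipe: the dataset (`wikitext-2-raw-v1`), hyperparameters, and output path are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "opentensor/bt-opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any plain-text dataset works; wikitext-2 is used here only for illustration.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bt-opt-1.3b-finetuned",   # placeholder path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False gives the causal (next-token) language-modeling objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```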

## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

As mentioned in Meta AI's model card, because the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral, the model is strongly biased:

> Like other large language models for which the diversity (or lack thereof) of training
> data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
> of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
> hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
> large language models.

See the [facebook/opt-1.3b model card](https://huggingface.co/facebook/opt-1.3b) for examples of biased predictions.

The model creators noted in the [associated paper](https://arxiv.org/pdf/2205.01068.pdf):
> we found OPT-175B does not work well with declarative instructions or point-blank interrogatives. Prompting with such instructions tends to produce a simulation of a dialogue beginning with such an instruction, rather than an execution of the instruction. Future work into instruction learning, in the vein of InstructGPT (Ouyang et al., 2022), may alleviate these limitations. OPT-175B also tends to be repetitive and can easily get stuck in a loop.


## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.


# Training Details

## Training Data

The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following five filtered datasets of textual documents:

- BookCorpus, which consists of more than 10K unpublished books,
- CC-Stories, which contains a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas,
- The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included,
- the Pushshift.io Reddit dataset, developed in Baumgartner et al. (2020) and processed in Roller et al. (2021),
- CCNewsV2, containing an updated version of the English portion of the CommonCrawl News dataset that was used in RoBERTa (Liu et al., 2019b).

The final training data contains 180B tokens, corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus.

The dataset might contain offensive content, as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.

Also see the dataset card in the [associated paper](https://arxiv.org/pdf/2205.01068.pdf).

## Training Procedure


### Preprocessing

The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
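
As a rough sketch of this setup (the sample texts are placeholders, and the exact chunking used by Meta AI may differ), the tokenizer can be used to build fixed-length 2048-token blocks:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("opentensor/bt-opt-1.3b")
# Should roughly match the 50272-entry vocabulary quoted above
# (the exact number depends on added special tokens).
print(len(tokenizer))

block_size = 2048
texts = ["first placeholder document ...", "second placeholder document ..."]

# Concatenate all token ids, then split into consecutive 2048-token blocks.
ids = []
for text in texts:
    ids.extend(tokenizer(text)["input_ids"])
blocks = [ids[i:i + block_size] for i in range(0, len(ids), block_size)]
```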

The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training.


### Speeds, Sizes, Times

More information needed

# Evaluation


## Testing Data, Factors & Metrics

### Testing Data

More information needed

### Factors

More information needed

### Metrics

More information needed

## Results

More information needed

# Model Examination

More information needed

# Environmental Impact


Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 992 80GB A100 GPUs
- **Hours used:** 792 (~33 days)
- **Cloud Provider:** More information needed
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed
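
As a back-of-the-envelope illustration of the calculator's approach: only the GPU count and hours above come from this card; the TDP, PUE, and grid carbon intensity below are assumptions, not reported values.

```python
num_gpus = 992             # 80GB A100 GPUs (from this card)
hours = 792                # ~33 days of training (from this card)
gpu_tdp_kw = 0.4           # assumed 400 W per A100
pue = 1.1                  # assumed data-centre power usage effectiveness
carbon_kg_per_kwh = 0.43   # assumed grid carbon intensity (kg CO2eq/kWh)

gpu_hours = num_gpus * hours                              # 785,664 GPU-hours
energy_kwh = gpu_hours * gpu_tdp_kw * pue                 # ~345,700 kWh
emissions_tonnes = energy_kwh * carbon_kg_per_kwh / 1000  # ~149 t CO2eq
print(f"{gpu_hours:,} GPU-hours, ~{energy_kwh:,.0f} kWh, ~{emissions_tonnes:.0f} t CO2eq")
```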

# Technical Specifications [optional]

## Model Architecture and Objective

The model uses the `OPTForCausalLM` architecture (causal language modeling objective).
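
The architecture can be checked from the hosted configuration, assuming it follows the standard OPT layout:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("opentensor/bt-opt-1.3b")
print(config.model_type)     # expected: "opt"
print(config.architectures)  # expected to include "OPTForCausalLM"
```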

## Compute Infrastructure

More information needed

### Hardware

More information needed

### Software

Transformers version: 4.22.1

# Citation


**BibTeX:**

```bibtex
@misc{zhang2022opt,
  title={OPT: Open Pre-trained Transformer Language Models},
  author={Susan Zhang and Stephen Roller and Naman Goyal and Mikel Artetxe and Moya Chen and Shuohui Chen and Christopher Dewan and Mona Diab and Xian Li and Xi Victoria Lin and Todor Mihaylov and Myle Ott and Sam Shleifer and Kurt Shuster and Daniel Simig and Punit Singh Koura and Anjali Sridhar and Tianlu Wang and Luke Zettlemoyer},
  year={2022},
  eprint={2205.01068},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```



# Glossary [optional]

More information needed

# More Information [optional]

More information needed

# Model Card Authors [optional]


Opentensor in collaboration with Ezi Ozoani and the Hugging Face team

# Model Card Contact

More information needed

# How to Get Started with the Model

Use the code below to get started with the model.

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("opentensor/bt-opt-1.3b")

model = AutoModelForCausalLM.from_pretrained("opentensor/bt-opt-1.3b")
```
</details>
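
Once the tokenizer and model are loaded, a short generation call might look like the following sketch (the prompt and sampling settings are arbitrary, illustrative choices):

```python
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```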