Update README.md
README.md
CHANGED
@@ -15,7 +15,6 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
 - [Model Details](#model-details)
 - [Model Description](#model-description)
 - [Uses](#uses)
-- [Direct Use](#direct-use)
 - [Downstream Use](#downstream-use)
 - [Out-of-Scope Use](#out-of-scope-use)
 - [Bias, Risks, and Limitations](#bias-risks-and-limitations)
@@ -24,23 +23,11 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
 - [Training Data](#training-data)
 - [Training Procedure](#training-procedure)
 - [Preprocessing](#preprocessing)
-- [Speeds, Sizes, Times](#speeds-sizes-times)
-- [Evaluation](#evaluation)
-- [Testing Data, Factors & Metrics](#testing-data-factors--metrics)
-- [Testing Data](#testing-data)
-- [Factors](#factors)
-- [Metrics](#metrics)
-- [Results](#results)
-- [Model Examination](#model-examination)
 - [Environmental Impact](#environmental-impact)
 - [Technical Specifications](#technical-specifications)
 - [Model Architecture and Objective](#model-architecture-and-objective)
 - [Compute Infrastructure](#compute-infrastructure)
-
-- [Software](#software)
-- [Citation](#citation)
-- [Model Card Contact](#model-card-contact)
-- [How to Get Started with the Model](#how-to-get-started-with-the-model)
+
 
 
 # Model Details
@@ -61,6 +48,8 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
 - **Language(s) (NLP):** en
 - **License:** openrail
 
+# Uses
+
 ## Direct Use
 
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
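As a quick illustration of direct use, a minimal sketch of prompting the model as a plain causal LM is below. The Hugging Face Hub id `stanford-crfm/pubmedgpt` is an assumption for illustration (substitute the actual checkpoint id for this card), and per the guidance later in the card, raw generation is not recommended for production.

```python
# Minimal sketch: prompting the model as a plain causal LM.
# Assumption: the checkpoint is published on the Hugging Face Hub under
# an id like "stanford-crfm/pubmedgpt" -- substitute the real id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stanford-crfm/pubmedgpt"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Photosynthesis is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```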
@@ -83,8 +72,6 @@ The main way we have used this model is finetuning for downstream question answe
 We do not recommend using this model for natural language generation in a production environment, finetuned or otherwise.
 
 
-
-
 # Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
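Since the surrounding section notes that the main use of this model has been finetuning for downstream question answering, here is a rough sketch of one standard causal-LM finetuning step with `transformers`. The hub id, learning rate, and QA-as-text prompt format are illustrative assumptions, not the authors' recipe.

```python
# Sketch of one causal-LM finetuning step, formatting a QA pair as plain
# text. Hub id, learning rate, and prompt format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stanford-crfm/pubmedgpt"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

text = "Question: What does metformin treat?\nAnswer: Type 2 diabetes."
batch = tokenizer(text, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # next-token loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```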
@@ -155,19 +142,12 @@ This allows the model to encode information about these concepts in their indivi
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:** More information needed
-- **Hours used:** More information needed
-- **Cloud Provider:** More information needed
-- **Compute Region:** More information needed
-- **Carbon Emitted:** More information needed
-
 # Technical Specifications
 
 ## Model Architecture and Objective
 
 Pubmed GPT 2.7B is a standard GPT-2 implementation (trained with Flash Attention) with the following hyperparameters:
 
-
 | Hyperparameter | Value |
 | ----------- | ----- |
 | hidden size | 2560 |
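The Lacoste et al. estimate referenced above reduces to energy used (hardware power draw × hours × a datacenter PUE factor) multiplied by the grid's carbon intensity. A back-of-the-envelope sketch follows; the GPU count and duration come from the Compute Infrastructure section below, while every other number is an illustrative assumption rather than a measured value for this run.

```python
# Back-of-the-envelope CO2 estimate per Lacoste et al. (2019):
# energy (kWh) x grid carbon intensity (kgCO2eq/kWh).
# All rates below are illustrative assumptions, not measurements.
gpu_count = 128           # A100-40GB GPUs, per the Compute Infrastructure section
gpu_power_kw = 0.4        # assumed ~400 W draw per A100
hours = 6.25 * 24         # ~6.25 days of training, per the card
pue = 1.1                 # assumed datacenter power usage effectiveness
carbon_intensity = 0.4    # assumed kgCO2eq per kWh for the grid region

energy_kwh = gpu_count * gpu_power_kw * hours * pue
emissions_kg = energy_kwh * carbon_intensity
print(f"{energy_kwh:.0f} kWh -> ~{emissions_kg:.0f} kg CO2eq")
```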
@@ -176,7 +156,6 @@ Pubmed GPT 2.7B is a standard GPT-2 implementation (trained with Flash Attention
 | vocab size | 28896 |
 | sequence length | 1024 |
 
-
 ## Compute Infrastructure
 
 The model was trained on [MosaicML Cloud](https://www.mosaicml.com/cloud), a platform designed for large workloads like LLMs. Using the [Composer](https://github.com/mosaicml/composer) training library and [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html), it was easy to enable multi-node training across 128 A100-40GB GPUs, and the total run was completed in ~6.25 days.
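Because the card describes a standard GPT-2 implementation, the hyperparameter table maps directly onto a `transformers` `GPT2Config`. In the sketch below, hidden size, vocab size, and sequence length come from the table; the layer and head counts are assumed placeholders, since those rows fall outside the visible hunks.

```python
# Sketch: expressing the architecture table as a transformers GPT2Config.
# hidden size / vocab size / sequence length come from the card; the
# layer and head counts are assumptions for the rows not shown here.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    n_embd=2560,        # hidden size (from the table)
    vocab_size=28896,   # vocab size (from the table)
    n_positions=1024,   # sequence length (from the table)
    n_layer=32,         # assumption: not shown in the visible hunks
    n_head=20,          # assumption: must divide the hidden size evenly
)
model = GPT2LMHeadModel(config)  # randomly initialized, same shape
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters")
```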
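Finally, as a generic illustration of the PyTorch FSDP technique named in the Compute Infrastructure paragraph (not MosaicML's actual Composer configuration), wrapping a model for fully sharded multi-GPU training looks roughly like this:

```python
# Generic PyTorch FSDP sketch -- illustrative of the technique named in
# the card, not MosaicML's actual Composer/FSDP setup.
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import GPT2Config, GPT2LMHeadModel

dist.init_process_group("nccl")  # one process per GPU, launched via torchrun
model = GPT2LMHeadModel(GPT2Config(n_embd=2560, n_positions=1024)).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state

# Training then proceeds as usual; FSDP all-gathers shards as needed.
```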