Downloading data: 100%|██████████| 180M/180M [00:03<00:00, 46.3MB/s]
Downloading data: 100%|██████████| 180M/180M [00:04<00:00, 37.2MB/s]
Downloading data: 100%|██████████| 180M/180M [00:03<00:00, 47.5MB/s]
Downloading data: 100%|██████████| 180M/180M [00:03<00:00, 46.0MB/s]
Downloading data: 100%|██████████| 180M/180M [00:04<00:00, 41.2MB/s]
Downloading data files: 100%|██████████| 1/1 [27:46<00:00, 1666.10s/it]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 12.85it/s]
Dataset parquet downloaded and prepared to /home/commune/.cache/huggingface/datasets/conceptofmind___parquet/conceptofmind--c4_0-to-20_neox_with_eos_8k-dd8655ce54e7b6cc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.
Found cached dataset parquet (/home/commune/.cache/huggingface/datasets/conceptofmind___parquet/conceptofmind--c4_0-to-20_neox_with_eos_8k-dd8655ce54e7b6cc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
Found cached dataset parquet (/home/commune/.cache/huggingface/datasets/conceptofmind___parquet/conceptofmind--c4_0-to-20_neox_with_eos_8k-dd8655ce54e7b6cc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
Found cached dataset parquet (/home/commune/.cache/huggingface/datasets/conceptofmind___parquet/conceptofmind--c4_0-to-20_neox_with_eos_8k-dd8655ce54e7b6cc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
Found cached dataset parquet (/home/commune/.cache/huggingface/datasets/conceptofmind___parquet/conceptofmind--c4_0-to-20_neox_with_eos_8k-dd8655ce54e7b6cc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
Found cached dataset parquet (/home/commune/.cache/huggingface/datasets/conceptofmind___parquet/conceptofmind--c4_0-to-20_neox_with_eos_8k-dd8655ce54e7b6cc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
[2023-07-24 15:58:13,787] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-07-24 15:58:13,787] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-07-24 15:58:13,787] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-24 15:58:13,787] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-24 15:58:13,787] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-24 15:58:13,787] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-24 15:58:13,789] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-07-24 15:58:13,790] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-24 15:58:13,790] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-24 15:58:13,790] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-07-24 15:58:13,790] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-07-24 15:58:13,790] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-24 15:58:13,790] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-24 15:58:13,790] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-24 15:58:13,790] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-24 15:58:13,791] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-07-24 15:58:13,792] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-24 15:58:13,792] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-24 15:58:17,032] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-24 15:58:17,035] [INFO] [logging.py:96:log_dist] [Rank 0] Creating ZeRO Offload
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95, 
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'
[2023-07-24 15:58:17,268] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2023-07-24 15:58:17,268] [INFO] [utils.py:786:see_memory_usage] MA 0.68 GB         Max_MA 0.68 GB         CA 0.69 GB         Max_CA 1 GB 
[2023-07-24 15:58:17,268] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 18.35 GB, percent = 3.6%
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95, 
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95, 
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95, 
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'
Parameter Offload: Total persistent parameters: 108032 in 490 params
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95, 
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'
[2023-07-24 15:58:17,449] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2023-07-24 15:58:17,450] [INFO] [utils.py:786:see_memory_usage] MA 0.8 GB         Max_MA 0.8 GB         CA 0.8 GB         Max_CA 1 GB 
[2023-07-24 15:58:17,450] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 18.39 GB, percent = 3.7%
[2023-07-24 15:58:17,451] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f06c428a950>
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-24 15:58:17,451] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   fp16_auto_cast ............... True
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   gradient_clipping ............ 0.0
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   steps_per_print .............. inf
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   train_batch_size ............. 18
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  3
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   world_size ................... 6
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-24 15:58:17,452] [INFO] [config.py:964:print]   zero_optimization_stage ...... 3
[2023-07-24 15:58:17,453] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 18, 
    "train_micro_batch_size_per_gpu": 3, 
    "gradient_accumulation_steps": 1, 
    "zero_optimization": {
        "stage": 3, 
        "offload_optimizer": {
            "device": "none", 
            "nvme_path": null
        }, 
        "offload_param": {
            "device": "none", 
            "nvme_path": null
        }, 
        "stage3_gather_16bit_weights_on_model_save": true
    }, 
    "steps_per_print": inf, 
    "fp16": {
        "enabled": true, 
        "auto_cast": true
    }, 
    "bf16": {
        "enabled": false
    }
}
Using stable_adamw optimizer
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95,
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'
Traceback (most recent call last):
  File "/home/commune/Andromeda/Andromeda/train.py", line 667, in <module>
  File "/home/commune/Andromeda/Andromeda/train.py", line 664, in main
  File "/home/commune/Andromeda/Andromeda/train.py", line 519, in Train
    beta_2=0.95, 
  File "/home/commune/Andromeda/Andromeda/train.py", line 294, in decoupled_optimizer
    # Create an empty list to store the names of the LayerNorm and Embedding layer weights with no weight decay.
AttributeError: 'tuple' object has no attribute 'named_parameters'