INFO:nncf:Ignored adding weight sparsifier in scope: BertForSequenceClassification/BertModel[bert]/BertPooler[pooler]/NNCFLinear[dense]/linear_0
INFO:nncf:Ignored adding weight sparsifier in scope: BertForSequenceClassification/NNCFLinear[classifier]/linear_0
INFO:nncf:Not adding activation input quantizer for operation: 6 BertForSequenceClassification/BertModel[bert]/BertEmbeddings[embeddings]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 9 BertForSequenceClassification/BertModel[bert]/BertEmbeddings[embeddings]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 23 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[0]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 26 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[0]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 32 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[0]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 33 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[0]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 38 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[0]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 39 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[0]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 52 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[1]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 55 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[1]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 61 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[1]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 62 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[1]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 67 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[1]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 68 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[1]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 81 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[2]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 84 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[2]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 90 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[2]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 91 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[2]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 96 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[2]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 97 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[2]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 110 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[3]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 113 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[3]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 119 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[3]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 120 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[3]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 125 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[3]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 126 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[3]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 139 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[4]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 142 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[4]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 148 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[4]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 149 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[4]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 154 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[4]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 155 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[4]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 168 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[5]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 171 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[5]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 177 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[5]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 178 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[5]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 183 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[5]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 184 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[5]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 197 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[6]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 200 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[6]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 206 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[6]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 207 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[6]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 212 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[6]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 213 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[6]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 226 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[7]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 229 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[7]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 235 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[7]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 236 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[7]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 241 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[7]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 242 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[7]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 255 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 258 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 264 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 265 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 270 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 271 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 284 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 287 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 293 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 294 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 299 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 300 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 313 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 316 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 322 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 323 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 328 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 329 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 342 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 345 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/matmul_1
INFO:nncf:Not adding activation input quantizer for operation: 351 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 352 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Not adding activation input quantizer for operation: 357 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertOutput[output]/__add___0
INFO:nncf:Not adding activation input quantizer for operation: 358 BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertOutput[output]/NNCFLayerNorm[LayerNorm]/layer_norm_0
INFO:nncf:Collecting tensor statistics |████████████████| 1 / 1
INFO:nncf:BatchNorm statistics adaptation |████████████████| 7 / 7
WARNING:nncf:Number of potential building blocks is too much. The processing time can be high. Shallow the accepted range for the length of building blocks via max_block_size and min_block_size to accelerate the search process.
INFO:nncf:Movement sparsity scheduler updates importance threshold and regularizationfactor per optimizer step, but steps_per_epoch was not set in config. Will measure the actual steps per epoch as signaled by a .epoch_step() call.
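The "Weight's percentage" column in the per-layer statistics that NNCF prints next can be checked by hand. The printed values are consistent with each tensor's share of all sparsified parameters, where the sparsified set is the query, key, value, attention-output, intermediate and output linear layers (weights and biases) of the 12 BERT-base encoder layers; the pooler and classifier are excluded, as the "Ignored adding weight sparsifier" lines report. The log does not state the denominator, so the sketch below assumes that interpretation and hard-codes BERT-base sizes (hidden size 768, intermediate size 3072); `weight_percentages` is a hypothetical helper, not part of NNCF.

```python
# Sketch: recompute the "Weight's percentage" column of the NNCF sparsity table.
# Assumption: the denominator is the total element count of every sparsified
# tensor (weights and biases of the six linear layers in each of the 12
# BERT-base encoder layers); the pooler and classifier are excluded.
from math import prod

HIDDEN = 768
INTERMEDIATE = 3072
NUM_LAYERS = 12

# Shapes of the sparsified tensors in one encoder layer, in table order.
LAYER_SHAPES = {
    "query": (HIDDEN, HIDDEN),
    "query/bias": (HIDDEN,),
    "key": (HIDDEN, HIDDEN),
    "key/bias": (HIDDEN,),
    "value": (HIDDEN, HIDDEN),
    "value/bias": (HIDDEN,),
    "attention_output": (HIDDEN, HIDDEN),
    "attention_output/bias": (HIDDEN,),
    "intermediate": (INTERMEDIATE, HIDDEN),
    "intermediate/bias": (INTERMEDIATE,),
    "output": (HIDDEN, INTERMEDIATE),
    "output/bias": (HIDDEN,),
}


def weight_percentages(num_layers: int = NUM_LAYERS) -> dict[str, float]:
    """Return each tensor's share (in percent) of all sparsified parameters."""
    total = num_layers * sum(prod(shape) for shape in LAYER_SHAPES.values())
    return {name: 100 * prod(shape) / total for name, shape in LAYER_SHAPES.items()}


if __name__ == "__main__":
    for name, pct in weight_percentages().items():
        print(f"{name:<22} {pct:.3f}")
```

Under this assumption the total is 12 × 7,084,800 = 85,017,600 parameters, so a 768 × 768 projection accounts for about 0.694% and a 3072 × 768 feed-forward matrix for about 2.775%, which matches the table. Leaving the biases out of the denominator would give 2.778% for the feed-forward matrices instead, so the printed 2.775 suggests the biases are counted.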
INFO:nncf:Statistics of the sparsified model:
Epoch 0 |+-----------------------------------------+-------+
Epoch 0 || Statistic's name                        | Value |
Epoch 0 |+=========================================+=======+
Epoch 0 || Sparsity level of the whole model       | 0.000 |
Epoch 0 |+-----------------------------------------+-------+
Epoch 0 || Sparsity level of all sparsified layers | 0     |
Epoch 0 |+-----------------------------------------+-------+
Epoch 0 |
Epoch 0 |Statistics by sparsified layers:
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || Layer's name         | Weight's shape | Sparsity level | Weight's percentage |
Epoch 0 |+======================+================+================+=====================+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0          |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0/bias     |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch 0 || tput]/NNCFLinear[den |                |                |                     |
Epoch 0 || se]/linear_0         |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch 0 || tput]/NNCFLinear[den |                |                |                     |
Epoch 0 || se]/linear_0/bias    |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [3072, 768]    | 0              | 2.775               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtIntermediate[inter |                |                |                     |
Epoch 0 || mediate]/NNCFLinear[ |                |                |                     |
Epoch 0 || dense]/linear_0      |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [3072]         | 0              | 0.004               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtIntermediate[inter |                |                |                     |
Epoch 0 || mediate]/NNCFLinear[ |                |                |                     |
Epoch 0 || dense]/linear_0/bias |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 3072]    | 0              | 2.775               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtOutput[output]/NNC |                |                |                     |
Epoch 0 || FLinear[dense]/linea |                |                |                     |
Epoch 0 || r_0                  |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[0]/Be |                |                |                     |
Epoch 0 || rtOutput[output]/NNC |                |                |                     |
Epoch 0 || FLinear[dense]/linea |                |                |                     |
Epoch 0 || r_0/bias             |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0          |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0/bias     |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch 0 || tput]/NNCFLinear[den |                |                |                     |
Epoch 0 || se]/linear_0         |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch 0 || tput]/NNCFLinear[den |                |                |                     |
Epoch 0 || se]/linear_0/bias    |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [3072, 768]    | 0              | 2.775               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtIntermediate[inter |                |                |                     |
Epoch 0 || mediate]/NNCFLinear[ |                |                |                     |
Epoch 0 || dense]/linear_0      |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [3072]         | 0              | 0.004               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtIntermediate[inter |                |                |                     |
Epoch 0 || mediate]/NNCFLinear[ |                |                |                     |
Epoch 0 || dense]/linear_0/bias |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 3072]    | 0              | 2.775               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtOutput[output]/NNC |                |                |                     |
Epoch 0 || FLinear[dense]/linea |                |                |                     |
Epoch 0 || r_0                  |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[1]/Be |                |                |                     |
Epoch 0 || rtOutput[output]/NNC |                |                |                     |
Epoch 0 || FLinear[dense]/linea |                |                |                     |
Epoch 0 || r_0/bias             |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0          |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0/bias     |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch 0 || tput]/NNCFLinear[den |                |                |                     |
Epoch 0 || se]/linear_0         |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch 0 || tput]/NNCFLinear[den |                |                |                     |
Epoch 0 || se]/linear_0/bias    |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [3072, 768]    | 0              | 2.775               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtIntermediate[inter |                |                |                     |
Epoch 0 || mediate]/NNCFLinear[ |                |                |                     |
Epoch 0 || dense]/linear_0      |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [3072]         | 0              | 0.004               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtIntermediate[inter |                |                |                     |
Epoch 0 || mediate]/NNCFLinear[ |                |                |                     |
Epoch 0 || dense]/linear_0/bias |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 3072]    | 0              | 2.775               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtOutput[output]/NNC |                |                |                     |
Epoch 0 || FLinear[dense]/linea |                |                |                     |
Epoch 0 || r_0                  |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[2]/Be |                |                |                     |
Epoch 0 || rtOutput[output]/NNC |                |                |                     |
Epoch 0 || FLinear[dense]/linea |                |                |                     |
Epoch 0 || r_0/bias             |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[qu |                |                |                     |
Epoch 0 || ery]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0          |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[ke |                |                |                     |
Epoch 0 || y]/linear_0/bias     |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0        |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768]          | 0              | 0.001               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfAttention |                |                |                     |
Epoch 0 || [self]/NNCFLinear[va |                |                |                     |
Epoch 0 || lue]/linear_0/bias   |                |                |                     |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClass | [768, 768]     | 0              | 0.694               |
Epoch 0 || ification/BertModel[ |                |                |                     |
Epoch 0 || bert]/BertEncoder[en |                |                |                     |
Epoch 0 || coder]/ModuleList[la |                |                |                     |
Epoch 0 || yer]/BertLayer[3]/Be |                |                |                     |
Epoch 0 || rtAttention[attentio |                |                |                     |
Epoch 0 || n]/BertSelfOutput[ou |                |                |                     |
Epoch
0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[3]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072, 768] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[3]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072] | 0 | 0.004 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[3]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 3072] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[3]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || 
bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[3]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || 
[self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0/bias | | | | Epoch 0 
|+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072, 768] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072] | 0 | 0.004 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 3072] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[4]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || 
rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0 | | | | Epoch 0 
|+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072, 768] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072] | 0 | 0.004 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || 
bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 3072] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[5]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0/bias | | | | Epoch 0 
|+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || 
ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072, 768] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072] | 0 | 0.004 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 3072] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | 
| Epoch 0 || r_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[6]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | 
| | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || 
rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072, 768] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072] | 0 | 0.004 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 3072] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[7]/Be | | | | Epoch 0 || rtOutput[output]/NNC | | | | Epoch 0 || FLinear[dense]/linea | | | | Epoch 0 || r_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || 
ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[qu | | | | Epoch 0 || ery]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[ke | | | | Epoch 0 || y]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 
0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfAttention | | | | Epoch 0 || [self]/NNCFLinear[va | | | | Epoch 0 || lue]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768, 768] | 0 | 0.694 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0 | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [768] | 0 | 0.001 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtAttention[attentio | | | | Epoch 0 || n]/BertSelfOutput[ou | | | | Epoch 0 || tput]/NNCFLinear[den | | | | Epoch 0 || se]/linear_0/bias | | | | Epoch 0 |+----------------------+----------------+----------------+---------------------+ Epoch 0 || BertForSequenceClass | [3072, 768] | 0 | 2.775 | Epoch 0 || ification/BertModel[ | | | | Epoch 0 || bert]/BertEncoder[en | | | | Epoch 0 || coder]/ModuleList[la | | | | Epoch 0 || yer]/BertLayer[8]/Be | | | | Epoch 0 || rtIntermediate[inter | | | | Epoch 0 || mediate]/NNCFLinear[ | | | | Epoch 0 || dense]/linear_0 | | | | Epoch 0 
|+----------------------+----------------+----------------+---------------------+
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0/bias | [3072] | 0 | 0.004 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertOutput[output]/NNCFLinear[dense]/linear_0 | [768, 3072] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[8]/BertOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[query]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[query]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[key]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[key]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[value]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[value]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfOutput[output]/NNCFLinear[dense]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertAttention[attention]/BertSelfOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0 | [3072, 768] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0/bias | [3072] | 0 | 0.004 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertOutput[output]/NNCFLinear[dense]/linear_0 | [768, 3072] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[9]/BertOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[query]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[query]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[key]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[key]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[value]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[value]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfOutput[output]/NNCFLinear[dense]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertAttention[attention]/BertSelfOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0 | [3072, 768] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0/bias | [3072] | 0 | 0.004 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertOutput[output]/NNCFLinear[dense]/linear_0 | [768, 3072] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[10]/BertOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[query]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[query]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[key]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[key]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[value]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfAttention[self]/NNCFLinear[value]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfOutput[output]/NNCFLinear[dense]/linear_0 | [768, 768] | 0 | 0.694 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertAttention[attention]/BertSelfOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0 | [3072, 768] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertIntermediate[intermediate]/NNCFLinear[dense]/linear_0/bias | [3072] | 0 | 0.004 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertOutput[output]/NNCFLinear[dense]/linear_0 | [768, 3072] | 0 | 2.775 |
Epoch 0 || BertForSequenceClassification/BertModel[bert]/BertEncoder[encoder]/ModuleList[layer]/BertLayer[11]/BertOutput[output]/NNCFLinear[dense]/linear_0/bias | [768] | 0 | 0.001 |
Epoch 0 |+----------------------+----------------+----------------+---------------------+
Epoch 0 |
Epoch 0 |Statistics of the movement-sparsity algorithm:
Epoch 0 |+----------------------------------+-------+
Epoch 0 || Statistic's name                 | Value |
Epoch 0 |+==================================+=======+
Epoch 0 || Mask Importance Threshold        | -inf  |
Epoch 0 |+----------------------------------+-------+
Epoch 0 || Importance Regularization Factor | 0     |
Epoch 0 |+----------------------------------+-------+
Epoch 0 |
Epoch 0 |Statistics of the quantization algorithm:
Epoch 0 |+--------------------------------+-------+
Epoch 0 || Statistic's name               | Value |
Epoch 0 |+================================+=======+
Epoch 0 || Ratio of enabled quantizations | 100   |
Epoch 0 |+--------------------------------+-------+
Epoch 0 |
Epoch 0 |Statistics of the quantization share:
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Statistic's name                 | Value                |
Epoch 0 |+==================================+======================+
Epoch 0 || Symmetric WQs / All placed WQs   | 100.00 % (77 / 77)   |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Asymmetric WQs / All placed WQs  | 0.00 % (0 / 77)      |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Signed WQs / All placed WQs      | 100.00 % (77 / 77)   |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Unsigned WQs / All placed WQs    | 0.00 % (0 / 77)      |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Per-tensor WQs / All placed WQs  | 3.90 % (3 / 77)      |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Per-channel WQs / All placed WQs | 96.10 % (74 / 77)    |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Placed WQs / Potential WQs       | 75.49 % (77 / 102)   |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Symmetric AQs / All placed AQs   | 23.76 % (24 / 101)   |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Asymmetric AQs / All placed AQs  | 76.24 % (77 / 101)   |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Signed AQs / All placed AQs      | 100.00 % (101 / 101) |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Unsigned AQs / All placed AQs    | 0.00 % (0 / 101)     |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Per-tensor AQs / All placed AQs  | 100.00 % (101 / 101) |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 || Per-channel AQs / All placed AQs | 0.00 % (0 / 101)     |
Epoch 0 |+----------------------------------+----------------------+
Epoch 0 |
Epoch 0 |Statistics of the bitwidth distribution:
Epoch 0 |+--------------+---------------------+--------------------+--------------------+
Epoch 0 || Num bits (N) | N-bits WQs / Placed | N-bits AQs /       | N-bits Qs / Placed |
Epoch 0 ||              | WQs                 | Placed AQs         | Qs                 |
Epoch 0 |+==============+=====================+====================+====================+
Epoch 0 || 8            | 100.00 % (77 / 77)  | 100.00 % (101 /    | 100.00 % (178 /    |
Epoch 0 ||              |                     | 101)               | 178)               |
Epoch 0 |+--------------+---------------------+--------------------+--------------------+
INFO:nncf:Movement sparsity scheduler updates importance threshold and regularizationfactor per optimizer step, but steps_per_epoch was not set in config. Will measure the actual steps per epoch as signaled by a .epoch_step() call.
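The last column of the per-layer sparsity table has a header that is not part of this excerpt, but its values (0.694, 0.001, 2.775, 0.004) are consistent with each tensor's share, in percent, of all parameters in the linear layers that received a weight sparsifier. The sketch below checks that reading. It assumes BERT-base dimensions inferred from the shapes in the table (hidden size 768, intermediate size 3072, 12 encoder layers) and assumes the pooler and classifier, which the log reports as "Ignored adding weight sparsifier", are excluded from the total. The helper names are hypothetical and are not part of the NNCF API.

```python
# Hypothetical sketch (assumption): the last column of the per-layer table is
# each tensor's share, in percent, of all parameters in the sparsified linear
# layers of the 12 encoder layers. Pooler and classifier are excluded.

HIDDEN = 768          # hidden size, e.g. query weight [768, 768]
INTERMEDIATE = 3072   # feed-forward size, e.g. intermediate weight [3072, 768]
NUM_LAYERS = 12       # encoder layers BertLayer[0] .. BertLayer[11]


def layer_shapes(hidden=HIDDEN, intermediate=INTERMEDIATE):
    """Weight and bias shapes of the six linear modules in one BertLayer,
    in PyTorch [out_features, in_features] order."""
    shapes = []
    for _ in ("query", "key", "value", "self_output_dense"):
        shapes += [(hidden, hidden), (hidden,)]
    shapes += [(intermediate, hidden), (intermediate,)]  # BertIntermediate dense
    shapes += [(hidden, intermediate), (hidden,)]        # BertOutput dense
    return shapes


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def weight_percentages(num_layers=NUM_LAYERS):
    """Return (total sparsifiable parameters, {shape: share in %, 3 decimals})."""
    shapes = layer_shapes()
    total = num_layers * sum(numel(s) for s in shapes)
    return total, {s: round(100 * numel(s) / total, 3) for s in shapes}


total, pct = weight_percentages()
print(total)  # 85017600
for shape, share in pct.items():
    print(list(shape), share)
```

Under these assumptions the total comes to 85,017,600 parameters, and the rounded shares match every value in that column, which suggests the pooler and classifier are indeed left out of the denominator.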