[
{
"path": "table_paper/2407.00115v3.json",
"table_id": "2",
"section": "5.1",
"all_context": [
"CIFAR-100: image classification.",
"As shown in Table 1 , we conduct image classification on the CIFAR-100 dataset to demonstrate the generalization performance of our RLKD method across 11 teacher-student pairs, including RN-56 & RN-20, etc.",
"Among them, 5 pairs of teacher and student models (VGG-13 & MN-V2, etc.)",
"are characterized by distinguishing architectural frameworks.",
"These experimental designs we employed provide a diverse and comprehensive assessment environment.",
"When the teacher and student networks share the same architecture, the experimental results show that our RLKD method has a strong generalization capacity, also exhibits a superior performance compared to CTKD.",
"Specifically, in the case of RN-110 & RN-20, our method outperforms Vanilla KD by 0.78% (71.44% vs 70.66%) and CTKD by 0.36% (71.44% vs 71.08%).",
"Moreover, in the case where the teacher and student networks have different architectures, the powerful generalization capacity of our RLKD is also validated.",
"To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"Experiments conducted on other 5 KD frameworks (e.g.",
"PKT, etc.)",
"further confirm the strong generalization of our RLKD.",
"Both the accuracy and stability of the proposed RLKD are significantly superior to CTKD, this can be attributed to our RLKD method considers the future rewards of the instance temperature adjustment operations.",
"ImageNet: image classification.",
"To validate the scalability of our method and its applicability in complex scenarios involving large datasets, we further conduct image classification on ImageNet.",
"Table 2 details the top-1 and top-5 accuracy.",
"Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"KD, PKT, RKD, SRRL, and DKD).",
"The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"Additionally, unlike CTKD, our RLKD also takes into account the student model's grasp of individual instances during instance temperature adjustment.",
"MS-COCO: object detection.",
"To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits.",
""
],
"target_context_ids": [
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26
],
"selected_paragraphs": [
"[paragraph id = 16] Table 2 details the top-1 and top-5 accuracy.",
"[paragraph id = 17] Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"[paragraph id = 18] KD, PKT, RKD, SRRL, and DKD).",
"[paragraph id = 19] The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"[paragraph id = 20] Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"[paragraph id = 21] For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"[paragraph id = 22] In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"[paragraph id = 23] We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"[paragraph id = 24] Additionally, unlike CTKD, our RLKD also takes into account the student model's grasp of individual instances during instance temperature adjustment.",
"[paragraph id = 25] MS-COCO: object detection.",
"[paragraph id = 26] To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T2\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T2.2\" style=\"width:474.1pt;height:33pt;vertical-align:-0.6pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-158.0pt,10.8pt) scale(0.6,0.6) ;\">\n<table class=\"ltx_tabular ltx_guessed_headers ltx_align_middle\" id=\"S5.T2.2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S5.T2.2.1.1.1\">\n<th class=\"ltx_td ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T2.2.1.1.1.1\"></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.2\">Teacher</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_tt\" id=\"S5.T2.2.1.1.1.3\">Student</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T2.2.1.1.1.4\">Vanilla KD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.5\">+CTKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_tt\" id=\"S5.T2.2.1.1.1.6\">+Ours</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T2.2.1.1.1.7\">PKT</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.8\">+CTKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_tt\" id=\"S5.T2.2.1.1.1.9\">+Ours</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T2.2.1.1.1.10\">RKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.11\">+CTKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_tt\" id=\"S5.T2.2.1.1.1.12\">+Ours</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T2.2.1.1.1.13\">SRRL</th>\n<th class=\"ltx_td ltx_align_center 
ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.14\">+CTKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_tt\" id=\"S5.T2.2.1.1.1.15\">+Ours</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T2.2.1.1.1.16\">DKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.17\">+CTKD</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T2.2.1.1.1.18\">+Ours</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T2.2.1.2.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T2.2.1.2.1.1\">Top-1</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.2\">73.96</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S5.T2.2.1.2.1.3\">70.26</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T2.2.1.2.1.4\">70.83</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.5\">71.28</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S5.T2.2.1.2.1.6\">71.39</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T2.2.1.2.1.7\">70.92</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.8\">71.31</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S5.T2.2.1.2.1.9\">71.53</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T2.2.1.2.1.10\">70.94</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.11\">71.13</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S5.T2.2.1.2.1.12\">71.37</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T2.2.1.2.1.13\">71.01</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.14\">71.25</td>\n<td class=\"ltx_td 
ltx_align_center ltx_border_r ltx_border_t\" id=\"S5.T2.2.1.2.1.15\">71.38</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T2.2.1.2.1.16\">71.13</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.17\">71.47</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T2.2.1.2.1.18\">71.62</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T2.2.1.3.2\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T2.2.1.3.2.1\">Top-5</th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.2\">91.58</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb ltx_border_r\" id=\"S5.T2.2.1.3.2.3\">89.50</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T2.2.1.3.2.4\">90.31</th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.5\">90.33</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb ltx_border_r\" id=\"S5.T2.2.1.3.2.6\">90.51</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T2.2.1.3.2.7\">90.25</th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.8\">90.30</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb ltx_border_r\" id=\"S5.T2.2.1.3.2.9\">90.42</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T2.2.1.3.2.10\">90.33</th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.11\">90.34</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb ltx_border_r\" id=\"S5.T2.2.1.3.2.12\">90.45</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T2.2.1.3.2.13\">90.41</th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.14\">90.42</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb ltx_border_r\" id=\"S5.T2.2.1.3.2.15\">90.52</td>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T2.2.1.3.2.16\">90.31</th>\n<td 
class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.17\">90.44</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T2.2.1.3.2.18\">90.56</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T2.3.1.1\" style=\"font-size:90%;\">Table 2</span>: </span><span class=\"ltx_text\" id=\"S5.T2.4.2\" style=\"font-size:90%;\">Top-1 and Top-5 accuracy on ImageNet with ResNet-34 as teacher and ResNet-18 as student.</span></figcaption>\n</figure>",
"perturb_sentence_id": [
22,
23
],
"output": {
"perturbed_statement": "[paragraph id = 22] In contrast, CTKD obtains much greater improvement on these KD frameworks, with gains of 2% (90.31% vs 88.31%) and 1% (90.41% vs 89.41%) respectively, about 10 times higher. We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"perturbed_explanation": "The original explanation states that CTKD obtains much fewer improvements on the KD frameworks mentioned, with specific gains of just 0.02% and 0.01%, which are about 10 times lower than those of RLKD. 1. CTKD obtains a marginal improvement on each KD framework. 2. These improvements are significantly smaller compared to the improvements achieved by RLKD. In the statement, CTKD's gains are misrepresented as much greater than they are, with exaggerated gains of 2% and 1%, respectively, compared to RLKD. This exaggeration contradicts the experimental results discussed, which highlight RLKD's superior performance and scalability."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "3",
"section": "5.1",
"all_context": [
"CIFAR-100: image classification.",
"As shown in Table 1 , we conduct image classification on the CIFAR-100 dataset to demonstrate the generalization performance of our RLKD method across 11 teacher-student pairs, including RN-56 & RN-20, etc.",
"Among them, 5 pairs of teacher and student models (VGG-13 & MN-V2, etc.)",
"are characterized by distinguishing architectural frameworks.",
"These experimental designs we employed provide a diverse and comprehensive assessment environment.",
"When the teacher and student networks share the same architecture, the experimental results show that our RLKD method has a strong generalization capacity, also exhibits a superior performance compared to CTKD.",
"Specifically, in the case of RN-110 & RN-20, our method outperforms Vanilla KD by 0.78% (71.44% vs 70.66%) and CTKD by 0.36% (71.44% vs 71.08%).",
"Moreover, in the case where the teacher and student networks have different architectures, the powerful generalization capacity of our RLKD is also validated.",
"To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"Experiments conducted on other 5 KD frameworks (e.g.",
"PKT, etc.)",
"further confirm the strong generalization of our RLKD.",
"Both the accuracy and stability of the proposed RLKD are significantly superior to CTKD, this can be attributed to our RLKD method considers the future rewards of the instance temperature adjustment operations.",
"ImageNet: image classification.",
"To validate the scalability of our method and its applicability in complex scenarios involving large datasets, we further conduct image classification on ImageNet.",
"Table 2 details the top-1 and top-5 accuracy.",
"Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"KD, PKT, RKD, SRRL, and DKD).",
"The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"Additionally, unlike CTKD, our RLKD also takes into account the student model's grasp of individual instances during instance temperature adjustment.",
"MS-COCO: object detection.",
"To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits.",
""
],
"target_context_ids": [
8,
9,
10,
11,
12
],
"selected_paragraphs": [
"[paragraph id = 8] To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"[paragraph id = 9] When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"[paragraph id = 10] Experiments conducted on other 5 KD frameworks (e.g.",
"[paragraph id = 11] PKT, etc.)",
"[paragraph id = 12] further confirm the strong generalization of our RLKD."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T3\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T3.2\" style=\"width:238.1pt;height:229.7pt;vertical-align:-0.0pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-86.2pt,83.2pt) scale(0.58,0.58) ;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S5.T3.2.2\">\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.2\">\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.3\">Teacher</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.4\">RN-56</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.5\">RN-110</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.6\">RN-110</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.7\">WRN-40-2</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.8\">WRN-40-2</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.1.1.1.1\">RN-32×4</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T3.2.2.2.2\">RN-32×4</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.3.1\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.1\">Acc</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.2\">72.34</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.3\">74.31</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.4\">74.31</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.5\">75.61</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.6\">75.61</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.7\">79.42</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.3.1.8\">79.42</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.4.2\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.1\">Student</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" 
id=\"S5.T3.2.2.4.2.2\">RN-20</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.3\">RN-32</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.4\">RN-20</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.5\">WRN-16-2</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.6\">WRN-40-1</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.7\">SN-V1</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.4.2.8\">SN-V2</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.5.3\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.1\">Acc</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.2\">69.06</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.3\">71.14</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.4\">69.06</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.5\">73.26</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.6\">71.98</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.7\">70.70</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.5.3.8\">71.82</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.6.4\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.1\">PKT</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.2\">70.85</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.3\">73.36</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.4\">70.88</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.5\">74.82</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.6\">74.01</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.7\">74.39</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.6.4.8\">75.10</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.7.5\">\n<td class=\"ltx_td 
ltx_align_center\" id=\"S5.T3.2.2.7.5.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.2\">71.13</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.3\">73.49</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.4\">71.07</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.5\">75.34</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.6\">74.11</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.7\">74.63</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.7.5.8\">75.52</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.8.6\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.2\">71.41</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.3\">73.68</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.4\">71.34</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.5\">75.62</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.6\">74.23</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.7\">74.89</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.8.6.8\">75.78</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.9.7\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.1\">SP</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.2\">70.84</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.3\">73.09</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.4\">70.74</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.5\">74.88</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.6\">73.77</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.7\">74.97</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.9.7.8\">75.59</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.10.8\">\n<td 
class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.2\">71.29</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.3\">73.42</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.4\">71.17</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.5\">75.30</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.6\">73.97</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.7\">75.28</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.10.8.8\">75.79</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.11.9\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.2\">71.65</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.3\">73.70</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.4\">71.51</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.5\">75.61</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.6\">74.22</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.7\">75.31</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.11.9.8\">76.04</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.12.10\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.1\">VID</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.2\">70.62</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.3\">73.02</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.4\">70.59</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.5\">74.89</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.6\">73.60</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.7\">74.81</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.12.10.8\">75.24</td>\n</tr>\n<tr 
class=\"ltx_tr\" id=\"S5.T3.2.2.13.11\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.2\">70.81</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.3\">73.38</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.4\">71.11</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.5\">75.20</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.6\">73.75</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.7\">75.23</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.13.11.8\">75.48</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.14.12\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.2\">71.09</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.3\">73.70</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.4\">71.39</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.5\">75.48</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.6\">74.02</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.7\">75.58</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.14.12.8\">75.81</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.15.13\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.1\">CRD</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.2\">71.69</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.3\">73.63</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.4\">71.38</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.5\">75.53</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.6\">74.36</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.15.13.7\">75.13</td>\n<td class=\"ltx_td ltx_align_center 
ltx_border_t\" id=\"S5.T3.2.2.15.13.8\">75.90</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.16.14\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.2\">72.13</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.3\">74.08</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.4\">72.02</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.5\">75.71</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.6\">74.72</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.7\">75.41</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.16.14.8\">76.20</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.17.15\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.2\">72.29</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.3\">74.41</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.4\">72.28</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.5\">76.03</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.6\">74.98</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.7\">75.68</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.17.15.8\">76.55</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.18.16\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.1\">SRRL</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.2\">71.13</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.3\">73.48</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.4\">71.09</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.5\">75.69</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.6\">74.18</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" 
id=\"S5.T3.2.2.18.16.7\">75.36</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.18.16.8\">75.90</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.19.17\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.2\">71.41</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.3\">73.81</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.4\">71.52</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.5\">75.90</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.6\">74.38</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.7\">75.62</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.19.17.8\">75.97</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.20.18\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.2\">71.61</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.3\">74.02</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.4\">71.81</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.5\">76.23</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.6\">74.64</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.7\">75.90</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.20.18.8\">76.06</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.21.19\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.1\">DKD</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.2\">71.43</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.3\">73.66</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.4\">71.28</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.5\">75.70</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" 
id=\"S5.T3.2.2.21.19.6\">74.54</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.7\">75.44</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T3.2.2.21.19.8\">76.48</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.22.20\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.2\">71.62</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.3\">73.91</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.4\">71.65</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.5\">75.85</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.6\">74.57</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.7\">75.88</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T3.2.2.22.20.8\">76.91</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T3.2.2.23.21\">\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.2\">71.89</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.3\">74.27</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.4\">71.91</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.5\">76.02</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.6\">74.90</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.7\">76.02</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T3.2.2.23.21.8\">77.21</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T3.4.1.1\" style=\"font-size:90%;\">Table 3</span>: </span><span class=\"ltx_text\" id=\"S5.T3.5.2\" style=\"font-size:90%;\">Student network Top-1 accuracy on CIFAR-100 dataset. 
</span></figcaption>\n</figure>",
"perturb_sentence_id": [
9,
10
],
"output": {
"perturbed_statement": "[paragraph id = 9] When applied to the teacher-student pair RN110 & RN56, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).Experiments conducted on other 5 KD frameworks (e.g. ",
"perturbed_explanation": "The original explanation is: 1. The statement describes an experiment involving the teacher-student pair RN110 & RN32 within the DKD framework, claiming an improvement and comparison to the CTKD accuracy. 2. The statement's comparison is explicit in mentioning two specific accuracy percentages (74.27% and 73.66%). 3. The improvement over another framework (CTKD) is indicated by percentages (74.27% vs 73.91%). The altered statement is incorrect because: 4. It changes the teacher-student pair from RN110 & RN32 to RN110 & RN56, which is not mentioned in the context provided, thus altering a key detail of the experiment description."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "4",
"section": "5.1",
"all_context": [
"CIFAR-100: image classification.",
"As shown in Table 1 , we conduct image classification on the CIFAR-100 dataset to demonstrate the generalization performance of our RLKD method across 11 teacher-student pairs, including RN-56 & RN-20, etc.",
"Among them, 5 pairs of teacher and student models (VGG-13 & MN-V2, etc.)",
"are characterized by distinguishing architectural frameworks.",
"These experimental designs we employed provide a diverse and comprehensive assessment environment.",
"When the teacher and student networks share the same architecture, the experimental results show that our RLKD method has a strong generalization capacity, also exhibits a superior performance compared to CTKD.",
"Specifically, in the case of RN-110 & RN-20, our method outperforms Vanilla KD by 0.78% (71.44% vs 70.66%) and CTKD by 0.36% (71.44% vs 71.08%).",
"Moreover, in the case where the teacher and student networks have different architectures, the powerful generalization capacity of our RLKD is also validated.",
"To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"Experiments conducted on other 5 KD frameworks (e.g.",
"PKT, etc.)",
"further confirm the strong generalization of our RLKD.",
"Both the accuracy and stability of the proposed RLKD are significantly superior to those of CTKD, which can be attributed to the fact that our RLKD method considers the future rewards of the instance temperature adjustment operations.",
"ImageNet: image classification.",
"To validate the scalability of our method and its applicability in complex scenarios involving large datasets, we further conduct image classification on ImageNet.",
"Table 2 details the top-1 and top-5 accuracy.",
"Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"KD, PKT, RKD, SRRL, and DKD).",
"The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"Additionally, unlike CTKD, our RLKD also takes into account the student model's grasp of individual instances during instance temperature adjustment.",
"MS-COCO: object detection.",
"To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits.",
""
],
"target_context_ids": [
26,
27,
28,
29
],
"selected_paragraphs": [
"[paragraph id = 26] To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"[paragraph id = 27] As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"[paragraph id = 28] Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"[paragraph id = 29] Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T4\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T4.2\" style=\"width:176.2pt;height:124.7pt;vertical-align:-0.0pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-51.7pt,36.6pt) scale(0.63,0.63) ;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S5.T4.2.1\">\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.1.1\">\n<td class=\"ltx_td ltx_border_tt\" id=\"S5.T4.2.1.1.1.1\"></td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.1.1.2\">mAP</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.1.1.3\">AP50</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.1.1.4\">AP75</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.1.1.5\">APl</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.1.1.6\">APm</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.1.1.7\">APs</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.2.2\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.1\">T: RN-101</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.2\">42.04</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.3\">62.48</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.4\">45.88</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.5\">54.60</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.6\">45.55</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.2.2.7\">25.22</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.3.3\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.3.3.1\">S: RN-18</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.3.3.2\">33.26</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.3.3.3\">53.61</td>\n<td class=\"ltx_td ltx_align_center\" 
id=\"S5.T4.2.1.3.3.4\">35.26</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.3.3.5\">43.16</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.3.3.6\">35.68</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.3.3.7\">18.96</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.4.4\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.1\">Vanilla KD</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.2\">33.97</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.3\">54.66</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.4\">36.62</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.5\">44.14</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.6\">36.67</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.4.4.7\">18.71</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.5.5\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.2\">34.51</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.3\">55.32</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.4\">36.95</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.5\">44.76</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.6\">37.17</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.5.5.7\">19.01</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.6.6\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.6.6.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.6.6.2\">34.73</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.6.6.3\">55.61</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.6.6.4\">37.19</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.6.6.5\">45.27</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.6.6.6\">37.30</td>\n<td class=\"ltx_td 
ltx_align_center\" id=\"S5.T4.2.1.6.6.7\">19.12</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.7.7\">\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.1\">T: RN-50</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.2\">40.22</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.3\">61.02</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.4\">43.81</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.5\">51.98</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.6\">43.53</td>\n<td class=\"ltx_td ltx_align_center ltx_border_tt\" id=\"S5.T4.2.1.7.7.7\">24.16</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.8.8\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.1\">S: MN-V2</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.2\">29.47</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.3\">48.87</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.4\">30.90</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.5\">38.86</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.6\">30.77</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.8.8.7\">16.33</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.9.9\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.1\">Vanilla KD</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.2\">30.13</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.3\">50.28</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.4\">31.35</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.5\">39.56</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.6\">31.91</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T4.2.1.9.9.7\">16.69</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.10.10\">\n<td 
class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.1\">+CTKD</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.2\">31.21</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.3\">52.12</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.4\">32.01</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.5\">41.11</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.6\">33.44</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T4.2.1.10.10.7\">18.09</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T4.2.1.11.11\">\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.1\">+Ours</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.2\">31.49</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.3\">52.57</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.4\">33.23</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.5\">41.71</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.6\">33.65</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T4.2.1.11.11.7\">18.31</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T4.3.1.1\" style=\"font-size:90%;\">Table 4</span>: </span><span class=\"ltx_text\" id=\"S5.T4.4.2\" style=\"font-size:90%;\">Results of our RLKD on the MS-COCO dataset, utilizing Faster-RCNN <cite class=\"ltx_cite ltx_citemacro_cite\">[<a class=\"ltx_ref\" href=\"https://arxiv.org/html/2407.00115v3#bib.bib27\" title=\"\"><span class=\"ltx_text\" style=\"font-size:90%;\">27</span></a>]</cite> with FPN <cite class=\"ltx_cite ltx_citemacro_cite\">[<a class=\"ltx_ref\" href=\"https://arxiv.org/html/2407.00115v3#bib.bib22\" title=\"\"><span class=\"ltx_text\" style=\"font-size:90%;\">22</span></a>]</cite>. 
We conduct experiments with the following teacher-student pairings: RN-101 paired with RN-18, and RN-50 paired with MN-V2.</span></figcaption>\n</figure>",
"perturb_sentence_id": [
27,
28
],
"output": {
"perturbed_statement": "[paragraph id = 27] As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD underperforms Vanilla KD by 1.36% (29.49% vs 30.13%) and CTKD by 0.28% (31.21% vs 31.49%), respectively.Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD fails to show significant enhancement and is consistently surpassed by CTKD across all size categories.",
"perturbed_explanation": "1. The original statement asserts that RLKD outperforms Vanilla KD by 1.36% in the mAP metric and CTKD by 0.28%. Moreover, for detecting objects of varied sizes, RLKD consistently surpasses CTKD, showing significant enhancement in all size categories. 2. The statement claims that RLKD underperforms compared to both Vanilla KD and CTKD in the mAP metric, implying that the mAP of RLKD is incorrectly lower than both. Additionally, it states RLKD fails to show significant enhancement, incorrectly asserting that CTKD surpasses RLKD in all object size categories."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "5",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model's mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may be due to overfitting in the agent caused by using the top 10% of samples.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when conducting the mix-up method on the training data ranked from 10% to 20% using the training data ranked 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify the validity of our mix-up method that combines instances of varying knowledge values can produce high-quality training data.",
""
],
"target_context_ids": [
0,
3,
4,
5
],
"selected_paragraphs": [
"[paragraph id = 0] In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"[paragraph id = 3] We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"[paragraph id = 4] As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"[paragraph id = 5] This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model's mastery of the training instances."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T5\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T5.2\" style=\"width:177.7pt;height:47.5pt;vertical-align:-0.0pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-45.8pt,12.2pt) scale(0.66,0.66) ;\">\n<table class=\"ltx_tabular ltx_guessed_headers ltx_align_middle\" id=\"S5.T5.2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S5.T5.2.1.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T5.2.1.1.1.1\">Teacher</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T5.2.1.1.1.2\">RN-56</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T5.2.1.1.1.3\">RN-110</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T5.2.1.1.1.4\">WRN-40-2</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T5.2.1.1.1.5\">VGG-13</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T5.2.1.2.1\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T5.2.1.2.1.1\">Student</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T5.2.1.2.1.2\">RN-20</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T5.2.1.2.1.3\">RN-32</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T5.2.1.2.1.4\">WRN-16-2</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T5.2.1.2.1.5\">VGG-8</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T5.2.1.3.2\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T5.2.1.3.2.1\">Ours w/o US</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T5.2.1.3.2.2\">71.16</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T5.2.1.3.2.3\">73.68</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T5.2.1.3.2.4\">75.61</th>\n<th class=\"ltx_td ltx_align_center 
ltx_th ltx_th_column ltx_border_t\" id=\"S5.T5.2.1.3.2.5\">73.57</th>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T5.2.1.4.3\">\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T5.2.1.4.3.1\">Ours w US</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T5.2.1.4.3.2\">71.40</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T5.2.1.4.3.3\">73.81</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T5.2.1.4.3.4\">75.79</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T5.2.1.4.3.5\">73.75</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T5.3.1.1\" style=\"font-size:90%;\">Table 5</span>: </span><span class=\"ltx_text\" id=\"S5.T5.4.2\" style=\"font-size:90%;\">Ablation study of the uncertainty score (US) feature. </span></figcaption>\n</figure>",
"perturb_sentence_id": [
4,
5
],
"output": {
"perturbed_statement": "[paragraph id = 4] As shown in Table 5, when incorporating uncertainty score into state representation, our method shows an improvement of 1.50% (72.66% vs 71.16%) in the RN-56 & RN-20 teacher-student pair. This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model's mastery of the training instances.",
"perturbed_explanation": "1. The statement originally reports an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair after incorporating the uncertainty score into the state representation. 2. The statement now claims a much larger improvement of 1.50% (72.66% vs 71.16%), which is incorrect. Table 5 reports 71.40 with the uncertainty score and 71.16 without it for this pair, so the improvement is 0.24%. The altered figure overstates the gain and thus misrepresents the effectiveness of the uncertainty score in the experiments conducted."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "6",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model's mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may be due to overfitting in the agent caused by using the top 10% of samples.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when conducting the mix-up method on the training data ranked from 10% to 20% using the training data ranked 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify the validity of our mix-up method that combines instances of varying knowledge values can produce high-quality training data.",
""
],
"target_context_ids": [
7,
8,
9,
10
],
"selected_paragraphs": [
"[paragraph id = 7] As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"[paragraph id = 8] is achieved.",
"[paragraph id = 9] E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"[paragraph id = 10] We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T6\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T6.2\" style=\"width:181.3pt;height:47.5pt;vertical-align:-0.0pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-46.7pt,12.2pt) scale(0.66,0.66) ;\">\n<table class=\"ltx_tabular ltx_guessed_headers ltx_align_middle\" id=\"S5.T6.2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S5.T6.2.1.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T6.2.1.1.1.1\">Teacher</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T6.2.1.1.1.2\">RN-56</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T6.2.1.1.1.3\">RN-110</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T6.2.1.1.1.4\">WRN-40-2</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T6.2.1.1.1.5\">VGG-13</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T6.2.1.2.1\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T6.2.1.2.1.1\">Student</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T6.2.1.2.1.2\">RN-20</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T6.2.1.2.1.3\">RN-32</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T6.2.1.2.1.4\">WRN-16-2</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T6.2.1.2.1.5\">VGG-8</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T6.2.1.3.2\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T6.2.1.3.2.1\">Ours w/o IRA</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T6.2.1.3.2.2\">70.91</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T6.2.1.3.2.3\">73.26</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T6.2.1.3.2.4\">75.39</th>\n<th class=\"ltx_td ltx_align_center 
ltx_th ltx_th_column ltx_border_t\" id=\"S5.T6.2.1.3.2.5\">73.32</th>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T6.2.1.4.3\">\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T6.2.1.4.3.1\">Ours w IRA</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T6.2.1.4.3.2\">71.40</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T6.2.1.4.3.3\">73.81</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T6.2.1.4.3.4\">75.79</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T6.2.1.4.3.5\">73.75</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T6.3.1.1\" style=\"font-size:90%;\">Table 6</span>: </span><span class=\"ltx_text\" id=\"S5.T6.4.2\" style=\"font-size:90%;\">Ablation on instance reward calibration (IRA) strategy.</span></figcaption>\n</figure>",
"perturb_sentence_id": [
9,
10
],
"output": {
"perturbed_statement": "[paragraph id = 9] E.g., our instance reward calibration strategy boosts the performance of RN-110 & RN-32 pair by 10.5% (83.81% vs 73.26%). We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"perturbed_explanation": "The original text attributes the effectiveness of the instance reward calibration strategy to its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action. 1. The statement claims that the instance reward calibration strategy boosts the performance of the RN-110 & RN-32 pair by 10.5%, which is factually incorrect: the context and Table 6 report an improvement of 0.55% (73.81% vs 73.26%). 2. It inaccurately reports a performance of 83.81% with the strategy, whereas the value given in the context and Table 6 is 73.81%."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "7",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5, when incorporating the uncertainty score into the state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model's mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6, when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance reward calibration strategy boosts the performance of the RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7, we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our efficient exploration strategy improves the performance of the student model across all 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the fact that the strategy enables the agent to learn a valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8, we conduct experiments on CIFAR-100 to compare different strategies for selecting high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is lower than the 71.21% achieved when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may be because utilizing the top 10% of samples causes overfitting in the agent.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when applying the mix-up method to the training data ranked from 10% to 20% using the training data ranked from 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify that our mix-up method, which combines instances of varying knowledge values, can produce high-quality training data.",
""
],
"target_context_ids": [
12,
13,
14,
15
],
"selected_paragraphs": [
"[paragraph id = 12] As shown in Table 7, we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"[paragraph id = 13] The experimental results demonstrate that our efficient exploration strategy improves the performance of the student model across all 4 teacher-student pairs.",
"[paragraph id = 14] In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"[paragraph id = 15] We attribute this success to the fact that the strategy enables the agent to learn a valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T7\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T7.2\" style=\"width:178.1pt;height:47.5pt;vertical-align:-0.0pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-45.9pt,12.2pt) scale(0.66,0.66) ;\">\n<table class=\"ltx_tabular ltx_guessed_headers ltx_align_middle\" id=\"S5.T7.2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S5.T7.2.1.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T7.2.1.1.1.1\">Teacher</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T7.2.1.1.1.2\">RN-56</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T7.2.1.1.1.3\">RN-110</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T7.2.1.1.1.4\">WRN-40-2</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T7.2.1.1.1.5\">VGG-13</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T7.2.1.2.1\">\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T7.2.1.2.1.1\">Student</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T7.2.1.2.1.2\">RN-20</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T7.2.1.2.1.3\">RN-32</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T7.2.1.2.1.4\">WRN-16-2</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S5.T7.2.1.2.1.5\">VGG-8</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T7.2.1.3.2\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T7.2.1.3.2.1\">Ours w/o EE</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T7.2.1.3.2.2\">71.03</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T7.2.1.3.2.3\">73.52</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t\" id=\"S5.T7.2.1.3.2.4\">75.50</th>\n<th class=\"ltx_td ltx_align_center 
ltx_th ltx_th_column ltx_border_t\" id=\"S5.T7.2.1.3.2.5\">73.45</th>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T7.2.1.4.3\">\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T7.2.1.4.3.1\">Ours w EE</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T7.2.1.4.3.2\">71.40</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T7.2.1.4.3.3\">73.81</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T7.2.1.4.3.4\">75.79</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T7.2.1.4.3.5\">73.75</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T7.3.1.1\" style=\"font-size:90%;\">Table 7</span>: </span><span class=\"ltx_text\" id=\"S5.T7.4.2\" style=\"font-size:90%;\">Ablation study of the efficient exploration (EE) strategy.</span></figcaption>\n</figure>",
"perturb_sentence_id": [
14,
15
],
"output": {
"perturbed_statement": "[paragraph id = 14] In the experiments involving the MN-60 & MN-30 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%). We attribute this success to the fact that the strategy enables the agent to learn a valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"perturbed_explanation": "1. The original context states that the efficient exploration strategy results in a performance improvement of 0.37% for the RN-56 & RN-20 teacher-student pair. 2. The statement incorrectly refers to the MN-60 & MN-30 teacher-student pair instead of the RN-56 & RN-20 pair, which contradicts the context. The MN-60 & MN-30 teacher-student pair is not mentioned in the context, so attributing the performance improvement to it is factually incorrect."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "8",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5, when incorporating the uncertainty score into the state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model's mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6, when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance reward calibration strategy boosts the performance of the RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7, we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our efficient exploration strategy improves the performance of the student model across all 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the fact that the strategy enables the agent to learn a valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8, we conduct experiments on CIFAR-100 to compare different strategies for selecting high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is lower than the 71.21% achieved when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may be because utilizing the top 10% of samples causes overfitting in the agent.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when applying the mix-up method to the training data ranked from 10% to 20% using the training data ranked from 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify that our mix-up method, which combines instances of varying knowledge values, can produce high-quality training data.",
""
],
"target_context_ids": [
16,
17,
18,
19,
20
],
"selected_paragraphs": [
"[paragraph id = 16] Selection of high-quality training examples.",
"[paragraph id = 17] As shown in Table 8, we conduct experiments on CIFAR-100 to compare different strategies for selecting high-quality training examples.",
"[paragraph id = 18] Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is lower than the 71.21% achieved when using the training data ranked from 10% to 20%.",
"[paragraph id = 19] This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"[paragraph id = 20] We think this may be because utilizing the top 10% of samples causes overfitting in the agent."
],
"table_html": "<figure class=\"ltx_table\" id=\"S5.T8\">\n<div class=\"ltx_inline-block ltx_align_center ltx_transformed_outer\" id=\"S5.T8.8\" style=\"width:241.3pt;height:30.8pt;vertical-align:-0.0pt;\"><span class=\"ltx_transformed_inner\" style=\"transform:translate(-91.0pt,11.6pt) scale(0.57,0.57) ;\">\n<table class=\"ltx_tabular ltx_guessed_headers ltx_align_middle\" id=\"S5.T8.8.8\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S5.T8.8.8.8\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S5.T8.8.8.8.9\">Teacher</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_tt\" id=\"S5.T8.8.8.8.10\">Student</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T8.1.1.1.1\"></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T8.2.2.2.2\"></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T8.5.5.5.5\">\n \n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S5.T8.8.8.8.8\">\n \n</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S5.T8.8.8.9.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S5.T8.8.8.9.1.1\">72.34</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t\" id=\"S5.T8.8.8.9.1.2\">69.06</th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T8.8.8.9.1.3\">70.92</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T8.8.8.9.1.4\">71.21</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T8.8.8.9.1.5\">71.27</td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S5.T8.8.8.9.1.6\">71.40</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S5.T8.8.8.10.2\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S5.T8.8.8.10.2.1\">75.61</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row 
ltx_border_bb ltx_border_r\" id=\"S5.T8.8.8.10.2.2\">73.26</th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T8.8.8.10.2.3\">75.33</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T8.8.8.10.2.4\">75.57</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T8.8.8.10.2.5\">75.61</td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S5.T8.8.8.10.2.6\">75.79</td>\n</tr>\n</tbody>\n</table>\n</span></div>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S5.T8.12.2.1\" style=\"font-size:90%;\">Table 8</span>: </span><span class=\"ltx_text\" id=\"S5.T8.10.1\" style=\"font-size:90%;\">Comparison of different high-quality training sample selection strategies. The teacher-student pairs corresponding to the second and third rows are respectively RN-56 & RN-20 and WRN-40-2 & WRN-16-2. “” denotes the mix-up operation.</span></figcaption>\n</figure>",
"perturb_sentence_id": [
18,
19
],
"output": {
"perturbed_statement": "[paragraph id = 18] Interestingly, we observe that when using the top 5% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is lower than the 71.21% achieved when using the training data ranked from 5% to 15%. This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"perturbed_explanation": "The original context compares the performance of the student model under different selections of high-quality training data. Specifically, it observes that using the top 10% of the data yields lower performance than using the data ranked from 10% to 20%, which the authors suggest may be due to overfitting in the agent when only the top 10% of samples are used. The statement is incorrect for the following reasons: 1. The statement incorrectly mentions using the top 5% of high-quality training data, whereas the context specifies the top 10%. 2. It incorrectly gives a comparison range of 5% to 15%, whereas the correct range is 10% to 20%."
}
}
]