[
{
"path": "table_paper/2407.00115v3.json",
"table_id": "2",
"section": "5.1",
"all_context": [
"CIFAR-100: image classification.",
"As shown in Table 1 , we conduct image classification on the CIFAR-100 dataset to demonstrate the generalization performance of our RLKD method across 11 teacher-student pairs, including RN-56 & RN-20, etc.",
"Among them, 5 pairs of teacher and student models (VGG-13 & MN-V2, etc.)",
"are characterized by distinguishing architectural frameworks.",
"These experimental designs we employed provide a diverse and comprehensive assessment environment.",
"When the teacher and student networks share the same architecture, the experimental results show that our RLKD method has a strong generalization capacity, also exhibits a superior performance compared to CTKD.",
"Specifically, in the case of RN-110 & RN-20, our method outperforms Vanilla KD by 0.78% (71.44% vs 70.66%) and CTKD by 0.36% (71.44% vs 71.08%).",
"Moreover, in the case where the teacher and student networks have different architectures, the powerful generalization capacity of our RLKD is also validated.",
"To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"Experiments conducted on other 5 KD frameworks (e.g.",
"PKT, etc.)",
"further confirm the strong generalization of our RLKD.",
"Both the accuracy and stability of the proposed RLKD are significantly superior to CTKD, this can be attributed to our RLKD method considers the future rewards of the instance temperature adjustment operations.",
"ImageNet: image classification.",
"To validate the scalability of our method and its applicability in complex scenarios involving large datasets, we further conduct image classification on ImageNet.",
"Table 2 details the top-1 and top-5 accuracy.",
"Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"KD, PKT, RKD, SRRL, and DKD).",
"The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"Additionally, unlike CTKD, our RLKD also takes into account the student model s grasp of individual instances during instance temperature adjustment.",
"MS-COCO: object detection.",
"To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits.",
""
],
"target_context_ids": [
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26
],
"selected_paragraphs": [
"[paragraph id = 16] Table 2 details the top-1 and top-5 accuracy.",
"[paragraph id = 17] Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"[paragraph id = 18] KD, PKT, RKD, SRRL, and DKD).",
"[paragraph id = 19] The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"[paragraph id = 20] Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"[paragraph id = 21] For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"[paragraph id = 22] In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"[paragraph id = 23] We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"[paragraph id = 24] Additionally, unlike CTKD, our RLKD also takes into account the student model s grasp of individual instances during instance temperature adjustment.",
"[paragraph id = 25] MS-COCO: object detection.",
"[paragraph id = 26] To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset."
],
"table_html": "
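<table>
<caption>Table 2: Top-1 and Top-5 accuracy on ImageNet with ResNet-34 as teacher and ResNet-18 as student.</caption>
<thead>
<tr><th></th><th>Teacher</th><th>Student</th><th>Vanilla KD</th><th>+CTKD</th><th>+Ours</th><th>PKT</th><th>+CTKD</th><th>+Ours</th><th>RKD</th><th>+CTKD</th><th>+Ours</th><th>SRRL</th><th>+CTKD</th><th>+Ours</th><th>DKD</th><th>+CTKD</th><th>+Ours</th></tr>
</thead>
<tbody>
<tr><th>Top-1</th><td>73.96</td><td>70.26</td><td>70.83</td><td>71.28</td><td>71.39</td><td>70.92</td><td>71.31</td><td>71.53</td><td>70.94</td><td>71.13</td><td>71.37</td><td>71.01</td><td>71.25</td><td>71.38</td><td>71.13</td><td>71.47</td><td>71.62</td></tr>
<tr><th>Top-5</th><td>91.58</td><td>89.50</td><td>90.31</td><td>90.33</td><td>90.51</td><td>90.25</td><td>90.30</td><td>90.42</td><td>90.33</td><td>90.34</td><td>90.45</td><td>90.41</td><td>90.42</td><td>90.52</td><td>90.31</td><td>90.44</td><td>90.56</td></tr>
</tbody>
</table>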
",
"perturb_sentence_id": 22,
"output": {
"perturbed_statement": "[paragraph id = 22] In contrast, CTKD achieves significant improvements on these KD frameworks, with gains of 0.22% (90.53% vs 90.31%) and 0.21% (90.62% vs 90.41%) respectively, which equals the performance of RLKD in these tasks.",
"perturbed_explanation": "The original explanation states that RLKD achieves superior performance compared to CTKD, with the latter showing gains of only 0.02% and 0.01%, attributed to RLKD’s RL-based framework in temperature adjustment which considers long-term benefits. 1. RLKD is reported as outperforming CTKD in terms of percentage gains in the provided KD framework results. 2. The claim that CTKD achieves a 0.22% and 0.21% improvement contradicts this fact, as the gains are overstated and contradict the specific values provided. Therefore, CTKD does not match RLKD's reported achievements per stated metrics."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "3",
"section": "5.1",
"all_context": [
"CIFAR-100: image classification.",
"As shown in Table 1 , we conduct image classification on the CIFAR-100 dataset to demonstrate the generalization performance of our RLKD method across 11 teacher-student pairs, including RN-56 & RN-20, etc.",
"Among them, 5 pairs of teacher and student models (VGG-13 & MN-V2, etc.)",
"are characterized by distinguishing architectural frameworks.",
"These experimental designs we employed provide a diverse and comprehensive assessment environment.",
"When the teacher and student networks share the same architecture, the experimental results show that our RLKD method has a strong generalization capacity, also exhibits a superior performance compared to CTKD.",
"Specifically, in the case of RN-110 & RN-20, our method outperforms Vanilla KD by 0.78% (71.44% vs 70.66%) and CTKD by 0.36% (71.44% vs 71.08%).",
"Moreover, in the case where the teacher and student networks have different architectures, the powerful generalization capacity of our RLKD is also validated.",
"To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"Experiments conducted on other 5 KD frameworks (e.g.",
"PKT, etc.)",
"further confirm the strong generalization of our RLKD.",
"Both the accuracy and stability of the proposed RLKD are significantly superior to CTKD, this can be attributed to our RLKD method considers the future rewards of the instance temperature adjustment operations.",
"ImageNet: image classification.",
"To validate the scalability of our method and its applicability in complex scenarios involving large datasets, we further conduct image classification on ImageNet.",
"Table 2 details the top-1 and top-5 accuracy.",
"Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"KD, PKT, RKD, SRRL, and DKD).",
"The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"Additionally, unlike CTKD, our RLKD also takes into account the student model s grasp of individual instances during instance temperature adjustment.",
"MS-COCO: object detection.",
"To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits.",
""
],
"target_context_ids": [
8,
9,
10,
11,
12
],
"selected_paragraphs": [
"[paragraph id = 8] To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"[paragraph id = 9] When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"[paragraph id = 10] Experiments conducted on other 5 KD frameworks (e.g.",
"[paragraph id = 11] PKT, etc.)",
"[paragraph id = 12] further confirm the strong generalization of our RLKD."
],
"table_html": "
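<table>
<caption>Table 3: Student network Top-1 accuracy on CIFAR-100 dataset.</caption>
<tbody>
<tr><th>Teacher</th><td>RN-56</td><td>RN-110</td><td>RN-110</td><td>WRN-40-2</td><td>WRN-40-2</td><td>RN-32x4</td><td>RN-32x4</td></tr>
<tr><th>Acc</th><td>72.34</td><td>74.31</td><td>74.31</td><td>75.61</td><td>75.61</td><td>79.42</td><td>79.42</td></tr>
<tr><th>Student</th><td>RN-20</td><td>RN-32</td><td>RN-20</td><td>WRN-16-2</td><td>WRN-40-1</td><td>SN-V1</td><td>SN-V2</td></tr>
<tr><th>Acc</th><td>69.06</td><td>71.14</td><td>69.06</td><td>73.26</td><td>71.98</td><td>70.70</td><td>71.82</td></tr>
<tr><th>PKT</th><td>70.85</td><td>73.36</td><td>70.88</td><td>74.82</td><td>74.01</td><td>74.39</td><td>75.10</td></tr>
<tr><th>+CTKD</th><td>71.13</td><td>73.49</td><td>71.07</td><td>75.34</td><td>74.11</td><td>74.63</td><td>75.52</td></tr>
<tr><th>+Ours</th><td>71.41</td><td>73.68</td><td>71.34</td><td>75.62</td><td>74.23</td><td>74.89</td><td>75.78</td></tr>
<tr><th>SP</th><td>70.84</td><td>73.09</td><td>70.74</td><td>74.88</td><td>73.77</td><td>74.97</td><td>75.59</td></tr>
<tr><th>+CTKD</th><td>71.29</td><td>73.42</td><td>71.17</td><td>75.30</td><td>73.97</td><td>75.28</td><td>75.79</td></tr>
<tr><th>+Ours</th><td>71.65</td><td>73.70</td><td>71.51</td><td>75.61</td><td>74.22</td><td>75.31</td><td>76.04</td></tr>
<tr><th>VID</th><td>70.62</td><td>73.02</td><td>70.59</td><td>74.89</td><td>73.60</td><td>74.81</td><td>75.24</td></tr>
<tr><th>+CTKD</th><td>70.81</td><td>73.38</td><td>71.11</td><td>75.20</td><td>73.75</td><td>75.23</td><td>75.48</td></tr>
<tr><th>+Ours</th><td>71.09</td><td>73.70</td><td>71.39</td><td>75.48</td><td>74.02</td><td>75.58</td><td>75.81</td></tr>
<tr><th>CRD</th><td>71.69</td><td>73.63</td><td>71.38</td><td>75.53</td><td>74.36</td><td>75.13</td><td>75.90</td></tr>
<tr><th>+CTKD</th><td>72.13</td><td>74.08</td><td>72.02</td><td>75.71</td><td>74.72</td><td>75.41</td><td>76.20</td></tr>
<tr><th>+Ours</th><td>72.29</td><td>74.41</td><td>72.28</td><td>76.03</td><td>74.98</td><td>75.68</td><td>76.55</td></tr>
<tr><th>SRRL</th><td>71.13</td><td>73.48</td><td>71.09</td><td>75.69</td><td>74.18</td><td>75.36</td><td>75.90</td></tr>
<tr><th>+CTKD</th><td>71.41</td><td>73.81</td><td>71.52</td><td>75.90</td><td>74.38</td><td>75.62</td><td>75.97</td></tr>
<tr><th>+Ours</th><td>71.61</td><td>74.02</td><td>71.81</td><td>76.23</td><td>74.64</td><td>75.90</td><td>76.06</td></tr>
<tr><th>DKD</th><td>71.43</td><td>73.66</td><td>71.28</td><td>75.70</td><td>74.54</td><td>75.44</td><td>76.48</td></tr>
<tr><th>+CTKD</th><td>71.62</td><td>73.91</td><td>71.65</td><td>75.85</td><td>74.57</td><td>75.88</td><td>76.91</td></tr>
<tr><th>+Ours</th><td>71.89</td><td>74.27</td><td>71.91</td><td>76.02</td><td>74.90</td><td>76.02</td><td>77.21</td></tr>
</tbody>
</table>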
",
"perturb_sentence_id": 9,
"output": {
"perturbed_statement": "[paragraph id = 9] When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the CTKD framework, which surpasses the accuracy of DKD by 0.36% (74.27% vs 73.91%). Experiments conducted on other 5 KD frameworks (e.g.",
"perturbed_explanation": "1. The original explanation states that RLKD achieves an improvement of 0.61% in the DKD framework (74.27% vs 73.66%) and surpasses CTKD by 0.36% (74.27% vs 73.91%). 2. However, the statement claims that the 0.61% improvement occurred in the CTKD framework compared to DKD, which alters the roles of the frameworks mentioned. This adjustment contradicts the described experimental outcomes and results."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "4",
"section": "5.1",
"all_context": [
"CIFAR-100: image classification.",
"As shown in Table 1 , we conduct image classification on the CIFAR-100 dataset to demonstrate the generalization performance of our RLKD method across 11 teacher-student pairs, including RN-56 & RN-20, etc.",
"Among them, 5 pairs of teacher and student models (VGG-13 & MN-V2, etc.)",
"are characterized by distinguishing architectural frameworks.",
"These experimental designs we employed provide a diverse and comprehensive assessment environment.",
"When the teacher and student networks share the same architecture, the experimental results show that our RLKD method has a strong generalization capacity, also exhibits a superior performance compared to CTKD.",
"Specifically, in the case of RN-110 & RN-20, our method outperforms Vanilla KD by 0.78% (71.44% vs 70.66%) and CTKD by 0.36% (71.44% vs 71.08%).",
"Moreover, in the case where the teacher and student networks have different architectures, the powerful generalization capacity of our RLKD is also validated.",
"To validate the generalization of our RLKD method across different KD frameworks, we conduct experiments on 6 currently leading KD frameworks (see Table 3 ), including DKD, PKT, etc.",
"When applied to the teacher-student pair RN110 & RN32, our RLKD brings an improvement of 0.61% (74.27% vs 73.66%) in the DKD framework, which surpasses the accuracy of CTKD by 0.36% (74.27% vs 73.91%).",
"Experiments conducted on other 5 KD frameworks (e.g.",
"PKT, etc.)",
"further confirm the strong generalization of our RLKD.",
"Both the accuracy and stability of the proposed RLKD are significantly superior to CTKD, this can be attributed to our RLKD method considers the future rewards of the instance temperature adjustment operations.",
"ImageNet: image classification.",
"To validate the scalability of our method and its applicability in complex scenarios involving large datasets, we further conduct image classification on ImageNet.",
"Table 2 details the top-1 and top-5 accuracy.",
"Using CTKD and our RLKD as the adaptable plug-in approach, we incorporate them into 5 current leading distillation frameworks (i.e.",
"KD, PKT, RKD, SRRL, and DKD).",
"The experimental results obtained from these 5 KD frameworks unequivocally demonstrate the excellent scalability of our method.",
"Remarkably, our RLKD exhibits robust performance on large dataset like ImageNet.",
"For instance, in the Vanilla KD and SRRL frameworks, our method achieves improvement of 0.2% (90.51% vs 90.31%) and 0.11% (90.52% vs 90.41%) respectively.",
"In contrast, CTKD obtains much fewer improvement on these KD frameworks, with gains of just 0.02% (90.33% vs 90.31%) and 0.01% (90.42% vs 90.41%) respectively, about 10 times lower.",
"We think the superior performance of RLKD can be attributed to its RL-based framework in instance temperature adjustment, which considers the future benefits of these adjustments.",
"Additionally, unlike CTKD, our RLKD also takes into account the student model s grasp of individual instances during instance temperature adjustment.",
"MS-COCO: object detection.",
"To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits.",
""
],
"target_context_ids": [
26,
27,
28,
29
],
"selected_paragraphs": [
"[paragraph id = 26] To verify whether our RLKD method possesses robustness across other visual tasks, we execute object detection on the MS-COCO dataset.",
"[paragraph id = 27] As shown in Table 4 , in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD outperforms Vanilla KD by 1.36% (31.49% vs 30.13%) and CTKD by 0.28% (31.49% vs 31.21%), respectively.",
"[paragraph id = 28] Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm) and small (APs) objects, our RLKD also shows a significant enhancement, consistently surpasses CTKD across all size categories.",
"[paragraph id = 29] Results demonstrate the robustness of our approach, where instance temperature adjustment is treated as a sequential decision-making task, enabling consideration of future benefits."
],
"table_html": "
<table>
<caption>Table 4: Results of our RLKD on the MS-COCO dataset, utilizing Faster-RCNN [27] with FPN [22]. We conduct experiments with the following teacher-student pairings: RN-101 paired with RN-18, and RN-50 paired with MN-V2.</caption>
<tbody>
<tr><th></th><th>mAP</th><th>AP50</th><th>AP75</th><th>APl</th><th>APm</th><th>APs</th></tr>
<tr><th>T: RN-101</th><td>42.04</td><td>62.48</td><td>45.88</td><td>54.60</td><td>45.55</td><td>25.22</td></tr>
<tr><th>S: RN-18</th><td>33.26</td><td>53.61</td><td>35.26</td><td>43.16</td><td>35.68</td><td>18.96</td></tr>
<tr><th>Vanilla KD</th><td>33.97</td><td>54.66</td><td>36.62</td><td>44.14</td><td>36.67</td><td>18.71</td></tr>
<tr><th>+CTKD</th><td>34.51</td><td>55.32</td><td>36.95</td><td>44.76</td><td>37.17</td><td>19.01</td></tr>
<tr><th>+Ours</th><td>34.73</td><td>55.61</td><td>37.19</td><td>45.27</td><td>37.30</td><td>19.12</td></tr>
<tr><th>T: RN-50</th><td>40.22</td><td>61.02</td><td>43.81</td><td>51.98</td><td>43.53</td><td>24.16</td></tr>
<tr><th>S: MN-V2</th><td>29.47</td><td>48.87</td><td>30.90</td><td>38.86</td><td>30.77</td><td>16.33</td></tr>
<tr><th>Vanilla KD</th><td>30.13</td><td>50.28</td><td>31.35</td><td>39.56</td><td>31.91</td><td>16.69</td></tr>
<tr><th>+CTKD</th><td>31.21</td><td>52.12</td><td>32.01</td><td>41.11</td><td>33.44</td><td>18.09</td></tr>
<tr><th>+Ours</th><td>31.49</td><td>52.57</td><td>33.23</td><td>41.71</td><td>33.65</td><td>18.31</td></tr>
</tbody>
</table>
",
"perturb_sentence_id": 27,
"output": {
"perturbed_statement": "[paragraph id = 27] As shown in Table 4, in the case of RN-50 & MN-V2, regarding the mAP metric, our RLKD underperforms Vanilla KD by -1.36% (31.13% vs 31.49%) and CTKD by -0.28% (31.13% vs 31.21%), respectively. Additionally, for detecting objects with varying sizes – evaluated by the AP metrics for large (APl), medium (APm), and small (APs) objects, our RLKD shows a slight reduction, consistently falling below CTKD across all size categories.",
"perturbed_explanation": "Original Explanation: The statement outlines the performance of the RLKD model against other benchmarks, with RLKD demonstrating superior results across all metrics. Correct values are compared for clarity and accuracy. 1. RLKD outperformance by 1.36% compared to Vanilla KD indicates an improvement in the mAP metric. 2. RLKD outperformance by 0.28% compared to CTKD shows a consistent trend of enhancement. Updated Explanation: The altered description inaccurately states that RLKD underperforms both Vanilla KD and CTKD across metrics, which misrepresents the actual experimental findings. 3. RLKD does not perform worse in the described comparison metrics (mAP and AP for object size categories); instead, it shows higher values, evidencing its robustness."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "5",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model s mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may due to utilizing the top 10% samples caused overfitting in the agent.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when conducting the mix-up method on the training data ranked from 10% to 20% using the training data ranked 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify the validity of our mix-up method that combines instances of varying knowledge values can produce high-quality training data.",
""
],
"target_context_ids": [
0,
3,
4,
5
],
"selected_paragraphs": [
"[paragraph id = 0] In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"[paragraph id = 3] We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"[paragraph id = 4] As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"[paragraph id = 5] This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model s mastery of the training instances."
],
"table_html": "
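<table>
<caption>Table 5: Ablation study of the uncertainty score (US) feature.</caption>
<thead>
<tr><th>Teacher</th><th>RN-56</th><th>RN-110</th><th>WRN-40-2</th><th>VGG-13</th></tr>
</thead>
<tbody>
<tr><th>Student</th><td>RN-20</td><td>RN-32</td><td>WRN-16-2</td><td>VGG-8</td></tr>
<tr><th>Ours w/o US</th><td>71.16</td><td>73.68</td><td>75.61</td><td>73.57</td></tr>
<tr><th>Ours w US</th><td>71.40</td><td>73.81</td><td>75.79</td><td>73.75</td></tr>
</tbody>
</table>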
",
"perturb_sentence_id": 4,
"output": {
"perturbed_statement": "[paragraph id = 4] As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 1.04% (72.20% vs 71.16%) in the RN-56 & RN-20 teacher-student pair. This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model s mastery of the training instances.",
"perturbed_explanation": "Original Explanation: 1. Incorporating the uncertainty score into the state representation led to an improvement of 0.24% (71.40% vs 71.16%) in accuracy for the RN-56 & RN-20 Network Pair. 2. This indicates that the uncertainty score enables more effective decision-making by considering the student model's training instance mastery.\n\nExplanation of the Inaccuracy: The provided statement erroneously claims an improvement of 1.04% (72.20% vs 71.16%) with the inclusion of the uncertainty score, which substantially exceeds the described actual improvement of 0.24% (71.40% vs 71.16%). This misstatement overstates the effectiveness of the designed uncertainty score."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "6",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model s mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may due to utilizing the top 10% samples caused overfitting in the agent.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when conducting the mix-up method on the training data ranked from 10% to 20% using the training data ranked 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify the validity of our mix-up method that combines instances of varying knowledge values can produce high-quality training data.",
""
],
"target_context_ids": [
7,
8,
9,
10
],
"selected_paragraphs": [
"[paragraph id = 7] As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"[paragraph id = 8] is achieved.",
"[paragraph id = 9] E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"[paragraph id = 10] We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action."
],
"table_html": "
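<table>
<caption>Table 6: Ablation on instance reward calibration (IRA) strategy.</caption>
<thead>
<tr><th>Teacher</th><th>RN-56</th><th>RN-110</th><th>WRN-40-2</th><th>VGG-13</th></tr>
</thead>
<tbody>
<tr><th>Student</th><td>RN-20</td><td>RN-32</td><td>WRN-16-2</td><td>VGG-8</td></tr>
<tr><th>Ours w/o IRA</th><td>70.91</td><td>73.26</td><td>75.39</td><td>73.32</td></tr>
<tr><th>Ours w IRA</th><td>71.40</td><td>73.81</td><td>75.79</td><td>73.75</td></tr>
</tbody>
</table>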
",
"perturb_sentence_id": 9,
"output": {
"perturbed_statement": "[paragraph id = 9] E.g., our instance reward calibration strategy diminishes the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"perturbed_explanation": "The original explanation: The statement highlights the effect of an instance reward calibration strategy on the RN-110 & RN-32 pair, emphasizing its role in improving the performance by 0.55% through more accurate reward perception and policy enhancement. 1. The claim in the statement that the strategy diminishes performance is factually incorrect, as the provided performance metrics display an increase from 73.26% to 73.81%, indicating an improvement rather than a decrease. 2. The evaluation thus reaffirms the promotive influence of the calibration strategy on performance, contrasting with the diminishing effect mentioned."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "7",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model s mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may due to utilizing the top 10% samples caused overfitting in the agent.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when conducting the mix-up method on the training data ranked from 10% to 20% using the training data ranked 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify the validity of our mix-up method that combines instances of varying knowledge values can produce high-quality training data.",
""
],
"target_context_ids": [
12,
13,
14,
15
],
"selected_paragraphs": [
"[paragraph id = 12] As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"[paragraph id = 13] The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"[paragraph id = 14] In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"[paragraph id = 15] We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD."
],
"table_html": "
\n
\n
\n\n
\n
Teacher
\n
RN-56
\n
RN-110
\n
WRN-40-2
\n
VGG-13
\n
\n\n\n
\n
Student
\n
RN-20
\n
RN-32
\n
WRN-16-2
\n
VGG-8
\n
\n
\n
Ours w/o EE
\n
71.03
\n
73.52
\n
75.50
\n
73.45
\n
\n
\n
Ours w EE
\n
71.40
\n
73.81
\n
75.79
\n
73.75
\n
\n\n
\n
\nTable 7: Ablation study of the efficient exploration (EE) strategy.\n
",
"perturb_sentence_id": 14,
"output": {
"perturbed_statement": "[paragraph id = 14] In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.15% (71.40% vs 71.25%).We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"perturbed_explanation": "Original Explanation: The efficient exploration strategy demonstrated a performance improvement of 0.37% (71.40% vs 71.03%) for the RN-56 & RN-20 teacher-student pair, attributing success to the accelerated policy learning that benefited the student's knowledge acquisition during KD's initial stages. 2. The statement claims an improvement of 0.15% (71.40% vs 71.25%), which conflicts with the exact 0.37% improvement provided in the details, making the improvement range erroneous."
}
},
{
"path": "table_paper/2407.00115v3.json",
"table_id": "8",
"section": "5.2",
"all_context": [
"In the ablation studies, we evaluate the performance of the uncertainty score that is included in our state representation, the instance reward calibration scheme, the efficient exploration strategy, and different high-quality training example selection strategies.",
"All experiments are conducted on the CIFAR-100 dataset with respect to the image classification task, and utilize the Vanilla KD framework.",
"Uncertainty score.",
"We conduct experiments on 4 sets of teacher-student network pairs to test the effectiveness of the uncertainty score in our state representation.",
"As shown in Table 5 , when incorporating uncertainty score into state representation, our method shows an improvement of 0.24% (71.40% vs 71.16%) in the RN-56 & RN-20 teacher-student pair.",
"This enhancement verifies the effectiveness of our designed uncertainty score, which enables the agent to make wiser decisions by taking into account the student model s mastery of the training instances.",
"Instance reward calibration.",
"As shown in Table 6 , when incorporating an instance reward calibration strategy into our RLKD method, a promotive effect across 4 different sets of the teacher-student pairs (RN-56 & RN-20, etc.)",
"is achieved.",
"E.g., our instance temperature calibration strategy boosts the performance of RN-110 & RN-32 pair by 0.55% (73.81% vs 73.26%).",
"We believe the effectiveness of the instance reward calibration strategy lies in its ability to enable the agent to more accurately perceive the rewards resulting from each of its instance temperature adjustment actions, thereby enhancing its capacity to update its policy for performing the action.",
"Efficient exploration.",
"As shown in Table 7 , we conduct ablation experiments on our efficient exploration strategy across 4 teacher-student pairs.",
"The experimental results demonstrate that our effective exploration strategy facilitates performance of the student model across 4 teacher-student pairs.",
"In the experiments involving the RN-56 & RN-20 teacher-student pair, our efficient exploration strategy results in a performance improvement of 0.37% (71.40% vs 71.03%).",
"We attribute this success to the strategy enables the agent to learn valuable instance temperature adjustment policy faster, allowing the student model to acquire more useful knowledge during the early stages of KD.",
"Selection of high-quality training examples.",
"As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"We think this may due to utilizing the top 10% samples caused overfitting in the agent.",
"Furthermore, in the teacher-student pair RN-56 & RN-20, when conducting the mix-up method on the training data ranked from 10% to 20% using the training data ranked 40% to 50%, there is a performance increase of 0.19% (71.40% vs 71.21%).",
"The experimental results verify the validity of our mix-up method that combines instances of varying knowledge values can produce high-quality training data.",
""
],
"target_context_ids": [
16,
17,
18,
19,
20
],
"selected_paragraphs": [
"[paragraph id = 16] Selection of high-quality training examples.",
"[paragraph id = 17] As shown in Table 8 , we conduct experiments on CIFAR-100 to compare different strategies for selecting the high-quality training examples.",
"[paragraph id = 18] Interestingly, we observe that when using the top 10% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.",
"[paragraph id = 19] This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"[paragraph id = 20] We think this may due to utilizing the top 10% samples caused overfitting in the agent."
],
"table_html": "
\n
\n
\n\n
\n
Teacher
\n
Student
\n
\n
\n
\n \n
\n
\n \n
\n
\n\n\n
\n
72.34
\n
69.06
\n
70.92
\n
71.21
\n
71.27
\n
71.40
\n
\n
\n
75.61
\n
73.26
\n
75.33
\n
75.57
\n
75.61
\n
75.79
\n
\n\n
\n
\nTable 8: Comparison of different high-quality training sample selection strategies. The teacher-student pairs corresponding to the second and third rows are respectively RN-56 & RN-20 and WRN-40-2 & WRN-16-2. “” denotes the mix-up operation.\n
",
"perturb_sentence_id": 18,
"output": {
"perturbed_statement": "[paragraph id = 18] Interestingly, we observe that when using the top 5% of high-quality training data, the performance of the student model in the teacher-student pair RN-56 & RN-20 is 70.92%, which is not as good as the performance 71.21% of the student model when using the training data ranked from 10% to 20%.This phenomenon is also observed in the teacher-student pair WRN-40-2 & WRN-16-2.",
"perturbed_explanation": "The original explanation states: 1. The performance comparison is based on using the top 10% of data versus data ranked from 10% to 20%. 2. The performance values given are the result of this specific data segmentation. Alteration Impact Analysis: 3. The altered statement incorrectly specifies the top 5% instead of the top 10%. This misrepresentation contradicts the actual observed methodology and data selection criteria, which might lead to a misunderstanding of the experiment's outcomes."
}
}
]