[
    {
        "path": "table_paper/2407.00101v1.json",
        "table_id": "1",
        "section": "7.1",
        "all_context": [
            "Plots 4 and 5 shows the average values of testing accuracy, testing loss, and training loss for five rounds of training from random initialization on the MNIST dataset.",
            "It can be seen clearly that our algorithm maintains the lead in terms of accuracy and loss as compared to both asynchronous and synchronous versions.",
            "The same trend is observed for all the combinations of batch sizes and step sizes.",
            "However, the speed gain by our algorithm over the asynchronous version is not that significant, we believe that MNIST poses a simple optimization problem that does not bring out problems of asynchronous algorithm effectively.",
            "Table 1 shows the difference of the metrics like accuracy and loss between our algorithm and asynchronous algorithm averaged over the entire training interval.",
            "For better performance, the difference in accuracy should be positive and that loss should be negative.",
            "For the next set of experiments, we selected CIFAR-10 as our dataset since we believe that it provides a difficult optimization problem as compared to MNIST.",
            "Table 2 and plots 6 and 7 show similar statistics as that for MNIST.",
            "We can clearly note here that our algorithms show significant speedup as compared to both of the other algorithms.",
            "It is able to achieve higher accuracy and lower loss as compared to asynchronous and synchronous algorithms.",
            "In all the previous experiments, the synchronous algorithm was very slow, and hence for future analysis, only present a comparison between our algorithm and the asynchronous algorithm.",
            ""
        ],
        "target_context_ids": [
            4,
            5
        ],
        "selected_paragraphs": [
            "[paragraph id = 4] Table 1 shows the difference of the metrics like accuracy and loss between our algorithm and asynchronous algorithm averaged over the entire training interval.",
            "[paragraph id = 5] For better performance, the difference in accuracy should be positive and that loss should be negative."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S7.T1\">\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S7.T1.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S7.T1.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T1.1.1.1\"></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T1.1.1.2\">(300,32)</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T1.1.1.3\">(300,64)</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T1.1.1.4\">(500,32)</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T1.1.1.5\">(500,64)</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S7.T1.1.2.1\">\n<td class=\"ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T1.1.2.1.1\">Test Accuracy</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.2.1.2\">1.374</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.2.1.3\">-0.516</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.2.1.4\">1.366</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.2.1.5\">1.291</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S7.T1.1.3.2\">\n<td class=\"ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T1.1.3.2.1\">Test loss</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.3.2.2\">-0.047</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.3.2.3\">0.001</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.3.2.4\">-0.053</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T1.1.3.2.5\">-0.022</td>\n</tr>\n<tr 
class=\"ltx_tr\" id=\"S7.T1.1.4.3\">\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T1.1.4.3.1\">Train loss</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T1.1.4.3.2\">-0.047</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T1.1.4.3.3\">-0.001</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T1.1.4.3.4\">-0.054</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T1.1.4.3.5\">-0.023</td>\n</tr>\n</tbody>\n</table>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\">Table 1. </span>Difference between the metric for our algorithm and asynchronous algorithm averaged over entire training interval for MNIST dataset. For better performance, difference in accuracy should be positive and that loss should be negative</figcaption>\n</figure>",
        "perturb_sentence_id": 4,
        "output": {
            "perturbed_statement": "[paragraph id = 4] Table 1 shows the difference of the metrics like accuracy and speed between our algorithm and asynchronous algorithm averaged over the test subset. For better performance, the difference in accuracy should be greater than or equal to zero and speed should be negative.",
            "perturbed_explanation": "1. Initially, the metrics compared were accuracy and loss over the training interval, not accuracy and speed, nor over a limited subset like the test data. 2. While it is correct that higher performance is indicated by positive differences in accuracy, speed is not typically represented as a metric difference and cannot logically be negative under usual interpretations, leading to an incoherent assertion."
        }
    },
    {
        "path": "table_paper/2407.00101v1.json",
        "table_id": "2",
        "section": "7.1",
        "all_context": [
            "Plots 4 and 5 shows the average values of testing accuracy, testing loss, and training loss for five rounds of training from random initialization on the MNIST dataset.",
            "It can be seen clearly that our algorithm maintains the lead in terms of accuracy and loss as compared to both asynchronous and synchronous versions.",
            "The same trend is observed for all the combinations of batch sizes and step sizes.",
            "However, the speed gain by our algorithm over the asynchronous version is not that significant, we believe that MNIST poses a simple optimization problem that does not bring out problems of asynchronous algorithm effectively.",
            "Table 1 shows the difference of the metrics like accuracy and loss between our algorithm and asynchronous algorithm averaged over the entire training interval.",
            "For better performance, the difference in accuracy should be positive and that loss should be negative.",
            "For the next set of experiments, we selected CIFAR-10 as our dataset since we believe that it provides a difficult optimization problem as compared to MNIST.",
            "Table 2 and plots 6 and 7 show similar statistics as that for MNIST.",
            "We can clearly note here that our algorithms show significant speedup as compared to both of the other algorithms.",
            "It is able to achieve higher accuracy and lower loss as compared to asynchronous and synchronous algorithms.",
            "In all the previous experiments, the synchronous algorithm was very slow, and hence for future analysis, only present a comparison between our algorithm and the asynchronous algorithm.",
            ""
        ],
        "target_context_ids": [
            6,
            7,
            8,
            9
        ],
        "selected_paragraphs": [
            "[paragraph id = 6] For the next set of experiments, we selected CIFAR-10 as our dataset since we believe that it provides a difficult optimization problem as compared to MNIST.",
            "[paragraph id = 7] Table 2 and plots 6 and 7 show similar statistics as that for MNIST.",
            "[paragraph id = 8] We can clearly note here that our algorithms show significant speedup as compared to both of the other algorithms.",
            "[paragraph id = 9] It is able to achieve higher accuracy and lower loss as compared to asynchronous and synchronous algorithms."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S7.T2\">\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S7.T2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S7.T2.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T2.1.1.1\"></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T2.1.1.2\">(300,32)</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T2.1.1.3\">(300,64)</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T2.1.1.4\">(500,32)</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T2.1.1.5\">(500,64)</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S7.T2.1.2.1\">\n<td class=\"ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T2.1.2.1.1\">Test Accuracy</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.2.1.2\">4.849</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.2.1.3\">2.435</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.2.1.4\">3.468</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.2.1.5\">2.884</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S7.T2.1.3.2\">\n<td class=\"ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T2.1.3.2.1\">Test loss</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.3.2.2\">-0.137</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.3.2.3\">-0.066</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.3.2.4\">-0.092</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T2.1.3.2.5\">-0.080</td>\n</tr>\n<tr 
class=\"ltx_tr\" id=\"S7.T2.1.4.3\">\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T2.1.4.3.1\">Train loss</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T2.1.4.3.2\">-0.139</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T2.1.4.3.3\">-0.067</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T2.1.4.3.4\">-0.091</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T2.1.4.3.5\">-0.082</td>\n</tr>\n</tbody>\n</table>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\">Table 2. </span>Difference between the metric for our algorithm and asynchronous algorithm averaged over entire training interval for CIFAR-10 dataset. For better performance, difference in accuracy should be positive and that loss should be negative</figcaption>\n</figure>",
        "perturb_sentence_id": 7,
        "output": {
            "perturbed_statement": "[paragraph id = 7] Table 2 and plots 6 and 7 show similar statistics as that for CIFAR-10. We can clearly note here that our algorithms demonstrate comparable performance to both of the other algorithms.",
            "perturbed_explanation": "1. The original explanation highlighted that Table 2 and plots 6 and 7 illustrated a clear speedup of the authors' algorithms relative to the other algorithms. 2. The statement is factually incorrect because it incorrectly asserts that the algorithms exhibit performance comparable to the others, which is contrary to the context indicating a significant speedup."
        }
    },
    {
        "path": "table_paper/2407.00101v1.json",
        "table_id": "3",
        "section": "7.2",
        "all_context": [
            "Further, we wanted to understand how different values of batch sizes affect the efficiency of our approach.",
            "For each of the batch sizes, we executed 5 rounds of training, each with different initialization of the parameters on the randomly generated dataset.",
            "Table 3 shows the difference of the metrics like accuracy and loss between our algorithm and asynchronous algorithm averaged over the entire training interval.",
            "We hypothesized that as the batch size increases, the difference should decrease since asynchronous algorithms start providing updates with high confidence.",
            "This can be also validated by the trend observed in the plot 8 .",
            ""
        ],
        "target_context_ids": [
            0,
            2,
            3
        ],
        "selected_paragraphs": [
            "[paragraph id = 0] Further, we wanted to understand how different values of batch sizes affect the efficiency of our approach.",
            "[paragraph id = 2] Table 3 shows the difference of the metrics like accuracy and loss between our algorithm and asynchronous algorithm averaged over the entire training interval.",
            "[paragraph id = 3] We hypothesized that as the batch size increases, the difference should decrease since asynchronous algorithms start providing updates with high confidence."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S7.T3\">\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S7.T3.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S7.T3.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T3.1.1.1\"></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T3.1.1.2\">8</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T3.1.1.3\">16</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T3.1.1.4\">32</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T3.1.1.5\">64</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t\" id=\"S7.T3.1.1.6\">128</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S7.T3.1.2.1\">\n<td class=\"ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T3.1.2.1.1\">Test Accuracy</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.2.1.2\">4.896</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.2.1.3\">5.183</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.2.1.4\">4.222</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.2.1.5\">3.304</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.2.1.6\">2.599</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S7.T3.1.3.2\">\n<td class=\"ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T3.1.3.2.1\">Test loss</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.3.2.2\">-0.141</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.3.2.3\">-0.141</td>\n<td class=\"ltx_td 
ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.3.2.4\">-0.117</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.3.2.5\">-0.089</td>\n<td class=\"ltx_td ltx_align_center ltx_border_r ltx_border_t\" id=\"S7.T3.1.3.2.6\">-0.072</td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S7.T3.1.4.3\">\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t\" id=\"S7.T3.1.4.3.1\">Train loss</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T3.1.4.3.2\">-0.143</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T3.1.4.3.3\">-0.141</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T3.1.4.3.4\">-0.114</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T3.1.4.3.5\">-0.088</td>\n<td class=\"ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t\" id=\"S7.T3.1.4.3.6\">-0.068</td>\n</tr>\n</tbody>\n</table>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\">Table 3. </span>Difference between the metric for our algorithm and asynchronous algorithm averaged over entire training interval for various batch sizes and constant step size of 500. For better performance, difference in accuracy should be positive and that loss should be negative</figcaption>\n</figure>",
        "perturb_sentence_id": 2,
        "output": {
            "perturbed_statement": "[paragraph id = 2] Table 3 shows the difference in metrics like accuracy and efficiency between our algorithm and the asynchronous algorithm averaged over the entire testing interval. We hypothesized that as the batch size increases, the difference should increase since asynchronous algorithms start providing updates with low certainty.",
            "perturbed_explanation": "The original explanation states two key ideas: (1) the difference in metrics like accuracy and loss between the algorithms are analyzed over the training interval, and (2) it is hypothesized that increasing batch size decreases the difference due to heightened confidence in asynchronous updates. The statement is incorrect because: (3) it incorrectly states the metrics as accuracy and efficiency instead of accuracy and loss, altering their analytical significance, (4) it specifies the interval as testing instead of training, misrepresenting the context's application phase, and (5) it reverses the hypothesized impact of increasing batch size on difference, making the direction inconsistent with the stated rationale."
        }
    }
]