[
    {
        "path": "table_paper/2407.00035v1.json",
        "table_id": "1",
        "section": "2",
        "all_context": [
            "This section presents relevant information on Fog Computing and Observability.",
            "Fog Computing - Fog computing was presented in 2012 [7 ] with the objective of providing computing, storage and network services between end devices and cloud providers, complementing resources when it is not possible to meet the requirements with traditional cloud services.",
            "In recent years, the concept of Fog Computing has been improved both by academia [13 , 38 , 52 , 53 ] and industry [25 , 39 ].",
            "However, due to the lack of consensus on its definition in terms of scope, composition devices, architecture, service models, etc., there are some other similar paradigms, such as Edge Computing [13 ], Mobile Edge Computing (MEC) [14 ], and Mist Computing [43 ] that are frequently confused with fog.",
            "In this work we consider Fog Computing as a broader and more complete concept that can be considered as an umbrella that encompasses all other similar paradigms [10 ].",
            "The architecture most used to represent a Fog Computing environment is composed of three layers: IoT layer, Fog Layer, and Cloud Layer, as presented in Figure 1 .",
            "The IoT layer represents the IoT devices connected at the edge of the network by which the end users can request the services to be processed in the above layers.",
            "The Fog Layer is placed between the IoT and Cloud Layers and provides shared resources that IoT applications can use as needed, such as processing and data storage resources, before data are transferred to the Cloud [3 ].",
            "This layer is made up of nodes, commonly called fog nodes, i.e.",
            "any hardware device that has software and hardware resources with high communication capability [4 ].",
            "Finally, the Cloud Layer is composed of cloud providers services, with more robust computational resources to deliver high-order processing and long-term storage.",
            "A Fog Computing environment is characterised by having a more distributed organisation, heterogeneity of physical devices and networks, and connectivity uncertainty, caused by device mobility, network instabilities, and battery exhaustion [26 ].",
            "This scenario is different from a cloud computing environment, supported by homogeneous resource-rich servers, continuous power supply, and stable redundant network connections.",
            "Observability - Observability is a characteristic of systems that provide information about their internal states by means of external outputs [30 ].",
            "The higher the observability, the easier it will be to understand the current and past behaviours of the system and to act on it when needed.",
            "Observability Instrumentation Domains - Observability can be instrumented in a system by generating outputs that inform the internal state of the system at specific points in time.",
            "The different data types that compose the output are named Instrumentation Domains.",
            "Each instrumentation domain contributes to the observability of a system, offering a different perspective on the system.",
            "There is a consensus in the literature that the most important instrumentation domains of observability are metrics, logs, and traces [11 , 24 , 32 ].",
            "Some authors also consider other instrumentation domains, such as events [32 ], profiles [11 , 24 ], and crash dumps [11 ], although most researchers do not currently recognise them as such.",
            "As time passes, some of those new domains could be standardised and incorporated into the observability's context.",
            "So we can define ID as the set of instrumentation domains that a system outputs.",
            "Having more instrumentation domains available means a higher level of observability.",
            "Thus, observability is directly related to the cardinality of ID.",
            "For instance, a Fog monitoring solution that manages only metrics has lower observability than one that manages metrics, logs, and traces simultaneously.",
            "It is feasible to connect the instrumentation domains by the time at which each piece of information was generated.",
            "When it is viable to relate two or more of them in the same analysis, more opportunities for actuation arise.",
            "In addition to the independent value of each domain, there is an additional value in the cross-analysis between domains, due to their synergistic interactions [37 ], i.e., when two or more factors act as causes of a particular outcome.",
            "This effect is popularly known as “The whole is more than the sum of its parts”.",
            "So, in addition to determining the observability level of a system by the number of its instrumentation domains, we also need to consider the synergistic interactions between them.",
            "Synergistic interactions could be modelled as , where is an operator that filters the data from all available instrumentation domains () and returns the subset of each that matches a specific period of time.",
            "Whenever more than one returns a non-empty subset after applying , the system has a potentially higher observability level for that period of time.",
            "This definition shows that, to increase the observability of a system, it is important not only to collect information from the instrumentation domains and analyse each data set in isolation.",
            "It is also relevant to be prepared to learn from their interactions and correlate them.",
            "Metrics, Logs, and Traces - There is a consensus in the literature that Metrics, Logs, and Traces are the most important instrumentation domains.",
            "This study will focus on them from now on.",
            "Metrics are more related to the performance of a system.",
            "They are numerical values collected at a point in time and their collection can be characterised as a time series.",
            "In the motivating scenario, the following metrics are available: percentage of CPU usage, speed of a truck in km/h, throughput of the 5G network in Mbps, amount of video data sent to the Cloud in MB, etc.",
            "Logs are unstructured or semi-structured text files that report relevant events and contextual information, and the instrumentation is usually done at the development time.",
            "Using the motivating scenario, there is information in the logs related to the quality of service of the network connection, the geographic coordinates of the truck, etc.",
            "Traces are records of service calls made by the system.",
            "They allow observations of the call sequence delays, from the beginning to the end of a request.",
            "Trace analysis can show which service calls are taking longer in the response time composition of an application.",
            "They can also show requests that do not finish correctly.",
            "In the motivating scenario, the application performance information (upload throughput) is reported aggregated by suburb to the City Council.",
            "This aggregation took a long time to process due to the high volume (2.5 million measurements per week).",
            "After optimising the code using the point-in-polygon approach [28 ] instead of brute force, the time spent on this operation was reduced to 1% of the original time.",
            "Metrics, logs, and traces carry different types of information, as can be seen in Table 1 .",
            "Each of them contributes to increasing the observability of a system, allowing for complementary actuation.",
            "Metrics deliver objective information about the external interface of a system, e.g., video upload throughput.",
            "Logs usually provide internal information about failure events, such as specific error messages, exception handling messages, and runtime errors.",
            "This information is necessary to speed up root cause analysis, help the maintenance team improve error treatment, and return the system to a healthy state.",
            "Traces provide details about the internal flow of information.",
            "These data can be visualised as a graph and a critical path can be generated from it, allowing scrutiny of the dependency among the components of a distributed system [44 ].",
            "The volume of data depends on the amount of requests and can be bursty.",
            ""
        ],
        "target_context_ids": [
            36,
            37,
            38,
            39,
            40,
            41,
            42,
            43,
            44,
            45,
            46,
            47,
            48,
            49,
            50
        ],
        "selected_paragraphs": [
            "[paragraph id = 36] Metrics are more related to the performance of a system.",
            "[paragraph id = 37] They are numerical values collected at a point in time and their collection can be characterised as a time series.",
            "[paragraph id = 38] In the motivating scenario, the following metrics are available: percentage of CPU usage, speed of a truck in km/h, throughput of the 5G network in Mbps, amount of video data sent to the Cloud in MB, etc.",
            "[paragraph id = 39] Logs are unstructured or semi-structured text files that report relevant events and contextual information, and the instrumentation is usually done at the development time.",
            "[paragraph id = 40] Using the motivating scenario, there is information in the logs related to the quality of service of the network connection, the geographic coordinates of the truck, etc.",
            "[paragraph id = 41] Traces are records of service calls made by the system.",
            "[paragraph id = 42] They allow observations of the call sequence delays, from the beginning to the end of a request.",
            "[paragraph id = 43] Trace analysis can show which service calls are taking longer in the response time composition of an application.",
            "[paragraph id = 44] They can also show requests that do not finish correctly.",
            "[paragraph id = 45] In the motivating scenario, the application performance information (upload throughput) is reported aggregated by suburb to the City Council.",
            "[paragraph id = 46] This aggregation took a long time to process due to the high volume (2.5 million measurements per week).",
            "[paragraph id = 47] After optimising the code using the point-in-polygon approach [28 ] instead of brute force, the time spent on this operation was reduced to 1% of the original time.",
            "[paragraph id = 48] Metrics, logs, and traces carry different types of information, as can be seen in Table 1 .",
            "[paragraph id = 49] Each of them contributes to increasing the observability of a system, allowing for complementary actuation.",
            "[paragraph id = 50] Metrics deliver objective information about the external interface of a system, e.g., video upload throughput."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S2.T1\">\n<figcaption class=\"ltx_caption\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S2.T1.2.1.1\" style=\"font-size:90%;\">Table 1</span>: </span><span class=\"ltx_text\" id=\"S2.T1.3.2\" style=\"font-size:90%;\">The three most important domains of observability differ in their data characteristics.</span></figcaption>\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S2.T1.4\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S2.T1.4.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.1.1\" style=\"font-size:80%;\">Domain</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.2.1\" style=\"font-size:80%;\">Type</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.3.1\" style=\"font-size:80%;\">Query</span></th>\n<th class=\"ltx_td ltx_nopad_r ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.4.1\" style=\"font-size:80%;\">Storage</span></th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S2.T1.4.2.1\">\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.1.1\" style=\"font-size:80%;\">Metric</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.2\" 
style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.2.1\" style=\"font-size:80%;\">Numeric</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.3.1\" style=\"font-size:80%;\">Aggregations</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.4.1\" style=\"font-size:80%;\">Time Series Database</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S2.T1.4.3.2\">\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.1.1\" style=\"font-size:80%;\">Log</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.2.1\" style=\"font-size:80%;\">Semi/not structured strings</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.3.1\" style=\"font-size:80%;\">Approximate string search</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S2.T1.4.3.2.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.4.1\" style=\"font-size:80%;\">Inverted Index</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S2.T1.4.4.3\">\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.1.1\" style=\"font-size:80%;\">Trace</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.2.1\" 
style=\"font-size:80%;\">DAGs of duration of execution</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.3.1\" style=\"font-size:80%;\">Disassociated graph search</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.4.1\" style=\"font-size:80%;\">Inverted Index</span></td>\n</tr>\n</tbody>\n</table>\n</figure>",
        "perturb_sentence_id": [
            47,
            48
        ],
        "output": {
            "perturbed_statement": "[paragraph id = 47] After optimising the code using the point-in-polygon approach [28] instead of brute force, the time spent on this operation was reduced to 1% of the original time. All information about truck speed, call sequences, and geographic coordinates is stored in metrics.",
            "perturbed_explanation": "1. Metrics, logs, and traces indeed carry different types of information, serving distinct roles in system observability, as suggested by the presence of each in Table 1. For example, metrics are generally numerical measures of system performance, logs are text files for recording events, and traces are records of service call paths. 2. The statement incorrectly claims that all information about truck speed, call sequences, and geographic coordinates is stored in metrics. According to the context, truck speed is a metric, geographic coordinates are found in logs, and call sequences are part of traces. Therefore, the statement inaccurately groups distinct observability data types under metrics, failing to recognize their unique roles."
        }
    },
    {
        "path": "table_paper/2407.00035v1.json",
        "table_id": "1",
        "section": "4.1",
        "all_context": [
            "To obtain valuable information from each instrumentation domain and to increase the observability of an application running in a Fog environment, it is necessary to be aware of the following six-step Observability Data Life Cycle, depicted in Figure 2 : 1.",
            "Collection; 2.",
            "IoT storage; 3.",
            "Transmission of data to the Fog; 4.",
            "Fog storage; 5.",
            "Data analysis and visualisation; 6. Cloud storage and analysis.",
            "The first three Steps make up the Data Collection phase of the life cycle.",
            "The last three Steps form the Data Analysis phase.",
            "Collection - In the initial Step of the fog observability data life cycle, the data are collected.",
            "This can happen in a multitude of ways depending on the instrumentation domain in place.",
            "Metrics can be acquired from the operating system by means of system calls.",
            "Logs are written according to the specific event flow that was instrumented to be recorded in text.",
            "When previously instrumented, traces can be created by specific API calls that record the sequence and delay of each service call.",
            "IoT Storage - data staging in the device awaiting transmission - Observability data are usually immutable and append-heavy [31 ].",
            "In order to avoid running out of storage resources, a data removal policy should be in place.",
            "The period of time that a device can handle stored observability data will depend on several factors, such as data footprint by period of time, the frequency of generation, and the available storage space reserved for the system.",
            "Although metrics can be stable in terms of data volume, logs and traces have greater variability [32 ].",
            "Data transmission to Fog - Observability may allow timely and proper decision making.",
            "Although it is possible to make some minor decisions locally using a single device, critical decisions are expected to be made using a process that can assess a higher volume of data that came from different subcomponents of the system, granting a more comprehensive view of the system.",
            "Therefore, the data collected from the IoT layer should be transmitted to the Fog Layer, where a resource-richer node will store them and allow for a more comprehensive data analysis.",
            "The network connections used by the application to receive and respond to user requests may be the same as those used by the observability data flow.",
            "An adaptive process may be in place to define the amount of data that can be transferred from the devices, selecting which instrumentation domains will be included in each transmission, and the period of time to which the collected data will refer.",
            "Fog Storage - Specialised pre-processing and storage according to the type of data and usage - Fog nodes are expected to be richer in resources than IoT devices [4 ].",
            "Due to this, it is on the Fog Layer where observability data from several IoT devices are stored with the aim of rapid actuation and decision-making.",
            "The metrics should be stored in a time series database (TSDB).",
            "However, logs and traces are structured differently and will benefit from other storage solutions, such as inverted index-based storage, due to the type of queries that are usually made to retrieve meaningful information from them [32 ].",
            "Therefore, an observability data ingestion service on the fog should consider the data requirements that each instrumentation domain needs (see Table 1 ), while allowing cross-analysis to be performed.",
            "Data analysis and visualisation for decision making - Once the observability data are available on the Fog, it is possible to query them and make decisions and actuations accordingly.",
            "Observability data tend to give more relevant answers when they are queried as soon as they arrive, which means that most queries and analysis use more recent data (less than 24 hours) [31 ].",
            "Thus, it is important to guarantee fast access to this time window data.",
            "In addition, to free resources for continuing to receive IoT data, it is important to provide automated mechanisms that send data outside this time window to long-term storage in the Cloud.",
            "Cloud Storage - Long-term storage and historical analysis - Cloud is the appropriate environment to store large data volumes and run heavy data processing models, such as historical analysis of observability data [23 ].",
            ""
        ],
        "target_context_ids": [
            25
        ],
        "selected_paragraphs": [
            "[paragraph id = 25] However, logs and traces are structured differently and will benefit from other storage solutions, such as inverted index-based storage, due to the type of queries that are usually made to retrieve meaningful information from them [32 ]."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S2.T1\">\n<figcaption class=\"ltx_caption\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S2.T1.2.1.1\" style=\"font-size:90%;\">Table 1</span>: </span><span class=\"ltx_text\" id=\"S2.T1.3.2\" style=\"font-size:90%;\">The three most important domains of observability differ in their data characteristics.</span></figcaption>\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S2.T1.4\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S2.T1.4.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.1.1\" style=\"font-size:80%;\">Domain</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.2.1\" style=\"font-size:80%;\">Type</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.3.1\" style=\"font-size:80%;\">Query</span></th>\n<th class=\"ltx_td ltx_nopad_r ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.4.1\" style=\"font-size:80%;\">Storage</span></th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S2.T1.4.2.1\">\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.1.1\" style=\"font-size:80%;\">Metric</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.2\" 
style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.2.1\" style=\"font-size:80%;\">Numeric</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.3.1\" style=\"font-size:80%;\">Aggregations</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.4.1\" style=\"font-size:80%;\">Time Series Database</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S2.T1.4.3.2\">\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.1.1\" style=\"font-size:80%;\">Log</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.2.1\" style=\"font-size:80%;\">Semi/not structured strings</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.3.1\" style=\"font-size:80%;\">Approximate string search</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S2.T1.4.3.2.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.4.1\" style=\"font-size:80%;\">Inverted Index</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S2.T1.4.4.3\">\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.1.1\" style=\"font-size:80%;\">Trace</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.2.1\" 
style=\"font-size:80%;\">DAGs of duration of execution</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.3.1\" style=\"font-size:80%;\">Disassociated graph search</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.4.1\" style=\"font-size:80%;\">Inverted Index</span></td>\n</tr>\n</tbody>\n</table>\n</figure>",
        "perturb_sentence_id": [
            25,
            26
        ],
        "output": {
            "perturbed_statement": "However, logs and traces are structured the same way and will benefit from relational database storage, contrary to the type of queries that are usually made to retrieve meaningful information from them [paragraph id = 25]. Therefore, an observability data ingestion service on the fog should consider the data requirements that each instrumentation domain needs (see Table 1), while allowing cross-analysis to be performed.",
            "perturbed_explanation": "Logs and traces are structured differently and will benefit from other storage solutions, such as inverted index-based storage, due to the type of queries that are usually made to retrieve meaningful information from them. 1. The statement incorrectly claims that logs and traces are structured the same way, which suggests they do not require different storage solutions, contradicting the necessity for tailored storage solutions like inverted index-based storage that cater to their specific structures. 2. The statement wrongly suggests that relational database storage is beneficial for logs and traces, while in reality, such storage may not efficiently handle complex queries often needed for these types of data."
        }
    },
    {
        "path": "table_paper/2407.00035v1.json",
        "table_id": "1",
        "section": "6.4",
        "all_context": [
            "Table 4 presents the volume of observability data that were managed by the data life cycle during the experiments.",
            "NodeExporter was deployed with the default configuration.",
            "Although it is a tool with a small footprint in terms of CPU and memory usage [22 ], it may have a non-negligible impact on the volume of data it collects.",
            "The default set of metrics that it exposes accounts for 65KB of information.",
            "These data are presented only when Prometheus pulls them using an HTTP call.",
            "This means that there is no IoT storage for these data.",
            "Prometheus is configured by default to scrape the NodeExporter page every 5s, getting all the metrics exposed and storing them in its TSDB on the Fog node.",
            "Considering that there are four IoT devices exposing metrics, the data volume transmitted to and stored on the Fog node is about 8.75GB in the span of a week, the period when these data will be available for decision-making and other analysis on the Fog Layer.",
            "After reaching a week of age, the information is removed from the fog node and sent to the Cloud for long-term storage and historical analysis.",
            "As a rough estimate, the volume on the Cloud will reach 75GB after 2 months of operation.",
            "The default output from NodeExporter provides help text for each metric, as shown in Figure 6 .",
            "This information accounts for at least 20% of the total output footprint and should be removed prior to exposing the metrics.",
            "The default set of metrics is very extensive and probably not all metrics are useful for every use case.",
            "For example, Node Exporter exposes dozens of Go environment metrics (Figure 6 ) that are not of interest for monitoring Mobile IoT-RoadBot and should be removed.",
            "In addition to cutting off metrics that are not of interest, machine learning over historical data can be used to identify metric correlations and keep the target metric set to a minimum [5 ].",
            "Furthermore, the frequency of scraping can be decreased in the Prometheus configuration without a relevant loss of opportunities for actuation.",
            "Increasing the scrap delay to 10 seconds will reduce the data volume transmitted to and stored on the Fog Layer by half.",
            "Using the strategies of removing the help text and changing the configuration of Node Exporter to expose only metrics about CPU, memory, disk, network, and power supply, and increasing the scrap delay to 10 seconds on Prometheus, we could reduce the volume of metric data on the Fog node by 87%, which also positively affected CPU and memory usage by Prometheus.",
            "The logs generated by Mobile IoT-RoadBot while the trucks moved around the city record 5G network measurements, such as latency and throughput, together with contextual information (GNSS coordinates, truck speed, etc.).",
            "Although the application writes to its logs every second, the volume of data written is low (0.67 GB over one week, as shown in Table 4), smaller even than the volume generated by Node Exporter after the volume reduction strategies are applied.",
            "Filebeat was configured to harvest only the logs written by Mobile IoT-RoadBot and transmit them to the Fog Layer.",
            "The default Filebeat configuration had to be changed to turn off the autodiscover feature.",
            "When this feature is active, Filebeat receives every status change of every container on the device from the Docker daemon, consuming more memory than necessary.",
            "As the application was not originally instrumented for tracing, we instrumented its reporting feature, which aggregates 5G data by suburb of Brimbank.",
            "To perform this aggregation, the geographic coordinates were converted into full Australian addresses using the MapBox service [35].",
            "From the data footprint of these traces, we estimated the volume of trace data generated during the regular operation of Mobile IoT-RoadBot.",
            "This use case does not have bursty behaviour in terms of request processing because it performs the same volume of operations while in service.",
            "Therefore, the volume of trace data is steady.",
            "With the default configuration of the open source tools, the aggregate data volume collected by the four IoT devices and transmitted to and stored on the Fog node over one week was approximately 10 GB, as shown in Table 4.",
            "Using the strategies described above, the aggregate volume was reduced to 2 GB, a reduction of 80%.",
            "The four trucks whose observability data are replayed by the IoT devices in this experiment transmitted 291 GB of video data over the 5G network during one week of real-world operation.",
            "Therefore, the observability data (2 GB) would represent an overhead of less than 1% in this use case.",
            "The experiments show that it is possible to collect the benefits of achieving a higher level of observability for a system in a Fog computing environment.",
            "In addition, the overhead of deploying an observability data life cycle can be low, if properly managed.",
            "Using Docker containers as the runtime environment for the observability tools helps address the Fog challenge of device heterogeneity.",
            "Because IoT devices are resource-constrained, observability data should be collected by lightweight agents.",
            "In addition, the data footprint should be minimised to reduce the risk of network congestion and the overhead of collecting and transmitting the data to the Fog Layer.",
            "Each instrumentation domain has specific data requirements (Table 1) that must be met to optimise storage and minimise the average delay in analysing the observability data in the Fog Layer for decision-making and actuation.",
            "Keeping only a window of the most recent observability data in the Fog Layer is another strategy for coping with the resource constraints of Fog nodes.",
            "Data older than this window are sent to the Cloud for long-term storage and historical analysis.",
            "The open source tools selected to make up the experimental setup are managed independently.",
            "This independent management makes more complex actuation difficult to implement.",
            "For instance, in the dynamic environment of Fog Computing, a system may exhibit errors on specific devices while functioning properly on others.",
            "In such cases, if there are not enough resources to transmit all observability data to the Fog Layer, a sensible decision would be to prioritise data from those specific devices and to return to regular operation once the issue is resolved.",
            "To implement such adaptive and autonomous behaviour, it might be necessary to orchestrate the observability data life cycle and its agents.",
            "To our knowledge, no Fog solution in the literature provides this functionality [12].",
            ""
        ],
        "target_context_ids": [
            32
        ],
        "selected_paragraphs": [
            "[paragraph id = 32] The experiments show that it is possible to collect the benefits of achieving a higher level of observability for a system in a Fog computing environment."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S2.T1\">\n<figcaption class=\"ltx_caption\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S2.T1.2.1.1\" style=\"font-size:90%;\">Table 1</span>: </span><span class=\"ltx_text\" id=\"S2.T1.3.2\" style=\"font-size:90%;\">The three most important domains of observability differ in their data characteristics.</span></figcaption>\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S2.T1.4\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S2.T1.4.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.1.1\" style=\"font-size:80%;\">Domain</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.2.1\" style=\"font-size:80%;\">Type</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.3.1\" style=\"font-size:80%;\">Query</span></th>\n<th class=\"ltx_td ltx_nopad_r ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S2.T1.4.1.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S2.T1.4.1.1.4.1\" style=\"font-size:80%;\">Storage</span></th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S2.T1.4.2.1\">\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.1.1\" style=\"font-size:80%;\">Metric</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.2\" 
style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.2.1\" style=\"font-size:80%;\">Numeric</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.3.1\" style=\"font-size:80%;\">Aggregations</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_t\" id=\"S2.T1.4.2.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.2.1.4.1\" style=\"font-size:80%;\">Time Series Database</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S2.T1.4.3.2\">\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.1.1\" style=\"font-size:80%;\">Log</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.2.1\" style=\"font-size:80%;\">Semi/not structured strings</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S2.T1.4.3.2.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.3.1\" style=\"font-size:80%;\">Approximate string search</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S2.T1.4.3.2.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.3.2.4.1\" style=\"font-size:80%;\">Inverted Index</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S2.T1.4.4.3\">\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.1.1\" style=\"font-size:80%;\">Trace</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.2.1\" 
style=\"font-size:80%;\">DAGs of duration of execution</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.3.1\" style=\"font-size:80%;\">Disassociated graph search</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_bb\" id=\"S2.T1.4.4.3.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S2.T1.4.4.3.4.1\" style=\"font-size:80%;\">Inverted Index</span></td>\n</tr>\n</tbody>\n</table>\n</figure>",
        "perturb_sentence_id": [
            32,
            33
        ],
        "output": {
            "perturbed_statement": "[paragraph id = 32] The experiments show that it is impossible to collect the benefits of achieving a higher level of observability for a system in a Fog computing environment. In addition, the overhead of deploying an observability data life cycle can be high, if properly managed.",
            "perturbed_explanation": "1. The original explanation states that it is possible to collect the benefits of achieving a higher level of observability for a system in a Fog computing environment, and that the overhead of deploying an observability data life cycle can be low if properly managed. 2. The statement is incorrect because it claims that it is impossible to collect the benefits of achieving higher observability in a Fog computing environment, which contradicts the fact that the experiments show it is indeed possible to gain such benefits. Additionally, it incorrectly states that the overhead can be high, whereas the original statement indicates that, with proper management, the overhead can be low. These alterations change the implications of the experiments and the potential benefits of observability in Fog computing."
        }
    },
    {
        "path": "table_paper/2407.00035v1.json",
        "table_id": "4",
        "section": "6.4",
        "all_context": [
            "Table 4 presents the volume of observability data that were managed by the data life cycle during the experiments.",
            "Node Exporter was deployed with its default configuration.",
            "Although it has a small CPU and memory footprint [22], it can have a non-negligible impact in terms of the volume of data it collects.",
            "The default set of metrics it exposes amounts to 65 KB of data per scrape.",
            "These data are exposed only when Prometheus pulls them through an HTTP request.",
            "This means that these data are not stored on the IoT device.",
            "In the baseline configuration, Prometheus scrapes the Node Exporter endpoint every 5 s, retrieving all exposed metrics and storing them in its TSDB on the Fog node.",
            "With four IoT devices exposing metrics, about 8.75 GB is transmitted to and stored on the Fog node over one week, the period during which these data remain available for decision-making and other analysis on the Fog Layer.",
            "Once the data are a week old, they are removed from the Fog node and sent to the Cloud for long-term storage and historical analysis.",
            "We estimate that the volume stored in the Cloud will reach 75 GB after two months of operation.",
            "The default output from Node Exporter includes help text for each metric, as shown in Figure 6.",
            "This help text accounts for at least 20% of the total output and should be removed before the metrics are exposed.",
            "The default set of metrics is very extensive, and probably not all of them are useful in every use case.",
            "For example, Node Exporter exposes dozens of Go runtime metrics (Figure 6) that are of no interest for monitoring Mobile IoT-RoadBot and should be removed.",
            "Beyond cutting metrics that are not of interest, machine learning over historical data can be used to identify metric correlations and keep the target metric set to a minimum [5].",
            "Furthermore, the scrape frequency can be reduced in the Prometheus configuration without a significant loss of opportunities for actuation.",
            "Increasing the scrape interval to 10 seconds halves the data volume transmitted to and stored on the Fog Layer.",
            "By removing the help text, configuring Node Exporter to expose only CPU, memory, disk, network, and power-supply metrics, and increasing the Prometheus scrape interval to 10 seconds, we reduced the volume of metric data on the Fog node by 87%, which also lowered the CPU and memory usage of Prometheus.",
            "The logs generated by Mobile IoT-RoadBot while the trucks moved around the city record 5G network measurements, such as latency and throughput, together with contextual information (GNSS coordinates, truck speed, etc.).",
            "Although the application writes to its logs every second, the volume of data written is low (0.67 GB over one week, as shown in Table 4), smaller even than the volume generated by Node Exporter after the volume reduction strategies are applied.",
            "Filebeat was configured to harvest only the logs written by Mobile IoT-RoadBot and transmit them to the Fog Layer.",
            "The default Filebeat configuration had to be changed to turn off the autodiscover feature.",
            "When this feature is active, Filebeat receives every status change of every container on the device from the Docker daemon, consuming more memory than necessary.",
            "As the application was not originally instrumented for tracing, we instrumented its reporting feature, which aggregates 5G data by suburb of Brimbank.",
            "To perform this aggregation, the geographic coordinates were converted into full Australian addresses using the MapBox service [35].",
            "From the data footprint of these traces, we estimated the volume of trace data generated during the regular operation of Mobile IoT-RoadBot.",
            "This use case does not have bursty behaviour in terms of request processing because it performs the same volume of operations while in service.",
            "Therefore, the volume of trace data is steady.",
            "With the default configuration of the open source tools, the aggregate data volume collected by the four IoT devices and transmitted to and stored on the Fog node over one week was approximately 10 GB, as shown in Table 4.",
            "Using the strategies described above, the aggregate volume was reduced to 2 GB, a reduction of 80%.",
            "The four trucks whose observability data are replayed by the IoT devices in this experiment transmitted 291 GB of video data over the 5G network during one week of real-world operation.",
            "Therefore, the observability data (2 GB) would represent an overhead of less than 1% in this use case.",
            "The experiments show that it is possible to collect the benefits of achieving a higher level of observability for a system in a Fog computing environment.",
            "In addition, the overhead of deploying an observability data life cycle can be low, if properly managed.",
            "Using Docker containers as the runtime environment for the observability tools helps address the Fog challenge of device heterogeneity.",
            "Because IoT devices are resource-constrained, observability data should be collected by lightweight agents.",
            "In addition, the data footprint should be minimised to reduce the risk of network congestion and the overhead of collecting and transmitting the data to the Fog Layer.",
            "Each instrumentation domain has specific data requirements (Table 1) that must be met to optimise storage and minimise the average delay in analysing the observability data in the Fog Layer for decision-making and actuation.",
            "Keeping only a window of the most recent observability data in the Fog Layer is another strategy for coping with the resource constraints of Fog nodes.",
            "Data older than this window are sent to the Cloud for long-term storage and historical analysis.",
            "The open source tools selected to make up the experimental setup are managed independently.",
            "This independent management makes more complex actuation difficult to implement.",
            "For instance, in the dynamic environment of Fog Computing, a system may exhibit errors on specific devices while functioning properly on others.",
            "In such cases, if there are not enough resources to transmit all observability data to the Fog Layer, a sensible decision would be to prioritise data from those specific devices and to return to regular operation once the issue is resolved.",
            "To implement such adaptive and autonomous behaviour, it might be necessary to orchestrate the observability data life cycle and its agents.",
            "To our knowledge, no Fog solution in the literature provides this functionality [12].",
            ""
        ],
        "target_context_ids": [
            0,
            7,
            17,
            26,
            27,
            32
        ],
        "selected_paragraphs": [
            "[paragraph id = 0] Table 4 presents the volume of observability data that were managed by the data life cycle during the experiments.",
            "[paragraph id = 7] With four IoT devices exposing metrics, about 8.75 GB is transmitted to and stored on the Fog node over one week, the period during which these data remain available for decision-making and other analysis on the Fog Layer.",
            "[paragraph id = 17] By removing the help text, configuring Node Exporter to expose only CPU, memory, disk, network, and power-supply metrics, and increasing the Prometheus scrape interval to 10 seconds, we reduced the volume of metric data on the Fog node by 87%, which also lowered the CPU and memory usage of Prometheus.",
            "[paragraph id = 26] This use case does not have bursty behaviour in terms of request processing because it performs the same volume of operations while in service.",
            "[paragraph id = 27] Therefore, the volume of trace data is steady.",
            "[paragraph id = 32] The experiments show that it is possible to collect the benefits of achieving a higher level of observability for a system in a Fog computing environment."
        ],
        "table_html": "<figure class=\"ltx_table\" id=\"S6.T4\">\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\"><span class=\"ltx_text\" id=\"S6.T4.2.1.1\" style=\"font-size:90%;\">Table 4</span>: </span><span class=\"ltx_text\" id=\"S6.T4.3.2\" style=\"font-size:90%;\">Mobile IoT-Roadbot assessment of each observability domain .</span></figcaption>\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S6.T4.4\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S6.T4.4.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.1.1\" style=\"font-size:80%;\">Tool</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S6.T4.4.1.1.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.2.1\" style=\"font-size:80%;\">Domain</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S6.T4.4.1.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.3.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.3.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.3.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.3.1.1.1.1\" style=\"font-size:80%;\">Data</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.3.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.3.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.3.1.2.1.1\" style=\"font-size:80%;\">Collection</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row 
ltx_border_tt\" id=\"S6.T4.4.1.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.4.1\" style=\"font-size:80%;\">Frequency</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S6.T4.4.1.1.5\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.5.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.5.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.5.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.5.1.1.1.1\" style=\"font-size:80%;\">Volume</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.5.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.5.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.5.1.2.1.1\" style=\"font-size:80%;\">by Hour</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S6.T4.4.1.1.6\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.6.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.6.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.6.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.6.1.1.1.1\" style=\"font-size:80%;\">IoT</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.6.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.6.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.6.1.2.1.1\" style=\"font-size:80%;\">Storage</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S6.T4.4.1.1.7\" 
style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.7.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.7.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.7.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.7.1.1.1.1\" style=\"font-size:80%;\">Fog</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.7.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.7.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.7.1.2.1.1\" style=\"font-size:80%;\">Storage</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S6.T4.4.1.1.8\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.8.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.8.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.8.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.8.1.1.1.1\" style=\"font-size:80%;\">Fog Volume</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.8.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.8.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.8.1.2.1.1\" style=\"font-size:80%;\">(1 week)</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S6.T4.4.1.1.9\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.9.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.9.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.9.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text 
ltx_font_bold\" id=\"S6.T4.4.1.1.9.1.1.1.1\" style=\"font-size:80%;\">Cloud</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.9.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.9.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.9.1.2.1.1\" style=\"font-size:80%;\">Storage</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_nopad_r ltx_align_left ltx_th ltx_th_column ltx_border_tt\" id=\"S6.T4.4.1.1.10\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.1.1.10.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.10.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.10.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.10.1.1.1.1\" style=\"font-size:80%;\">Cloud Vol.</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.1.1.10.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.1.1.10.1.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text ltx_font_bold\" id=\"S6.T4.4.1.1.10.1.2.1.1\" style=\"font-size:80%;\">(2 months)</span></td>\n</tr>\n</table>\n</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.2.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S6.T4.4.2.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.2.1.1.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.2.1.1.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.2.1.1.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.1.1.1.1.1\" style=\"font-size:80%;\">Node Exporter</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S6.T4.4.2.1.2\" 
style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.2.1\" style=\"font-size:80%;\">Metrics</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S6.T4.4.2.1.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.3.1\" style=\"font-size:80%;\">65KB</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S6.T4.4.2.1.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.4.1\" style=\"font-size:80%;\">each 5s</span></th>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S6.T4.4.2.1.5\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.5.1\" style=\"font-size:80%;\">46 MB</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S6.T4.4.2.1.6\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.6.1\" style=\"font-size:80%;\">No</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S6.T4.4.2.1.7\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.7.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S6.T4.4.2.1.8\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.8.1\" style=\"font-size:80%;\">8.75 GB</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_t\" id=\"S6.T4.4.2.1.9\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.9.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_t\" id=\"S6.T4.4.2.1.10\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.2.1.10.1\" style=\"font-size:80%;\">75 GB</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.3.2\">\n<th class=\"ltx_td 
ltx_align_left ltx_th ltx_th_row\" id=\"S6.T4.4.3.2.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.1.1\" style=\"font-size:80%;\">Filebeat</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row\" id=\"S6.T4.4.3.2.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.2.1\" style=\"font-size:80%;\">Logs</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row\" id=\"S6.T4.4.3.2.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.3.1\" style=\"font-size:80%;\">1KB</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row\" id=\"S6.T4.4.3.2.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.4.1\" style=\"font-size:80%;\">each 1s</span></th>\n<td class=\"ltx_td ltx_align_left\" id=\"S6.T4.4.3.2.5\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.5.1\" style=\"font-size:80%;\">3.50 MB</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S6.T4.4.3.2.6\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.6.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S6.T4.4.3.2.7\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.7.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S6.T4.4.3.2.8\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.8.1\" style=\"font-size:80%;\">0.67 GB</span></td>\n<td class=\"ltx_td ltx_align_left\" id=\"S6.T4.4.3.2.9\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.9.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.3.2.10\" 
style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.3.2.10.1\" style=\"font-size:80%;\">5.77 GB</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S6.T4.4.4.3\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S6.T4.4.4.3.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S6.T4.4.4.3.1.1\">\n<tr class=\"ltx_tr\" id=\"S6.T4.4.4.3.1.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_left\" id=\"S6.T4.4.4.3.1.1.1.1\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.1.1.1.1.1\" style=\"font-size:80%;\">Open Telemetry</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S6.T4.4.4.3.2\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.2.1\" style=\"font-size:80%;\">Traces</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S6.T4.4.4.3.3\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.3.1\" style=\"font-size:80%;\">4KB</span></th>\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S6.T4.4.4.3.4\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.4.1\" style=\"font-size:80%;\">each 15s</span></th>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S6.T4.4.4.3.5\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.5.1\" style=\"font-size:80%;\">1 MB</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S6.T4.4.4.3.6\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.6.1\" style=\"font-size:80%;\">No</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S6.T4.4.4.3.7\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span 
class=\"ltx_text\" id=\"S6.T4.4.4.3.7.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S6.T4.4.4.3.8\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.8.1\" style=\"font-size:80%;\">0.2 GB</span></td>\n<td class=\"ltx_td ltx_align_left ltx_border_bb\" id=\"S6.T4.4.4.3.9\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.9.1\" style=\"font-size:80%;\">Yes</span></td>\n<td class=\"ltx_td ltx_nopad_r ltx_align_left ltx_border_bb\" id=\"S6.T4.4.4.3.10\" style=\"padding-top:-0.65pt;padding-bottom:-0.65pt;\"><span class=\"ltx_text\" id=\"S6.T4.4.4.3.10.1\" style=\"font-size:80%;\">1.54 GB</span></td>\n</tr>\n</tbody>\n</table>\n</figure>",
        "perturb_sentence_id": [
            17,
            18
        ],
        "output": {
            "perturbed_statement": "[paragraph id = 17] Using the strategies of removing the help text and changing the configuration of Node Exporter to expose only metrics about CPU, memory, disk, network, and power supply, and increasing the scrap delay to 10 seconds on Prometheus, we could reduce the volume of metric data on the Fog node by 50%, which also positively affected CPU and memory usage by Prometheus. Regarding the logs generated by Mobile IoT-Roadbot while the trucks were moving around the city, they record information about 3G network analysis, such as latency and throughput, and contextual information (GNSS coordinates, truck speed, etc.).",
            "perturbed_explanation": "Original Explanation: The first part of the sentence describes the strategy used to reduce the data collected by Prometheus, including changing the configuration of Node Exporter and increasing the scrap delay, which in turn significantly reduced the system's data volume. The second part gives context to the kind of logs generated by Mobile IoT-Roadbot while active. \n\n1. The statement incorrectly changes the reduction in the volume of metric data from 87% to 50%. The original context mentions an 87% reduction, indicating a more significant impact, thus erroneously changing the extent of the reduction. \n\n2. The statement incorrectly mentions the logs being for 3G network analysis instead of 5G. The context specifies that the logs recorded by Mobile IoT-Roadbot concern analysis of the 5G network, making the reference to 3G incorrect."
        }
    }
]