perturb_for_table / table_result /2407.00079v3_output.json
[
{
"path": "table_paper/2407.00079v3.json",
"table_id": "1",
"section": "4.2",
"all_context": [
"Figure 5 illustrates the distribution of input and output lengths in our trace, with an average input length of 7,590 tokens and an average output length of 182 tokens.",
"The average input-output ratio is approximately 720.",
"It is important to note that this is only a representative pattern and not unanimous for all workloads, reflecting Kimi's renowned capability for superior long-context processing and understanding.",
"We also conducted a simple cache policy analysis based on this trace, assuming a single global cache pool.",
"Table 1 compares three cache strategies: LRU, LFU, and LengthAwareCache (similar to LFU but prioritizing eviction of cache blocks occurring later in requests) across different cache capacities.",
"Increasing the cache capacity from 1,000 to 50,000 blocks boosts the cache hit ratio from 30% to 50%.",
"Further capacity increases show minimal improvement.",
"However, this should not be interpreted as an indication that larger caches are unnecessary, as the sample trace represents only a subset of real-world workloads.",
"The required capacity should scale proportionally in actual scenarios.",
"LRUCache performs best under this dataset's patterns, likely due to the temporal proximity in request utilization.",
"Additionally, we observed a notable imbalance in cache block popularity, with over 50% of cache blocks remaining unused while certain blocks are accessed tens of thousands of times, as shown in Figure 6 .",
"Replicating these hot blocks is essential to avoid transfer congestion.",
""
],
"target_context_ids": [
4,
5,
6,
7,
8,
9
],
"selected_paragraphs": [
"[paragraph id = 4] Table 1 compares three cache strategies: LRU, LFU, and LengthAwareCache (similar to LFU but prioritizing eviction of cache blocks occurring later in requests) across different cache capacities.",
"[paragraph id = 5] Increasing the cache capacity from 1,000 to 50,000 blocks boosts the cache hit ratio from 30% to 50%.",
"[paragraph id = 6] Further capacity increases show minimal improvement.",
"[paragraph id = 7] However, this should not be interpreted as an indication that larger caches are unnecessary, as the sample trace represents only a subset of real-world workloads.",
"[paragraph id = 8] The required capacity should scale proportionally in actual scenarios.",
"[paragraph id = 9] LRUCache performs best under this dataset's patterns, likely due to the temporal proximity in request utilization."
],
"table_html": "<figure class=\"ltx_table\" id=\"S4.T1\">\n<figcaption class=\"ltx_caption ltx_centering\" style=\"font-size:90%;\"><span class=\"ltx_tag ltx_tag_table\">Table 1: </span>Cache hit rates under different cache policies and capacities.</figcaption>\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S4.T1.4\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S4.T1.4.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S4.T1.4.1.1.1\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.1.1\" style=\"font-size:90%;\">Block capacity</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S4.T1.4.1.1.2\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.2.1\" style=\"font-size:90%;\">Inf</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T1.4.1.1.3\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.3.1\" style=\"font-size:90%;\">100000</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T1.4.1.1.4\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.4.1\" style=\"font-size:90%;\">50000</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T1.4.1.1.5\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.5.1\" style=\"font-size:90%;\">30000</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T1.4.1.1.6\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.6.1\" style=\"font-size:90%;\">10000</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T1.4.1.1.7\"><span class=\"ltx_text\" id=\"S4.T1.4.1.1.7.1\" style=\"font-size:90%;\">1000</span></th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S4.T1.4.2.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S4.T1.4.2.1.1\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.1.1\" 
style=\"font-size:90%;\">LRUCache</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t\" id=\"S4.T1.4.2.1.2\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.2.1\" style=\"font-size:90%;\">0.51</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T1.4.2.1.3\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.3.1\" style=\"font-size:90%;\">0.51</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T1.4.2.1.4\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.4.1\" style=\"font-size:90%;\">0.50</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T1.4.2.1.5\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.5.1\" style=\"font-size:90%;\">0.48</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T1.4.2.1.6\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.6.1\" style=\"font-size:90%;\">0.40</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T1.4.2.1.7\"><span class=\"ltx_text\" id=\"S4.T1.4.2.1.7.1\" style=\"font-size:90%;\">0.30</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T1.4.3.2\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row\" id=\"S4.T1.4.3.2.1\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.1.1\" style=\"font-size:90%;\">LFUCache</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row\" id=\"S4.T1.4.3.2.2\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.2.1\" style=\"font-size:90%;\">0.51</span></th>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T1.4.3.2.3\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.3.1\" style=\"font-size:90%;\">0.51</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T1.4.3.2.4\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.4.1\" style=\"font-size:90%;\">0.49</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T1.4.3.2.5\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.5.1\" style=\"font-size:90%;\">0.43</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T1.4.3.2.6\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.6.1\" 
style=\"font-size:90%;\">0.35</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T1.4.3.2.7\"><span class=\"ltx_text\" id=\"S4.T1.4.3.2.7.1\" style=\"font-size:90%;\">0.30</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T1.4.4.3\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S4.T1.4.4.3.1\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.1.1\" style=\"font-size:90%;\">LengthAwareCache</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb\" id=\"S4.T1.4.4.3.2\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.2.1\" style=\"font-size:90%;\">0.51</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T1.4.4.3.3\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.3.1\" style=\"font-size:90%;\">0.50</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T1.4.4.3.4\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.4.1\" style=\"font-size:90%;\">0.48</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T1.4.4.3.5\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.5.1\" style=\"font-size:90%;\">0.42</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T1.4.4.3.6\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.6.1\" style=\"font-size:90%;\">0.35</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T1.4.4.3.7\"><span class=\"ltx_text\" id=\"S4.T1.4.4.3.7.1\" style=\"font-size:90%;\">0.30</span></td>\n</tr>\n</tbody>\n</table>\n</figure>",
"perturb_sentence_id": 5,
"output": {
"perturbed_statement": "[paragraph id = 5] Increasing the cache capacity from 1,000 to 50,000 blocks boosts the cache hit ratio from 30% to 70%. Further capacity increases show minimal improvement.",
"perturbed_explanation": "The original improvement in cache hit ratio stems from the increased availability of cached data blocks when cache capacity is expanded, which reduces cache misses for frequently accessed data. However, the statement claims an improvement from 30% to 70%, which deviates from the provided data indicating an increase to only 50%. This exaggerates the impact of increasing cache capacity, misrepresenting the scale of the improvement."
}
},
{
"path": "table_paper/2407.00079v3.json",
"table_id": "2",
"section": "8.1",
"all_context": [
"This section evaluates the end-to-end performance of Mooncake under different datasets and various workloads.",
"As stated before, to protect proprietary information and facilitate reproducibility, all the experimental results reported in this paper are based on a dummy model that follows the same architecture as LLaMA2-70B.",
"Testbed During the experiments, the system was deployed on a high-performance computing node cluster to test performance.",
"Each node in the cluster is configured as follows: 8 NVIDIA-A800-SXM4-80GB GPUs, each with 80GB HBM, connected by NVLINK; equipped with RDMA network cards that support up to 800 Gbps of interconnect bandwidth between nodes.",
"Each node deploys either a prefill instance or a decoding instance according to the startup parameter.",
"Dataset and Workload Building upon previous research [15 , 8 , 14 ], we selected or designed the datasets as outlined in Table 2 .",
"In addition to utilizing public datasets, we generated a batch of simulated data featuring predefined lengths and prefix cache ratios for our experiments.",
"To examine performance in real-world scenarios, we constructed a dataset consisting of 23,000 real request traces, each annotated with an arrival timestamp.",
"Experiments involving real request traces were conducted by replaying these requests according to their actual arrival times.",
"For other scenarios, we simulated requests using a Poisson arrival process and controlled the request rate through RPS (Requests per Second).",
"Metric In the experiments, we focus on the throughput performance of various systems under defined SLOs.",
"We measure the TTFT and TBT across different RPS rates, where a higher RPS signifies improved throughput.",
"To assess whether the majority of requests satisfy the SLOs, we use the 90th percentile (P90) values of TTFT and TBT as the ultimate metrics.",
"As mentioned in §2 , the thresholds for TTFT and TBT are set by multiplying the lowest observed RPS values by factors of 10 and 5, respectively.",
"Exceeding these thresholds indicates a failure to meet the SLOs and the corresponding consumed resources are considered as wasted.",
"For ease of comparison, we normalize all TTFT and TBT values against these upper limits, establishing a baseline of 1.0.",
"Baseline We employ vLLM, one of the state-of-the-art open-source LLM serving systems, as our experimental baseline.",
"vLLM incorporates continuous batching and PagedAttention technologies, significantly boosting inference throughput.",
"Despite its strengths, vLLM's design, which couples the prefill and decoding stages of inference requests, can cause disruptions during decoding in scenarios involving long contexts.",
"ArXiv Summarization L-Eval",
""
],
"target_context_ids": [
5,
6,
7,
8,
9
],
"selected_paragraphs": [
"[paragraph id = 5] Dataset and Workload Building upon previous research [15 , 8 , 14 ], we selected or designed the datasets as outlined in Table 2 .",
"[paragraph id = 6] In addition to utilizing public datasets, we generated a batch of simulated data featuring predefined lengths and prefix cache ratios for our experiments.",
"[paragraph id = 7] To examine performance in real-world scenarios, we constructed a dataset consisting of 23,000 real request traces, each annotated with an arrival timestamp.",
"[paragraph id = 8] Experiments involving real request traces were conducted by replaying these requests according to their actual arrival times.",
"[paragraph id = 9] For other scenarios, we simulated requests using a Poisson arrival process and controlled the request rate through RPS (Requests per Second)."
],
"table_html": "<figure class=\"ltx_table\" id=\"S8.T2\">\n<figcaption class=\"ltx_caption ltx_centering\" style=\"font-size:90%;\"><span class=\"ltx_tag ltx_tag_table\">Table 2: </span>Datasets used in the end-to-end experiment.</figcaption>\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S8.T2.4\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S8.T2.4.1.1\">\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S8.T2.4.1.1.1\"><span class=\"ltx_text\" id=\"S8.T2.4.1.1.1.1\" style=\"font-size:90%;\">Dataset</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S8.T2.4.1.1.2\"><span class=\"ltx_text\" id=\"S8.T2.4.1.1.2.1\" style=\"font-size:90%;\">Avg Input Length</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S8.T2.4.1.1.3\"><span class=\"ltx_text\" id=\"S8.T2.4.1.1.3.1\" style=\"font-size:90%;\">Avg Output Length</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S8.T2.4.1.1.4\"><span class=\"ltx_text\" id=\"S8.T2.4.1.1.4.1\" style=\"font-size:90%;\">Cache Ratio</span></th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S8.T2.4.1.1.5\"><span class=\"ltx_text\" id=\"S8.T2.4.1.1.5.1\" style=\"font-size:90%;\">Arrival Pattern</span></th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S8.T2.4.2.1\">\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S8.T2.4.2.1.1\">\n<span class=\"ltx_text\" id=\"S8.T2.4.2.1.1.1\" style=\"font-size:90%;\">ArXiv Summarization </span><cite class=\"ltx_cite ltx_citemacro_cite\"><span class=\"ltx_text\" id=\"S8.T2.4.2.1.1.2.1\" style=\"font-size:90%;\">[</span><a class=\"ltx_ref\" href=\"https://arxiv.org/html/2407.00079v3#bib.bib26\" title=\"\">26</a><span class=\"ltx_text\" id=\"S8.T2.4.2.1.1.3.2\" style=\"font-size:90%;\">]</span></cite>\n</td>\n<td class=\"ltx_td ltx_align_center 
ltx_border_t\" id=\"S8.T2.4.2.1.2\"><span class=\"ltx_text\" id=\"S8.T2.4.2.1.2.1\" style=\"font-size:90%;\">8088</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S8.T2.4.2.1.3\"><span class=\"ltx_text\" id=\"S8.T2.4.2.1.3.1\" style=\"font-size:90%;\">229</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S8.T2.4.2.1.4\"><span class=\"ltx_text\" id=\"S8.T2.4.2.1.4.1\" style=\"font-size:90%;\">~0%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S8.T2.4.2.1.5\"><span class=\"ltx_text\" id=\"S8.T2.4.2.1.5.1\" style=\"font-size:90%;\">Poisson Process</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S8.T2.4.3.2\">\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.3.2.1\">\n<span class=\"ltx_text\" id=\"S8.T2.4.3.2.1.1\" style=\"font-size:90%;\">L-Eval </span><cite class=\"ltx_cite ltx_citemacro_cite\"><span class=\"ltx_text\" id=\"S8.T2.4.3.2.1.2.1\" style=\"font-size:90%;\">[</span><a class=\"ltx_ref\" href=\"https://arxiv.org/html/2407.00079v3#bib.bib27\" title=\"\">27</a><span class=\"ltx_text\" id=\"S8.T2.4.3.2.1.3.2\" style=\"font-size:90%;\">]</span></cite>\n</td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.3.2.2\"><span class=\"ltx_text\" id=\"S8.T2.4.3.2.2.1\" style=\"font-size:90%;\">19019</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.3.2.3\"><span class=\"ltx_text\" id=\"S8.T2.4.3.2.3.1\" style=\"font-size:90%;\">72</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.3.2.4\"><span class=\"ltx_text\" id=\"S8.T2.4.3.2.4.1\" style=\"font-size:90%;\">&gt;80%</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.3.2.5\"><span class=\"ltx_text\" id=\"S8.T2.4.3.2.5.1\" style=\"font-size:90%;\">Poisson Process</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S8.T2.4.4.3\">\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.4.3.1\"><span class=\"ltx_text\" id=\"S8.T2.4.4.3.1.1\" style=\"font-size:90%;\">Simulated Data</span></td>\n<td class=\"ltx_td 
ltx_align_center\" id=\"S8.T2.4.4.3.2\"><span class=\"ltx_text\" id=\"S8.T2.4.4.3.2.1\" style=\"font-size:90%;\">16k, 32k, 64k, 128k</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.4.3.3\"><span class=\"ltx_text\" id=\"S8.T2.4.4.3.3.1\" style=\"font-size:90%;\">512</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.4.3.4\"><span class=\"ltx_text\" id=\"S8.T2.4.4.3.4.1\" style=\"font-size:90%;\">50%</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S8.T2.4.4.3.5\"><span class=\"ltx_text\" id=\"S8.T2.4.4.3.5.1\" style=\"font-size:90%;\">Poisson Process</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S8.T2.4.5.4\">\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S8.T2.4.5.4.1\"><span class=\"ltx_text\" id=\"S8.T2.4.5.4.1.1\" style=\"font-size:90%;\">Real Data</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S8.T2.4.5.4.2\"><span class=\"ltx_text\" id=\"S8.T2.4.5.4.2.1\" style=\"font-size:90%;\">7955</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S8.T2.4.5.4.3\"><span class=\"ltx_text\" id=\"S8.T2.4.5.4.3.1\" style=\"font-size:90%;\">194</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S8.T2.4.5.4.4\"><span class=\"ltx_text\" id=\"S8.T2.4.5.4.4.1\" style=\"font-size:90%;\">~50%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S8.T2.4.5.4.5\"><span class=\"ltx_text\" id=\"S8.T2.4.5.4.5.1\" style=\"font-size:90%;\">Timestamp-based</span></td>\n</tr>\n</tbody>\n</table>\n</figure>",
"perturb_sentence_id": 5,
"output": {
"perturbed_statement": "[paragraph id = 5] Dataset and Workload Building upon previous research [15 , 8 , 14 ], we selected or designed the datasets as outlined in Table 4. In addition to utilizing non-public datasets, we generated a batch of simulated data featuring predefined lengths and prefix cache ratios for our tests.",
"perturbed_explanation": "The original statement accurately describes the dataset preparation process: the datasets were outlined in Table 2, and public datasets were utilized alongside generated simulated data. The altered statement introduces two inaccuracies: 1. It incorrectly specifies Table 4 instead of Table 2 as the reference for the dataset outline. 2. It claims non-public datasets were utilized, which contradicts the original statement's mention of public datasets."
}
}
]