[
{
"path": "table_paper/2407.00010v1.json",
"table_id": "1",
"section": "5.1",
"all_context": [
"The systems we profile are shown in Table 1 .",
"We consider these systems as they demonstrate three prominent CPU manufacturers and different generations of GPUs.",
"We utilize PyTorch v2.0.1, Torchvision v0.15.2, Numpy v1.26.0, Huggingface v0.20.2, and Accelerate v0.26.1.",
"We note that the M1-Pro results only include the Llama-2 (7B) and Mistral (7B) results, as Falcon (7B) generally did not complete tasks in less than two orders of magnitude greater runtime.",
""
],
"target_context_ids": [
0,
1,
3
],
"selected_paragraphs": [
"[paragraph id = 0] The systems we profile are shown in Table 1 .",
"[paragraph id = 1] We consider these systems as they demonstrate three prominent CPU manufacturers and different generations of GPUs.",
"[paragraph id = 3] We note that the M1-Pro results only include the Llama-2 (7B) and Mistral (7B) results, as Falcon (7B) generally did not complete tasks in less than two orders of magnitude greater runtime."
],
"table_html": "
\n
\n\n
\n
System Name
\n
CPU
\n
GPU(s) per Node
\n
DRAM per Node
\n
VRAM per GPU
\n
\n\n\n
\n
MacBook Pro
\n
10-core M1 Pro
\n
14-core M1 Pro
\n
32GB
\n
-
\n
\n
\n
Swing AMD+A100
\n
2× 64-core AMD EPYC 7742
\n
8× NVIDIA A100
\n
1TB
\n
40GB
\n
\n
\n
Palmetto Intel+V100
\n
40-core Intel Xeon 6148G
\n
2× NVIDIA V100
\n
376GB
\n
16GB
\n
\n\n
\nTable 1. Our System Configurations\n
",
"perturb_sentence_id": [
3,
4
],
"output": {
"perturbed_statement": "[paragraph id = 3] We note that the M1-Pro results only include the Llama-2 (7B) and Mistral (7B) results, as Falcon (7B) generally completed tasks in less than two orders of magnitude greater runtime.",
"perturbed_explanation": "The perturbed statement incorrectly suggests that Falcon (7B) completed tasks in less than two orders of magnitude greater runtime. This contradicts the original context, which states that Falcon (7B) generally did not complete tasks within that runtime, which is precisely why it was excluded from the M1-Pro results."
}
},
{
"path": "table_paper/2407.00010v1.json",
"table_id": "1",
"section": "5.3",
"all_context": [
"Here, we present the impacts on runtime, energy consumption per token, and throughput for LLMs across different hardware configurations while varying the number of input tokens.",
"We perform these experiments using the suite of systems outlined in Table 1 with the models outlined in Section 4.1 .",
"In our experiments on the Palmetto Intel+V100 system, the V100 GPU had an out-of-memory error beyond 1024 output tokens for Falcon (7B).",
"Our runtime measurements show a significant increase as input tokens grow.",
"As depicted in Figure 1(a) , all systems exhibit a nonlinear escalation in runtime with increasing token counts, with the M1-Pro system showing the most significant magnitude.",
"This trend highlights the computational burden imposed by larger input sizes, particularly on smaller systems that are not as well designed to handle extensive workloads.",
"For all systems, we notice that throughput follows a “roofline model” with increasing input tokens (roofline, ).",
"Figure 1(b) illustrates these dynamics, indicating an increase in throughput for all systems until a certain point where inference becomes bound by compute and not by the overhead of the software, as described by roofline performance models (roofline, ).",
"Energy efficiency varies markedly across different systems.",
"The M1-Pro demonstrates consistently low energy consumption per token, particularly for smaller input sizes, as shown in Figure 1(c) .",
"This efficiency reflects the M1-Pro's design optimization for low-power operations.",
"In contrast, the Swing AMD+A100, while capable of handling larger token inputs more efficiently, consumed more energy per token for small workloads yet became more energy efficient at larger input token sizes, underscoring a trade-off between workload size and energy efficiency.",
""
],
"target_context_ids": [
1
],
"selected_paragraphs": [
"[paragraph id = 1] We perform these experiments using the suite of systems outlined in Table 1 with the models outlined in Section 4.1 ."
],
"table_html": "
\n
\n\n
\n
System Name
\n
CPU
\n
GPU(s) per Node
\n
DRAM per Node
\n
VRAM per GPU
\n
\n\n\n
\n
MacBook Pro
\n
10-core M1 Pro
\n
14-core M1 Pro
\n
32GB
\n
-
\n
\n
\n
Swing AMD+A100
\n
2× 64-core AMD EPYC 7742
\n
8× NVIDIA A100
\n
1TB
\n
40GB
\n
\n
\n
Palmetto Intel+V100
\n
40-core Intel Xeon 6148G
\n
2× NVIDIA V100
\n
376GB
\n
16GB
\n
\n\n
\nTable 1. Our System Configurations\n
",
"perturb_sentence_id": [
1,
2
],
"output": {
"perturbed_statement": "[paragraph id = 1] We perform these experiments using the suite of systems outlined in Table 1 with the models outlined in Section 4.1. In our experiments on the Palmetto Intel+V100 system, the V100 GPU had an out-of-memory error beyond 2048 output tokens for Falcon (7B).",
"perturbed_explanation": "The statement claims the V100 GPU had an out-of-memory error beyond 2048 output tokens for Falcon (7B), whereas the context states the error actually occurred beyond 1024 output tokens."
}
}
]