[ { "path": "table_paper/2407.00056v1.json", "table_id": "1", "section": "6.1.2", "all_context": [ "To prove the effectiveness of our proposed MFQ and GIE module, we also compare our method on two public short video recommendation datasets: TikTok and MovieLens.", "The statistics of datasets are shown in Table 1 .", "Movielens333https://grouplens.org/datasets/movielens/ (Harper and Konstan, 2015 ) is a widely used dataset (Sun et al., 2019 ; Kang and McAuley, 2018 ; Sankar et al., 2020 ; Wu et al., 2022 ) for the recommendation task.", "The raw data is initially acquired by collecting movie descriptions from Movielens-10M and crawling the corresponding trailers from YouTube.", "Textual features are subsequently extracted from the descriptions using the Sentence2Vector (Arora et al., 2017 ).", "For visual modality, key frames are initially extracted from the retrieved videos and then processed by a pre-trained ResNet50 model [9] to obtain visual features.", "The acoustic features are obtained using VGGish [12], following a soundtrack separation procedure implemented with the FFmpeg software.", "TikTok444http://ai-lab-challenge.bytedance.com/tce/vc/ is published by TikTok, a micro-video sharing platform that enables users to create and share micro-videos with durations ranging from 3 to 15 seconds.", "TikTok comprises users, micro-videos, and their interactions, such as clicks.", "The features of the micro-videos in each modality are extracted and made available without providing the raw data.", "Specifically, the textual characteristics are extracted from the micro-video captions provided by users.", "" ], "target_context_ids": [ 1 ], "selected_paragraphs": [ "[paragraph id = 1] The statistics of datasets are shown in Table 1 ." ], "table_html": "
\n
Table 1. \nThe statistics of public datasets. V, A, and T represent the dimensions of visual, acoustic and textual features.\n
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\nDataset\n\n\n\n#Interactions\n\n\n\n#Items\n\n\n\n#Users\n\n\n\nSparsity\n\n\n\nV\n\n\n\nA\n\n\n\nT\n\n
\n\nTiktok\n\n\n\n726,065\n\n\n\n76,085\n\n\n\n36,656\n\n\n\n99.97%\n\n\n\n128\n\n\n\n128\n\n\n\n128\n\n
\n\nMovielens\n\n\n\n1,239,508\n\n\n\n5,986\n\n\n\n55,485\n\n\n\n99.63%\n\n\n\n2048\n\n\n\n128\n\n\n\n100\n\n
\n
\n
", "perturb_sentence_id": 1, "output": { "perturbed_statement": "[paragraph id = 1] The statistics of datasets are shown in Table 1. Movielens555 https://grouplens.org/datasets/movielens/ (Harper and Konstan, 2015) is a rarely used dataset (Sun et al., 2019; Kang and McAuley, 2018; Sankar et al., 2020; Wu et al., 2022) for unrelated analytical tasks.", "perturbed_explanation": "The Original Explanation remains valid in providing accurate insights regarding the Movielens dataset usage. However, the statement contains the following inaccuracies: (1) \"Movielens555\" does not exist; the reference should be to \"Movielens333\". (2) The dataset is described as \"rarely used\" and for \"unrelated analytical tasks,\" both of which are incorrect as the dataset is widely recognized and applied in recommendation tasks. These false details misrepresent the dataset's characteristics and applications." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "2", "section": "6.4", "all_context": [ "Table 2 shows the performance of all models on the Kuaishou dataset.", "Note that given the large number of users and samples in Kuaishou dataset, an improvement of 0.5% in AUC, UAUC, and GAUC during offline evaluation holds significant value to bring obvious online gains for business.", "Table 3 presents the performance of several competitors on public Tiktok and Movielens datasets.", "First, our method surpasses all baselines by a significant margin on Kuaishou dataset.", "Our method MFQ significantly outperforms traditional live streaming recommendation models BDR and MTA in UAUC and GAUC for two main reasons.", "Firstly, BDR ignores the modeling of multi-modal content, while MTA lacks the connection to distinctive characteristics across various types of authors.", "In contrast, our MFQ successfully leverages the multi-modal content of the target live-streaming room and adopts learnable queries to extract streamer-aware content patterns.", "Additionally, our method GIE also 
outperforms the graph-based method EgoFusion which provides evidence that the metapath-guided behavior expansion process greatly enhances behavior representation and explores potential donation preferences.", "Secondly, our method exhibits generalizability to a common behavior-based model.", "Our method has seamlessly integrated into two widely used behavior-based methods, MMoE and SIM, both of which demonstrate significant performance improvements.", "Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well.", "Thirdly, our method is not restricted to gifting prediction tasks and it also proves effectiveness in multi-modal recommendation tasks.", "As shown in Table 3 , our method exhibits great improvement when compared to several strong multi-modal recommendation baselines.", "This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph embeddings.", "(2) The MFQ module enhances the fusion of multi-modal features from short videos and clusters different videos with learnable queries initialized with item embedding, thereby benefiting further performance improvement of the recommendation model.", "" ], "target_context_ids": [ 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11 ], "selected_paragraphs": [ "[paragraph id = 0] Table 2 shows the performance of all models on the Kuaishou dataset.", "[paragraph id = 1] Note that given the large number of users and samples in Kuaishou dataset, an improvement of 0.5% in AUC, UAUC, and GAUC during offline evaluation holds significant value to bring obvious online gains for business.", "[paragraph id = 3] First, our method surpasses all baselines by a significant margin on Kuaishou dataset.", "[paragraph id = 4] Our method MFQ significantly outperforms traditional live streaming 
recommendation models BDR and MTA in UAUC and GAUC for two main reasons.", "[paragraph id = 5] Firstly, BDR ignores the modeling of multi-modal content, while MTA lacks the connection to distinctive characteristics across various types of authors.", "[paragraph id = 6] In contrast, our MFQ successfully leverages the multi-modal content of the target live-streaming room and adopts learnable queries to extract streamer-aware content patterns.", "[paragraph id = 7] Additionally, our method GIE also outperforms the graph-based method EgoFusion which provides evidence that the metapath-guided behavior expansion process greatly enhances behavior representation and explores potential donation preferences.", "[paragraph id = 8] Secondly, our method exhibits generalizability to a common behavior-based model.", "[paragraph id = 9] Our method has seamlessly integrated into two widely used behavior-based methods, MMoE and SIM, both of which demonstrate significant performance improvements.", "[paragraph id = 10] Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well.", "[paragraph id = 11] Thirdly, our method is not restricted to gifting prediction tasks and it also proves effectiveness in multi-modal recommendation tasks." ], "table_html": "
\n
Table 2. \nPerformances of different methods on Kuaishou dataset. represents the absolute improvement.\n
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Methods\n\n\n\nGTR
\n\nAUC\n\n\n\nImpr.\n\n\n\nUAUC\n\n\n\nImpr.\n\n\n\nGAUC\n\n\n\nImpr.\n\n
\n\nMMoE (Ma et al., 2018)\n\n\n\n0.956230\n\n\n\n-\n\n\n\n0.730186\n\n\n\n-\n\n\n\n0.746711\n\n\n\n-\n\n
\n\nMMoE+BDR (Zhang et al., 2021)\n\n\n\n0.956908\n\n\n\n+0.0678 %\n\n\n\n0.730625\n\n\n\n+0.0439 %\n\n\n\n0.747136\n\n\n\n+0.0425 %\n\n
\n\nMMoE+MTA (Xi et al., 2023)\n\n\n\n0.957095\n\n\n\n+0.0865 %\n\n\n\n0.731450\n\n\n\n+0.1264 %\n\n\n\n0.747327\n\n\n\n+0.0616 %\n\n
\n\nMMoE+EgoFusion (Chen et al., 2022)\n\n\n\n0.956952\n\n\n\n+0.0722 %\n\n\n\n0.731418\n\n\n\n+0.1232 %\n\n\n\n0.747275\n\n\n\n+0.0564 %\n\n
\n\nMMoE+MFQ\n\n\n\n0.956902\n\n\n\n+0.0672 %\n\n\n\n0.731975\n\n\n\n+0.1789 %\n\n\n\n0.747275\n\n\n\n+0.1764 %\n\n
\n\nMMoE+GIE\n\n\n\n0.957064\n\n\n\n+0.0834 %\n\n\n\n0.733853\n\n\n\n+0.3667 %\n\n\n\n0.751239\n\n\n\n+0.4528 %\n\n
\n\nMMoE+Ours(MFQ+GIE)\n\n\n\n0.95723\n\n\n\n+0.1001 %\n\n\n\n0.735776\n\n\n\n+0.5590 %\n\n\n\n0.753017\n\n\n\n+0.6306 %\n\n
\n\nSIM (Pi et al., 2020)\n\n\n\n0.958656\n\n\n\n-\n\n\n\n0.732239\n\n\n\n-\n\n\n\n0.748383\n\n\n\n-\n\n
\n\nSIM+BDR (Zhang et al., 2021)\n\n\n\n0.958419\n\n\n\n-0.0237 %\n\n\n\n0.734757\n\n\n\n+0.2518 %\n\n\n\n0.750684\n\n\n\n+0.2301 %\n\n
\n\nSIM+MTA (Xi et al., 2023)\n\n\n\n0.958867\n\n\n\n+0.0211 %\n\n\n\n0.734921\n\n\n\n+0.2682 %\n\n\n\n0.750802\n\n\n\n+0.2419 %\n\n
\n\nSIM+EgoFusion (Chen et al., 2022)\n\n\n\n0.959387\n\n\n\n+0.0085 %\n\n\n\n0.735608\n\n\n\n+0.3369 %\n\n\n\n0.751669\n\n\n\n+0.3286 %\n\n
\n\nSIM+MFQ\n\n\n\n0.959202\n\n\n\n+0.0546 %\n\n\n\n0.735717\n\n\n\n+0.3478 %\n\n\n\n0.751780\n\n\n\n+0.3397 %\n\n
\n\nSIM+GIE\n\n\n\n0.959802\n\n\n\n+0.1146 %\n\n\n\n0.738309\n\n\n\n+0.6070 %\n\n\n\n0.755154\n\n\n\n+0.6771 %\n\n
\n\nSIM+Ours(MFQ+GIE)\n\n\n\n0.960302\n\n\n\n+0.1646 %\n\n\n\n0.743678\n\n\n\n+1.1439 %\n\n\n\n0.76044\n\n\n\n+1.2057 %\n\n
\n\np-value\n\n
\n
\n
", "perturb_sentence_id": 10, "output": { "perturbed_statement": "[paragraph id = 10] Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as EgoFusion and DIN (Zhou et al., 2018 ) as well. Thirdly, our method is particularly restricted to gifting prediction tasks and it also is less effective in multi-modal recommendation tasks.", "perturbed_explanation": "Original Explanation: The statement suggests that MMBee demonstrates general adaptability beyond the two behavior-based models, such as DIN and DIEN, and shows efficacy in multi-modal recommendation tasks apart from gifting prediction tasks. 1. EgoFusion is mentioned as a graph-based method in the context, and not as a demonstrative example of models that MMBee is adaptable with. 2. It's emphasized that MMBee excels in multi-modal recommendation tasks, indicating its broad application scope, which the statement contradicts." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "3", "section": "6.4", "all_context": [ "Table 2 shows the performance of all models on the Kuaishou dataset.", "Note that given the large number of users and samples in Kuaishou dataset, an improvement of 0.5% in AUC, UAUC, and GAUC during offline evaluation holds significant value to bring obvious online gains for business.", "Table 3 presents the performance of several competitors on public Tiktok and Movielens datasets.", "First, our method surpasses all baselines by a significant margin on Kuaishou dataset.", "Our method MFQ significantly outperforms traditional live streaming recommendation models BDR and MTA in UAUC and GAUC for two main reasons.", "Firstly, BDR ignores the modeling of multi-modal content, while MTA lacks the connection to distinctive characteristics across various types of authors.", "In contrast, our MFQ successfully leverages the multi-modal content of the target live-streaming room and adopts learnable queries to extract streamer-aware content 
patterns.", "Additionally, our method GIE also outperforms the graph-based method EgoFusion which provides evidence that the metapath-guided behavior expansion process greatly enhances behavior representation and explores potential donation preferences.", "Secondly, our method exhibits generalizability to a common behavior-based model.", "Our method has seamlessly integrated into two widely used behavior-based methods, MMoE and SIM, both of which demonstrate significant performance improvements.", "Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well.", "Thirdly, our method is not restricted to gifting prediction tasks and it also proves effectiveness in multi-modal recommendation tasks.", "As shown in Table 3 , our method exhibits great improvement when compared to several strong multi-modal recommendation baselines.", "This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph embeddings.", "(2) The MFQ module enhances the fusion of multi-modal features from short videos and clusters different videos with learnable queries initialized with item embedding, thereby benefiting further performance improvement of the recommendation model.", "" ], "target_context_ids": [ 2, 12, 13 ], "selected_paragraphs": [ "[paragraph id = 2] Table 3 presents the performance of several competitors on public Tiktok and Movielens datasets.", "[paragraph id = 12] As shown in Table 3 , our method exhibits great improvement when compared to several strong multi-modal recommendation baselines.", "[paragraph id = 13] This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph 
embeddings." ], "table_html": "
\n
Table 3. \nPerformances of different methods on Tiktok and Movielens datasets.
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Methods\n\n\n\nTikTok\n\n\n\nMovielens
\n\nRecall@10\n\n\n\nPrecision@10\n\n\n\nNDCG@10\n\n\n\nRecall@10\n\n\n\nPrecision@10\n\n\n\nNDCG@10\n\n
\n\nNGCF (Wang et al., 2019)\n\n\n\n0.0292\n\n\n\n0.0045\n\n\n\n0.0156\n\n\n\n0.1198\n\n\n\n0.0289\n\n\n\n0.0750\n\n
\n\nLightGCN (He et al., 2020)\n\n\n\n0.0448\n\n\n\n0.0082\n\n\n\n0.0261\n\n\n\n0.1992\n\n\n\n0.0479\n\n\n\n0.1324\n\n
\n\nMMGCN (Wei et al., 2019)\n\n\n\n0.0544\n\n\n\n0.0089\n\n\n\n0.0297\n\n\n\n0.2028\n\n\n\n0.0506\n\n\n\n0.1361\n\n
\n\nGRCN (Wei et al., 2020)\n\n\n\n0.0392\n\n\n\n0.0065\n\n\n\n0.0221\n\n\n\n0.1402\n\n\n\n0.0338\n\n\n\n0.0882\n\n
\n\nEgoGCN (Chen et al., 2022)\n\n\n\n0.0569\n\n\n\n0.0093\n\n\n\n0.0330\n\n\n\n0.2155\n\n\n\n0.0524\n\n\n\n0.1444\n\n
\n\nDIN (Zhou et al., 2018)\n\n\n\n0.0403\n\n\n\n0.0074\n\n\n\n0.0235\n\n\n\n0.1372\n\n\n\n0.0330\n\n\n\n0.0912\n\n
\n\nSASRec (Kang and McAuley, 2018)\n\n\n\n0.0435\n\n\n\n0.0043\n\n\n\n0.0215\n\n\n\n0.1914\n\n\n\n0.0191\n\n\n\n0.1006\n\n
\n\nSIM (Pi et al., 2020)\n\n\n\n0.0413\n\n\n\n0.0079\n\n\n\n0.0245\n\n\n\n0.1470\n\n\n\n0.0429\n\n\n\n0.1011\n\n
\n\nMMMLP (Liang et al., 2023)\n\n\n\n0.0509\n\n\n\n0.0081\n\n\n\n0.0297\n\n\n\n0.1842\n\n\n\n0.0484\n\n\n\n0.1328\n\n
\n\nMMSSL (Pi et al., 2020)\n\n\n\n0.0553\n\n\n\n0.0055\n\n\n\n0.0299\n\n\n\n0.2482\n\n\n\n0.0170\n\n\n\n0.1113\n\n
\n\nOurs\n\n\n\n0.0605\n\n\n\n0.0097\n\n\n\n0.0347\n\n\n\n0.2317\n\n\n\n0.0566\n\n\n\n0.1573\n\n
\n\np-value\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n
\n
", "perturb_sentence_id": 12, "output": { "perturbed_statement": "[paragraph id = 12] As shown in Table 3 , our method exhibits lesser improvement compared to several strong multi-modal recommendation baselines. This decrease mainly comes from two folds: (1) The metapath-guided neighbors in our method fail to adequately capture user preferences.", "perturbed_explanation": "The original explanation stated that the method showed significant improvement compared to strong multi-modal recommendation baselines due to the contribution of metapath-guided neighbors enhancing the understanding of user preferences. However, the statement now erroneously claims that the method demonstrated lesser improvement in comparison, which contradicts Table 3's content. Furthermore, it incorrectly claims the method's features are insufficient. These statements are not consistent with the data presented." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "4", "section": "6.5", "all_context": [ "Graph-level Ablation: In order to investigate the importance of different metapath neighbors and the effect of graph embedding training, we remove five expanded sequences in turn and evaluate the performance of ablated graph embedding features.", "The results are presented in Table 7 , where we use to represent the removed part or feature.", "For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "Furthermore, among five expanded behavior sequences, 
we observed the metapath of and are the most important sequences among them.", "Multi-modal Ablation: We also investigate the influence of the multi-modal feature in MFQ module.", "Specifically, denotes removing all multi-modal content and represents removing the learnable query and cross attention.", "Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "We further study the influence of different modalities and report the ablation results in Table 4 .", "We find visual modality has the most important impact, causing the most performance degradation when removed.", "The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance.", "Hyperparameters Ablation: We provide further experiment results about hyperparameters as follows: Dimension of MFQ.", "We compare 32/64/128 dimensions of MFQ on Kuaishou dataset and the speed is tested on 20*Tesla T4 GPUs measured in examples/second.", "Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "Segment Length.", "We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest.", "However, 10 segments significantly increase resource costs (including storage, training and serving) making it infeasible to deploy in production.", "So we use 5 segments in MMBee.", "" ], "target_context_ids": [ 1, 2, 3, 4, 5, 8, 9, 10, 11 ], "selected_paragraphs": [ "[paragraph id = 1] The results are presented in Table 7 , where we use to represent the removed part or feature.", "[paragraph id = 2] For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", 
"[paragraph id = 3] From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "[paragraph id = 4] This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "[paragraph id = 5] Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "[paragraph id = 8] Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "[paragraph id = 9] We further study the influence of different modalities and report the ablation results in Table 4 .", "[paragraph id = 10] We find visual modality has the most important impact, causing the most performance degradation when removed.", "[paragraph id = 11] The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance." ], "table_html": "
\n
Table 7. \nAblation Study on Graph and Multi-modal level. The number in bold indicates a significant performance degradation.\n
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\nCategory\n\n\n\nOperator\n\n\n\nAUC\n\n\n\nImpr.\n\n\n\nUAUC\n\n\n\nImpr.\n\n\n\nGAUC\n\n\n\nImpr.\n\n
\n\n-\n\n\n\nSIM\n\n\n\n0.958656\n\n\n\n-0.1646%\n\n\n\n0.732239\n\n\n\n-1.1439%\n\n\n\n0.748383\n\n\n\n-1.2057 %\n\n
Graph\n\n\n\n\n\n0.959842\n\n\n\n-0.0460 %\n\n\n\n0.743492\n\n\n\n-0.0186 %\n\n\n\n0.76014\n\n\n\n-0.0300 %\n\n
\n\n\n\n\n\n0.959706\n\n\n\n-0.0596 %\n\n\n\n0.738322\n\n\n\n-0.5356 %\n\n\n\n0.755081\n\n\n\n-0.5359 %\n\n
\n\n\n\n\n\n0.960162\n\n\n\n-0.0140 %\n\n\n\n0.743248\n\n\n\n-0.0430 %\n\n\n\n0.75976\n\n\n\n-0.0680 %\n\n
\n\n\n\n\n\n0.960002\n\n\n\n-0.0300 %\n\n\n\n0.742931\n\n\n\n-0.0747 %\n\n\n\n0.759818\n\n\n\n-0.0622 %\n\n
\n\n\n\n\n\n0.959462\n\n\n\n-0.0840 %\n\n\n\n0.738378\n\n\n\n-0.5300 %\n\n\n\n0.754722\n\n\n\n-0.5718 %\n\n
\n\n\n\n\n\n0.959782\n\n\n\n-0.0520%\n\n\n\n0.736832\n\n\n\n-0.6846 %\n\n\n\n0.752625\n\n\n\n-0.7815 %\n\n
\n\n\n\n\n\n0.959202\n\n\n\n-0.1100%\n\n\n\n0.735608\n\n\n\n-0.8070 %\n\n\n\n0.751669\n\n\n\n-0.8771 %\n\n
Multi-modal\n\n\n\n\n\n0.959802\n\n\n\n-0.0500 %\n\n\n\n0.738309\n\n\n\n-0.5369 %\n\n\n\n0.755154\n\n\n\n-0.5286 %\n\n
\n\n\n\n\n\n0.960091\n\n\n\n-0.0211%\n\n\n\n0.740996\n\n\n\n-0.2682 %\n\n\n\n0.758021\n\n\n\n-0.2419 %\n\n
\n\n-\n\n\n\nOurs\n\n\n\n0.960302\n\n\n\n0.0000 %\n\n\n\n0.743678\n\n\n\n0.0000 %\n\n\n\n0.76044\n\n\n\n0.0000 %\n\n
\n
\n
", "perturb_sentence_id": 3, "output": { "perturbed_statement": "[paragraph id = 3] From table 7 , we can observe that drops -0.1100% of AUC but does not lead to a significant drop in performance which means that the GIE modeling might not be a very important supplement to the observed history behaviors. This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning have a minimal effect on the model’s performance.", "perturbed_explanation": "The original explanation discusses the significance of the GIE modeling and notes its contributions to enhancing the recommendation model's performance. 1. It demonstrates that the removal of GIE modeling from the process results in a measurable performance decrease (-0.1100% of AUC), highlighting its importance. 2. The altered statement downplays the impact and incorrectly states that GIE modeling might not be crucial despite its observed contributions. This contradicts the evidence, misrepresenting the utility of the GIE modeling approach." 
} }, { "path": "table_paper/2407.00056v1.json", "table_id": "6", "section": "6.5", "all_context": [ "Graph-level Ablation: In order to investigate the importance of different metapath neighbors and the effect of graph embedding training, we remove five expanded sequences in turn and evaluate the performance of ablated graph embedding features.", "The results are presented in Table 7 , where we use to represent the removed part or feature.", "For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "Multi-modal Ablation: We also investigate the influence of the multi-modal feature in MFQ module.", "Specifically, denotes removing all multi-modal content and represents removing the learnable query and cross attention.", "Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "We further study the influence of different modalities and report the ablation results in Table 4 .", "We find visual modality has the most important impact, causing the most performance degradation when removed.", "The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance.", "Hyperparameters Ablation: We provide further experiment results about hyperparameters as follows: Dimension of MFQ.", "We compare 
32/64/128 dimensions of MFQ on Kuaishou dataset and the speed is tested on 20*Tesla T4 GPUs measured in examples/second.", "Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "Segment Length.", "We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest.", "However, 10 segments significantly increase resource costs (including storage, training and serving) making it infeasible to deploy in production.", "So we use 5 segments in MMBee.", "" ], "target_context_ids": [ 14, 15, 16, 17 ], "selected_paragraphs": [ "[paragraph id = 14] Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "[paragraph id = 15] Segment Length.", "[paragraph id = 16] We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "[paragraph id = 17] Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest." ], "table_html": "
\n
Table 5. \nThe influence of dimension of MFQ.\n
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\nDimension\n\n\n\nSpeed\n\n\n\nFLOPs\n\n\n\nAUC Impr.\n\n
\n\n32\n\n\n\n144.17K\n\n\n\n154.61M\n\n\n\n0.0000%\n\n
\n\n64\n\n\n\n141.76K\n\n\n\n190.27M\n\n\n\n0.1744%\n\n
\n\n128\n\n\n\n132.73K\n\n\n\n229.30M\n\n\n\n0.2105%\n\n
\n
\n
", "perturb_sentence_id": 16, "output": { "perturbed_statement": "[paragraph id = 16] We additionally choose 15/25 consecutive live segments and compared them with 5 segments on Kuaishou dataset.Table 6 shows that 15 live segments get obvious gain but when it comes to 25 the further gain is modest.", "perturbed_explanation": "The original explanation mentions choosing 10/20 live segments for comparison, with 10 segments showing significant improvement, while the increase to 20 segments results in diminishing returns. 2. However, the statement introduces numbers 15/25 in place of 10/20, which is not consistent with the information provided." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "7", "section": "6.5", "all_context": [ "Graph-level Ablation: In order to investigate the importance of different metapath neighbors and the effect of graph embedding training, we remove five expanded sequences in turn and evaluate the performance of ablated graph embedding features.", "The results are presented in Table 7 , where we use to represent the removed part or feature.", "For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "Multi-modal Ablation: We also investigate the influence of the multi-modal feature in MFQ module.", "Specifically, denotes removing all multi-modal content and 
represents removing the learnable query and cross attention.", "Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "We further study the influence of different modalities and report the ablation results in Table 4 .", "We find visual modality has the most important impact, causing the most performance degradation when removed.", "The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance.", "Hyperparameters Ablation: We provide further experiment results about hyperparameters as follows: Dimension of MFQ.", "We compare 32/64/128 dimensions of MFQ on Kuaishou dataset and the speed is tested on 20*Tesla T4 GPUs measured in examples/second.", "Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "Segment Length.", "We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest.", "However, 10 segments significantly increase resource costs (including storage, training and serving) making it infeasible to deploy in production.", "So we use 5 segments in MMBee.", "" ], "target_context_ids": [ 1, 2, 3, 4, 5 ], "selected_paragraphs": [ "[paragraph id = 1] The results are presented in Table 7 , where we use to represent the removed part or feature.", "[paragraph id = 2] For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "[paragraph id = 3] From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "[paragraph id = 4] This 
suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "[paragraph id = 5] Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them." ], "table_html": "
\n
Table 6. \nThe influence of segments length.\n
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\nLength\n\n\n\nAUC Impr.\n\n\n\nUAUC Impr.\n\n\n\nGAUC Impr.\n\n\n\nFLOPs\n\n\n\nSpeed\n\n
\n\n5\n\n\n\n0\n\n\n\n0\n\n\n\n0\n\n\n\n190.27M\n\n\n\n141.76K\n\n
\n\n10\n\n\n\n0.0237%\n\n\n\n0.2037%\n\n\n\n0.2384%\n\n\n\n194.09M\n\n\n\n122.60K\n\n
\n\n20\n\n\n\n0.0733%\n\n\n\n0.2369%\n\n\n\n0.2500%\n\n\n\n203.04M\n\n\n\n108.17K\n\n
\n
\n
", "perturb_sentence_id": 3, "output": { "perturbed_statement": "[paragraph id = 3] From table 7 , we can observe that increases +0.1100% of AUC and does not lead to any performance drop which means that the GIE modeling is not a vital supplement to the observed history behaviors. This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are not beneficial to the model's performance.", "perturbed_explanation": "Original Explanation: [paragraph id = 3] From table 7, we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance, which means that the GIE modeling is a very important supplement to the observed history behaviors. This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to the model's performance. 1. The original statement identifies a performance drop and emphasizes the importance of GIE. 2. The perturbed statement misstates the impact, claiming an improvement of +0.1100% AUC and denying the necessity of GIE, which contradicts the observed and documented conclusions." } } ]