[ { "path": "table_paper/2407.00056v1.json", "table_id": "1", "section": "6.1.2", "all_context": [ "To prove the effectiveness of our proposed MFQ and GIE module, we also compare our method on two public short video recommendation datasets: TikTok and MovieLens.", "The statistics of datasets are shown in Table 1 .", "Movielens333https://grouplens.org/datasets/movielens/ (Harper and Konstan, 2015 ) is a widely used dataset (Sun et al., 2019 ; Kang and McAuley, 2018 ; Sankar et al., 2020 ; Wu et al., 2022 ) for the recommendation task.", "The raw data is initially acquired by collecting movie descriptions from Movielens-10M and crawling the corresponding trailers from YouTube.", "Textual features are subsequently extracted from the descriptions using the Sentence2Vector (Arora et al., 2017 ).", "For visual modality, key frames are initially extracted from the retrieved videos and then processed by a pre-trained ResNet50 model [9] to obtain visual features.", "The acoustic features are obtained using VGGish [12], following a soundtrack separation procedure implemented with the FFmpeg software.", "TikTok444http://ai-lab-challenge.bytedance.com/tce/vc/ is published by TikTok, a micro-video sharing platform that enables users to create and share micro-videos with durations ranging from 3 to 15 seconds.", "TikTok comprises users, micro-videos, and their interactions, such as clicks.", "The features of the micro-videos in each modality are extracted and made available without providing the raw data.", "Specifically, the textual characteristics are extracted from the micro-video captions provided by users.", "" ], "target_context_ids": [ 1 ], "selected_paragraphs": [ "[paragraph id = 1] The statistics of datasets are shown in Table 1 ." ], "table_html": "
Table 1. The statistics of public datasets. V, A, and T represent the dimensions of visual, acoustic and textual features.

| Dataset   | #Interactions | #Items | #Users | Sparsity | V    | A   | T   |
|-----------|---------------|--------|--------|----------|------|-----|-----|
| TikTok    | 726,065       | 76,085 | 36,656 | 99.97%   | 128  | 128 | 128 |
| Movielens | 1,239,508     | 5,986  | 55,485 | 99.63%   | 2048 | 128 | 100 |
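The Sparsity column is consistent with the raw counts, since sparsity is one minus the filled fraction of the user-item matrix. A minimal sketch that reproduces both rows; the arithmetic is our own verification, not something stated in the paper:

```python
# Sparsity of a user-item interaction matrix: the fraction of all
# possible (item, user) pairs that carry no recorded interaction.
def sparsity(interactions: int, items: int, users: int) -> float:
    return 1.0 - interactions / (items * users)

# Reproduces Table 1: TikTok -> 99.97%, Movielens -> 99.63%.
for name, n_inter, n_items, n_users in [
    ("TikTok", 726_065, 76_085, 36_656),
    ("Movielens", 1_239_508, 5_986, 55_485),
]:
    print(f"{name}: {sparsity(n_inter, n_items, n_users):.2%}")
```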
", "perturb_sentence_id": [ 1, 2 ], "output": { "perturbed_statement": "[paragraph id = 1] The statistics of datasets are shown in Table 1 .Movielens333https://grouplens.org/datasets/movielens/ (Harper and Konstan, 2015 ) is a dataset primarily used for training machine learning models for image recognition tasks.", "perturbed_explanation": "1. Movielens is a widely used dataset specifically designed for recommendation tasks, where users are provided with suggestions based on their previous interactions or preferences. These recommendation tasks include applications like movie recommendations based on user ratings and behavior.\n2. The statement incorrectly describes the Movielens dataset as being used primarily for training machine learning models for image recognition tasks. However, Movielens is not related to image recognition but is used in recommendation systems, as detailed in the original statement." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "2", "section": "6.4", "all_context": [ "Table 2 shows the performance of all models on the Kuaishou dataset.", "Note that given the large number of users and samples in Kuaishou dataset, an improvement of 0.5% in AUC, UAUC, and GAUC during offline evaluation holds significant value to bring obvious online gains for business.", "Table 3 presents the performance of several competitors on public Tiktok and Movielens datasets.", "First, our method surpasses all baselines by a significant margin on Kuaishou dataset.", "Our method MFQ significantly outperforms traditional live streaming recommendation models BDR and MTA in UAUC and GAUC for two main reasons.", "Firstly, BDR ignores the modeling of multi-modal content, while MTA lacks the connection to distinctive characteristics across various types of authors.", "In contrast, our MFQ successfully leverages the multi-modal content of the target live-streaming room and adopts learnable queries to extract streamer-aware content patterns.", "Additionally, our method GIE also outperforms the graph-based method EgoFusion which provides evidence that the metapath-guided behavior expansion process greatly enhances behavior representation and explores potential donation preferences.", "Secondly, our method exhibits generalizability to a common behavior-based model.", "Our method has seamlessly integrated into two widely used behavior-based methods, MMoE and SIM, both of which demonstrate significant performance improvements.", "Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well.", "Thirdly, our method is not restricted to gifting prediction tasks and it also proves effectiveness in multi-modal recommendation tasks.", "As shown in Table 3 , our method exhibits great improvement when compared to several strong multi-modal recommendation baselines.", "This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph embeddings.", "(2) The MFQ module enhances the fusion of multi-modal features from short videos and clusters different videos with learnable queries initialized with item embedding, thereby benefiting further performance improvement of the recommendation model.", "" ], "target_context_ids": [ 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11 ], "selected_paragraphs": [ "[paragraph id = 0] Table 2 shows the performance of all models on 
the Kuaishou dataset.", "[paragraph id = 1] Note that given the large number of users and samples in Kuaishou dataset, an improvement of 0.5% in AUC, UAUC, and GAUC during offline evaluation holds significant value to bring obvious online gains for business.", "[paragraph id = 3] First, our method surpasses all baselines by a significant margin on Kuaishou dataset.", "[paragraph id = 4] Our method MFQ significantly outperforms traditional live streaming recommendation models BDR and MTA in UAUC and GAUC for two main reasons.", "[paragraph id = 5] Firstly, BDR ignores the modeling of multi-modal content, while MTA lacks the connection to distinctive characteristics across various types of authors.", "[paragraph id = 6] In contrast, our MFQ successfully leverages the multi-modal content of the target live-streaming room and adopts learnable queries to extract streamer-aware content patterns.", "[paragraph id = 7] Additionally, our method GIE also outperforms the graph-based method EgoFusion which provides evidence that the metapath-guided behavior expansion process greatly enhances behavior representation and explores potential donation preferences.", "[paragraph id = 8] Secondly, our method exhibits generalizability to a common behavior-based model.", "[paragraph id = 9] Our method has seamlessly integrated into two widely used behavior-based methods, MMoE and SIM, both of which demonstrate significant performance improvements.", "[paragraph id = 10] Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well.", "[paragraph id = 11] Thirdly, our method is not restricted to gifting prediction tasks and it also proves effectiveness in multi-modal recommendation tasks." ], "table_html": "
Table 2. Performances of different methods on the Kuaishou dataset (GTR task). Impr. represents the absolute improvement.

| Methods                            | AUC      | Impr.     | UAUC     | Impr.     | GAUC     | Impr.     |
|------------------------------------|----------|-----------|----------|-----------|----------|-----------|
| MMoE (Ma et al., 2018)             | 0.956230 | -         | 0.730186 | -         | 0.746711 | -         |
| MMoE+BDR (Zhang et al., 2021)      | 0.956908 | +0.0678 % | 0.730625 | +0.0439 % | 0.747136 | +0.0425 % |
| MMoE+MTA (Xi et al., 2023)         | 0.957095 | +0.0865 % | 0.731450 | +0.1264 % | 0.747327 | +0.0616 % |
| MMoE+EgoFusion (Chen et al., 2022) | 0.956952 | +0.0722 % | 0.731418 | +0.1232 % | 0.747275 | +0.0564 % |
| MMoE+MFQ                           | 0.956902 | +0.0672 % | 0.731975 | +0.1789 % | 0.747275 | +0.1764 % |
| MMoE+GIE                           | 0.957064 | +0.0834 % | 0.733853 | +0.3667 % | 0.751239 | +0.4528 % |
| MMoE+Ours(MFQ+GIE)                 | 0.95723  | +0.1001 % | 0.735776 | +0.5590 % | 0.753017 | +0.6306 % |
| SIM (Pi et al., 2020)              | 0.958656 | -         | 0.732239 | -         | 0.748383 | -         |
| SIM+BDR (Zhang et al., 2021)       | 0.958419 | -0.0237 % | 0.734757 | +0.2518 % | 0.750684 | +0.2301 % |
| SIM+MTA (Xi et al., 2023)          | 0.958867 | +0.0211 % | 0.734921 | +0.2682 % | 0.750802 | +0.2419 % |
| SIM+EgoFusion (Chen et al., 2022)  | 0.959387 | +0.0085 % | 0.735608 | +0.3369 % | 0.751669 | +0.3286 % |
| SIM+MFQ                            | 0.959202 | +0.0546 % | 0.735717 | +0.3478 % | 0.751780 | +0.3397 % |
| SIM+GIE                            | 0.959802 | +0.1146 % | 0.738309 | +0.6070 % | 0.755154 | +0.6771 % |
| SIM+Ours(MFQ+GIE)                  | 0.960302 | +0.1646 % | 0.743678 | +1.1439 % | 0.76044  | +1.2057 % |
| p-value                            |          |           |          |           |          |           |
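UAUC and GAUC in Table 2 are per-user AUC aggregates. A minimal sketch under the usual definitions (unweighted mean of per-user AUCs for UAUC, impression-weighted mean for GAUC); the weighting convention is a standard assumption, not quoted from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def grouped_auc(user_ids, labels, scores, weighted=True):
    """Per-user AUC, averaged over users: impression-weighted for GAUC,
    unweighted for UAUC. Users whose labels are all 0 or all 1 are
    skipped, since AUC is undefined without both classes."""
    user_ids, labels, scores = map(np.asarray, (user_ids, labels, scores))
    aucs, weights = [], []
    for u in np.unique(user_ids):
        mask = user_ids == u
        y = labels[mask]
        if y.min() == y.max():
            continue
        aucs.append(roc_auc_score(y, scores[mask]))
        weights.append(mask.sum() if weighted else 1)
    return float(np.average(aucs, weights=weights))
```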
", "perturb_sentence_id": [ 10, 11 ], "output": { "perturbed_statement": "[paragraph id = 10] Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well. Thirdly, our method is specifically restricted to gifting prediction tasks and shows no effectiveness in multi-modal recommendation tasks.", "perturbed_explanation": "The original explanation highlights that the method is not restricted to gifting prediction tasks and is effective in multi-modal recommendation tasks, demonstrating its versatility. 2. The statement incorrectly claims that the method is specifically restricted to gifting prediction tasks and shows no effectiveness in multi-modal recommendation tasks. This contradicts the provided context, suggesting limitations that are not supported by the described capabilities of the method." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "3", "section": "6.4", "all_context": [ "Table 2 shows the performance of all models on the Kuaishou dataset.", "Note that given the large number of users and samples in Kuaishou dataset, an improvement of 0.5% in AUC, UAUC, and GAUC during offline evaluation holds significant value to bring obvious online gains for business.", "Table 3 presents the performance of several competitors on public Tiktok and Movielens datasets.", "First, our method surpasses all baselines by a significant margin on Kuaishou dataset.", "Our method MFQ significantly outperforms traditional live streaming recommendation models BDR and MTA in UAUC and GAUC for two main reasons.", "Firstly, BDR ignores the modeling of multi-modal content, while MTA lacks the connection to distinctive characteristics across various types of authors.", "In contrast, our MFQ successfully leverages the multi-modal content of the target live-streaming room and adopts learnable queries to extract streamer-aware content patterns.", "Additionally, our method GIE also outperforms the graph-based method EgoFusion which provides evidence that the metapath-guided behavior expansion process greatly enhances behavior representation and explores potential donation preferences.", "Secondly, our method exhibits generalizability to a common behavior-based model.", "Our method has seamlessly integrated into two widely used behavior-based methods, MMoE and SIM, both of which demonstrate significant performance improvements.", "Moreover, MMBee is not limited to these two behavior-based models and can be easily adapted to other methods such as DIN (Zhou et al., 2018 ) and DIEN (Zhou et al., 2019 ) as well.", "Thirdly, our method is not restricted to gifting prediction tasks and it also proves effectiveness in multi-modal recommendation tasks.", "As shown in Table 3 , our method exhibits great improvement when compared to several strong multi-modal recommendation baselines.", "This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph embeddings.", "(2) The MFQ module enhances the fusion of multi-modal features from short videos and clusters different videos with learnable queries initialized with item embedding, thereby benefiting further performance improvement of the recommendation model.", "" ], "target_context_ids": [ 2, 12, 13 ], "selected_paragraphs": [ "[paragraph id = 2] Table 3 presents the performance of several competitors on public 
Tiktok and Movielens datasets.", "[paragraph id = 12] As shown in Table 3 , our method exhibits great improvement when compared to several strong multi-modal recommendation baselines.", "[paragraph id = 13] This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph embeddings." ], "table_html": "
Table 3. Performances of different methods on the TikTok and Movielens datasets.

| Methods                         | TikTok Recall@10 | TikTok Precision@10 | TikTok NDCG@10 | Movielens Recall@10 | Movielens Precision@10 | Movielens NDCG@10 |
|---------------------------------|------------------|---------------------|----------------|---------------------|------------------------|-------------------|
| NGCF (Wang et al., 2019)        | 0.0292           | 0.0045              | 0.0156         | 0.1198              | 0.0289                 | 0.0750            |
| LightGCN (He et al., 2020)      | 0.0448           | 0.0082              | 0.0261         | 0.1992              | 0.0479                 | 0.1324            |
| MMGCN (Wei et al., 2019)        | 0.0544           | 0.0089              | 0.0297         | 0.2028              | 0.0506                 | 0.1361            |
| GRCN (Wei et al., 2020)         | 0.0392           | 0.0065              | 0.0221         | 0.1402              | 0.0338                 | 0.0882            |
| EgoGCN (Chen et al., 2022)      | 0.0569           | 0.0093              | 0.0330         | 0.2155              | 0.0524                 | 0.1444            |
| DIN (Zhou et al., 2018)         | 0.0403           | 0.0074              | 0.0235         | 0.1372              | 0.0330                 | 0.0912            |
| SASRec (Kang and McAuley, 2018) | 0.0435           | 0.0043              | 0.0215         | 0.1914              | 0.0191                 | 0.1006            |
| SIM (Pi et al., 2020)           | 0.0413           | 0.0079              | 0.0245         | 0.1470              | 0.0429                 | 0.1011            |
| MMMLP (Liang et al., 2023)      | 0.0509           | 0.0081              | 0.0297         | 0.1842              | 0.0484                 | 0.1328            |
| MMSSL (Pi et al., 2020)         | 0.0553           | 0.0055              | 0.0299         | 0.2482              | 0.0170                 | 0.1113            |
| Ours                            | 0.0605           | 0.0097              | 0.0347         | 0.2317              | 0.0566                 | 0.1573            |
| p-value                         |                  |                     |                |                     |                        |                   |
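Table 3's metrics follow the standard top-k definitions for implicit feedback, computed per user and then averaged. A minimal single-user sketch with binary relevance; the function names and signatures are ours, not from the paper:

```python
import numpy as np

def recall_at_k(rel, n_relevant, k=10):
    """rel: 0/1 relevance of the recommended items in ranked order."""
    return sum(rel[:k]) / n_relevant

def precision_at_k(rel, k=10):
    return sum(rel[:k]) / k

def ndcg_at_k(rel, n_relevant, k=10):
    """Binary-relevance NDCG@k: DCG over the ranked list divided by the
    DCG of an ideal ranking placing all relevant items first."""
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    gains = np.asarray(rel[:k], dtype=float)
    dcg = float((gains * discounts[: len(gains)]).sum())
    idcg = float(discounts[: min(n_relevant, k)].sum())
    return dcg / idcg if idcg > 0 else 0.0
```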
", "perturb_sentence_id": [ 12, 13 ], "output": { "perturbed_statement": "[paragraph id = 12] As shown in Table 3, our method exhibits minimal improvement when compared to several weak multi-modal recommendation baselines. This gain mainly comes from two folds: (1) The metapath-guided neighbors in our method enable better capture of user preferences, but other graph-based methods only rely on implicit learning from graph embeddings.", "perturbed_explanation": "The original explanation states that the method exhibits significant improvement when compared to several strong multi-modal recommendation baselines, emphasizing the effectiveness of the method. It also suggests that the gain stems from the use of metapath-guided neighbors for better user preference capture. 2) The statement falsely claims that the method shows minimal improvement and compares it to weak baselines, misrepresenting the scale of its performance enhancement and the robustness of the competitors it is measured against." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "4", "section": "6.5", "all_context": [ "Graph-level Ablation: In order to investigate the importance of different metapath neighbors and the effect of graph embedding training, we remove five expanded sequences in turn and evaluate the performance of ablated graph embedding features.", "The results are presented in Table 7 , where we use to represent the removed part or feature.", "For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "Multi-modal Ablation: We also investigate the influence of the multi-modal feature in MFQ module.", "Specifically, denotes removing all multi-modal content and represents removing the learnable query and cross attention.", "Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "We further study the influence of different modalities and report the ablation results in Table 4 .", "We find visual modality has the most important impact, causing the most performance degradation when removed.", "The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance.", "Hyperparameters Ablation: We provide further experiment results about hyperparameters as follows: Dimension of MFQ.", "We compare 32/64/128 dimensions of MFQ on Kuaishou dataset and the speed is tested on 20*Tesla T4 GPUs measured in examples/second.", "Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "Segment Length.", "We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest.", "However, 10 segments significantly increase resource costs (including storage, training 
and serving) making it infeasible to deploy in production.", "So we use 5 segments in MMBee.", "" ], "target_context_ids": [ 1, 2, 3, 4, 5, 8, 9, 10, 11 ], "selected_paragraphs": [ "[paragraph id = 1] The results are presented in Table 7 , where we use to represent the removed part or feature.", "[paragraph id = 2] For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "[paragraph id = 3] From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "[paragraph id = 4] This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "[paragraph id = 5] Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "[paragraph id = 8] Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "[paragraph id = 9] We further study the influence of different modalities and report the ablation results in Table 4 .", "[paragraph id = 10] We find visual modality has the most important impact, causing the most performance degradation when removed.", "[paragraph id = 11] The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance." ], "table_html": "
Table 7. Ablation study at the graph and multi-modal level. The number in bold indicates a significant performance degradation. (Operator symbols were lost in extraction; those cells are left blank.)

| Category    | Operator | AUC      | Impr.     | UAUC     | Impr.     | GAUC     | Impr.     |
|-------------|----------|----------|-----------|----------|-----------|----------|-----------|
| -           | SIM      | 0.958656 | -0.1646 % | 0.732239 | -1.1439 % | 0.748383 | -1.2057 % |
| Graph       |          | 0.959842 | -0.0460 % | 0.743492 | -0.0186 % | 0.76014  | -0.0300 % |
| Graph       |          | 0.959706 | -0.0596 % | 0.738322 | -0.5356 % | 0.755081 | -0.5359 % |
| Graph       |          | 0.960162 | -0.0140 % | 0.743248 | -0.0430 % | 0.75976  | -0.0680 % |
| Graph       |          | 0.960002 | -0.0300 % | 0.742931 | -0.0747 % | 0.759818 | -0.0622 % |
| Graph       |          | 0.959462 | -0.0840 % | 0.738378 | -0.5300 % | 0.754722 | -0.5718 % |
| Graph       |          | 0.959782 | -0.0520 % | 0.736832 | -0.6846 % | 0.752625 | -0.7815 % |
| Graph       |          | 0.959202 | -0.1100 % | 0.735608 | -0.8070 % | 0.751669 | -0.8771 % |
| Multi-modal |          | 0.959802 | -0.0500 % | 0.738309 | -0.5369 % | 0.755154 | -0.5286 % |
| Multi-modal |          | 0.960091 | -0.0211 % | 0.740996 | -0.2682 % | 0.758021 | -0.2419 % |
| -           | Ours     | 0.960302 | 0.0000 %  | 0.743678 | 0.0000 %  | 0.76044  | 0.0000 %  |
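The Impr. columns in Tables 2 and 7 read as absolute differences in percentage points against the reference row (e.g. SIM+Ours vs. SIM on AUC: 0.960302 - 0.958656 = +0.1646%); the caption states the absolute-improvement convention, and the arithmetic below is our own check:

```python
def abs_impr(metric: float, baseline: float) -> str:
    """Absolute improvement in percentage points, Impr.-column style."""
    return f"{(metric - baseline) * 100:+.4f} %"

assert abs_impr(0.960302, 0.958656) == "+0.1646 %"  # SIM+Ours vs. SIM, AUC
```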
", "perturb_sentence_id": [ 3, 4 ], "output": { "perturbed_statement": "[paragraph id = 3] From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the visual modality is a very important supplement to the observed history behaviors.This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "perturbed_explanation": "1. The original explanation highlights that GIE (Graph Intelligence Engine) modeling is crucial for supplementing observed history behaviors, indicating that both explicit metapath-based behaviors and implicit graph node embedding learning are beneficial. \n2. The altered statement incorrectly attributes the significant drop in performance to the importance of the visual modality instead of the GIE modeling. The context clearly states that removing GIE modeling affects performance, while table 7 and paragraph details do not indicate any direct relation between the visual modality and the GIE process. In fact, the visual modality's impact is discussed separately, particularly in relation to table 4 and not in table 7." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "6", "section": "6.5", "all_context": [ "Graph-level Ablation: In order to investigate the importance of different metapath neighbors and the effect of graph embedding training, we remove five expanded sequences in turn and evaluate the performance of ablated graph embedding features.", "The results are presented in Table 7 , where we use to represent the removed part or feature.", "For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "Multi-modal Ablation: We also investigate the influence of the multi-modal feature in MFQ module.", "Specifically, denotes removing all multi-modal content and represents removing the learnable query and cross attention.", "Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "We further study the influence of different modalities and report the ablation results in Table 4 .", "We find visual modality has the most important impact, causing the most performance degradation when removed.", "The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance.", "Hyperparameters Ablation: We provide further experiment results about hyperparameters as follows: Dimension of MFQ.", "We compare 32/64/128 dimensions of MFQ on Kuaishou dataset and the speed is tested on 20*Tesla T4 GPUs measured in examples/second.", "Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "Segment Length.", "We additionally choose 10/20 consecutive live segments and compared them with 5 segments on 
Kuaishou dataset.", "Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest.", "However, 10 segments significantly increase resource costs (including storage, training and serving) making it infeasible to deploy in production.", "So we use 5 segments in MMBee.", "" ], "target_context_ids": [ 14, 15, 16, 17 ], "selected_paragraphs": [ "[paragraph id = 14] Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "[paragraph id = 15] Segment Length.", "[paragraph id = 16] We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "[paragraph id = 17] Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest." ], "table_html": "
Table 5. The influence of dimension of MFQ.

| Dimension | Speed   | FLOPs   | AUC Impr. |
|-----------|---------|---------|-----------|
| 32        | 144.17K | 154.61M | 0.0000%   |
| 64        | 141.76K | 190.27M | 0.1744%   |
| 128       | 132.73K | 229.30M | 0.2105%   |
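For context on what the MFQ dimension controls, here is a hedged PyTorch sketch of a learnable-query cross-attention block in the spirit of MFQ. The class name, query count, and random query initialization are illustrative assumptions (the paper initializes its queries from item embeddings), not the authors' code:

```python
import torch
import torch.nn as nn

class LearnableQueryFusion(nn.Module):
    """Sketch of an MFQ-style block: a small set of learnable queries
    cross-attends over the multi-modal token sequence, producing a
    fixed-size fused representation. dim=64 mirrors the Table 5 sweet spot."""
    def __init__(self, dim: int = 64, n_queries: int = 4, n_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, modal_tokens: torch.Tensor) -> torch.Tensor:
        # modal_tokens: (batch, n_tokens, dim) concatenated modality features
        q = self.queries.unsqueeze(0).expand(modal_tokens.size(0), -1, -1)
        fused, _ = self.attn(q, modal_tokens, modal_tokens)
        return fused  # (batch, n_queries, dim)
```

Raising `dim` grows both the attention FLOPs and the projection parameters roughly quadratically, which matches the FLOPs trend in Table 5.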
", "perturb_sentence_id": [ 16, 17 ], "output": { "perturbed_statement": "[paragraph id = 16] We additionally choose 15/25 consecutive live segments and compared them with 5 segments on Kuaishou dataset. Table 6 shows that 15 live segments get obvious gain but when it comes to 25 the further gain is modest.", "perturbed_explanation": "1. The original statement correctly states that 10 and 20 consecutive live segments were compared with 5 segments, resulting in an obvious gain for 10 segments and a modest further gain for 20 segments. \n2. The statement incorrectly claims that 15 and 25 consecutive live segments were compared. Additionally, it states that 15 segments result in an obvious gain and 25 segments result in a modest further gain, which contradicts the actual values of 10 and 20 segments as described in the context." } }, { "path": "table_paper/2407.00056v1.json", "table_id": "7", "section": "6.5", "all_context": [ "Graph-level Ablation: In order to investigate the importance of different metapath neighbors and the effect of graph embedding training, we remove five expanded sequences in turn and evaluate the performance of ablated graph embedding features.", "The results are presented in Table 7 , where we use to represent the removed part or feature.", "For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them.", "Multi-modal Ablation: We also investigate the influence of the multi-modal feature in MFQ module.", "Specifically, denotes removing all multi-modal content and represents removing the learnable query and cross attention.", "Table 7 shows that when removing the multi-modal feature MMBee suffers significant performance drops.", "We further study the influence of different modalities and report the ablation results in Table 4 .", "We find visual modality has the most important impact, causing the most performance degradation when removed.", "The speech and comment modality have a lesser impact factor but still show an innegligible effect on the model s overall performance.", "Hyperparameters Ablation: We provide further experiment results about hyperparameters as follows: Dimension of MFQ.", "We compare 32/64/128 dimensions of MFQ on Kuaishou dataset and the speed is tested on 20*Tesla T4 GPUs measured in examples/second.", "Table 5 shows that 64 dimension holds the best trade-off with computation efficiency and accuracy.", "Segment Length.", "We additionally choose 10/20 consecutive live segments and compared them with 5 segments on Kuaishou dataset.", "Table 6 shows that 10 live segments get obvious gain but when it comes to 20 the further gain is modest.", "However, 10 segments significantly increase resource costs (including storage, training and serving) making it infeasible to deploy in production.", "So we use 5 segments in MMBee.", "" ], "target_context_ids": [ 1, 2, 3, 4, 5 ], "selected_paragraphs": [ 
"[paragraph id = 1] The results are presented in Table 7 , where we use to represent the removed part or feature.", "[paragraph id = 2] For example, means removing the metapath neighbors in recommendation model, denotes removing the learned graph node embedding layers but remaining the expanded sequence and represents removing all features of graph modeling.", "[paragraph id = 3] From table 7 , we can observe that drops -0.1100% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors.", "[paragraph id = 4] This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "[paragraph id = 5] Furthermore, among five expanded behavior sequences, we observed the metapath of and are the most important sequences among them." ], "table_html": "
Table 6. The influence of segment length.

| Length | AUC Impr. | UAUC Impr. | GAUC Impr. | FLOPs   | Speed   |
|--------|-----------|------------|------------|---------|---------|
| 5      | 0         | 0          | 0          | 190.27M | 141.76K |
| 10     | 0.0237%   | 0.2037%    | 0.2384%    | 194.09M | 122.60K |
| 20     | 0.0733%   | 0.2369%    | 0.2500%    | 203.04M | 108.17K |
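The GIE ablations above revolve around metapath-guided behavior expansion. A hedged sketch of what one such expansion step could look like; the graph layout, edge-type names, and fanout cap are illustrative assumptions, not the paper's implementation:

```python
def expand_metapath(seed_user, edges, metapath, fanout=10):
    """Metapath-guided behavior expansion, sketched: starting from a user,
    follow typed edges (e.g. User -gift-> Streamer -gifted_by-> User) and
    collect the neighbors reached at each hop. `edges` maps each edge type
    to an adjacency dict {node: [neighbors]}; edge-type names are made up."""
    frontier = {seed_user}
    expanded = []
    for edge_type in metapath:
        next_frontier = set()
        for node in frontier:
            # Cap the per-node fanout so expanded sequences stay bounded.
            next_frontier.update(edges[edge_type].get(node, [])[:fanout])
        expanded.append(sorted(next_frontier))
        frontier = next_frontier
    return expanded  # one neighbor list per hop, usable as extra behavior sequences
```

Under this reading, removing one of the five expanded sequences in the ablation corresponds to dropping one metapath's output list from the model input.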
", "perturb_sentence_id": [ 3, 4 ], "output": { "perturbed_statement": "[paragraph id = 3] From table 7 , we can observe that drops -1.5000% of AUC and also leads to a significant drop in performance which means that the GIE modeling is a very important supplement to the observed history behaviors. This suggests that the explicit metapath-based behavior expansion process and implicit graph node embedding learning are all beneficial to model s performance.", "perturbed_explanation": "Original Explanation: The statement discusses the drop in AUC when a particular feature is removed, highlighting the importance of GIE modeling as a supplement to observed history behaviors. This implies that both metapath-based behavior expansion and graph node embedding learning contribute positively to the model's performance. 1. The statement incorrectly states the drop in AUC as -1.5000% instead of -0.1100%, exaggerating the impact of removing the feature. 2. This incorrect drop value could mislead readers about the significance of the feature's contribution to model performance and the importance of GIE modeling." } } ]