[ { "path": "table_paper/2407.00062v1.json", "table_id": "2", "section": "5.1.1", "all_context": [ "Let's say we are rating item i: we go through a node's neighbors and calculate the mean, median, or mode of their ratings for item i (if any exist).", "If no ratings for i exist, we return a random rating.", "Note that this algorithm mutates the underlying data source, whereby a recommended value for one node can be used by another node in its rating inference.", "The performance of these recommenders can be seen in Table 2.", "We observe that the mean and median of neighbors were the best algorithms in this range.", "The mode version did not perform as well.", "This is because mode cannot produce decimal-precision ratings; it also defaults to randomness when a mode can't be determined.", "" ], "target_context_ids": [ 3, 4, 5, 6 ], "selected_paragraphs": [ "[paragraph id = 3] The performance of these recommenders can be seen in Table 2.", "[paragraph id = 4] We observe that the mean and median of neighbors were the best algorithms in this range.", "[paragraph id = 5] The mode version did not perform as well.", "[paragraph id = 6] This is because mode cannot produce decimal-precision ratings; it also defaults to randomness when a mode can't be determined." ], "table_html": "
<table>
<caption>Table 2: Mean and Standard Deviation of MAE for Neighbourhood Algorithms</caption>
<tr><th>Algorithm</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>Mean of Neighbours</td><td>1.27</td><td>0.25</td></tr>
<tr><td>Median of Neighbours</td><td>1.27</td><td>0.19</td></tr>
<tr><td>Mode of Neighbours</td><td>1.32</td><td>0.23</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td></tr>
</table>
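To make the neighbourhood recommenders described above concrete, here is a minimal Python sketch (not taken from the paper's code). The data structures are assumptions: `trust_graph` maps a user to the set of users they trust, `ratings` maps `(user, item)` pairs to known ratings, and a 1-5 rating scale is assumed.

```python
import random
from collections import Counter
from statistics import mean, median

def neighbour_rating(trust_graph, ratings, user, item, agg="mean", scale=(1, 5)):
    # Collect the ratings that the user's trusted neighbours gave to this item.
    vals = [ratings[(n, item)] for n in trust_graph.get(user, set()) if (n, item) in ratings]
    if not vals:
        pred = random.uniform(*scale)        # no neighbour ratings: random fallback
    elif agg == "mean":
        pred = mean(vals)
    elif agg == "median":
        pred = median(vals)
    else:                                    # "mode": integer-valued prediction
        top = Counter(vals).most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            pred = random.uniform(*scale)    # no unique mode: random fallback
        else:
            pred = top[0][0]
    ratings[(user, item)] = pred             # mutate the data source, as described above
    return pred
```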
", "perturb_sentence_id": [ 3, 4 ], "output": { "perturbed_statement": "[paragraph id = 3] The performance of these recommenders can be seen in Table 2. We observe that the mode version was the best algorithm in this range.", "perturbed_explanation": "1. The original explanation states that the mean and median of neighbors were the best algorithms in this range, as evidenced by Table 2. 2. The statement erroneously claims that the mode version was the best algorithm when, in fact, paragraph 5 states that the mode version did not perform as well, and paragraph 6 provides reasons for the mode's poor performance. Therefore, stating that the mode was the best contradicts the information presented in paragraphs 5 and 6." } }, { "path": "table_paper/2407.00062v1.json", "table_id": "4", "section": "5.1.3", "all_context": [ "We now move to investigate recommendation systems that consider the entire trust graph.", "The Monte-Carlo (MC) Random Walk model works by starting at the node we are trying to produce a rating for and then performing a random walk.", "As the distance from the original node, , increases, so does the probability that a random rating is returned.", "This random walk continues until a node with a rating is reached, or this random rating is produced.", "We run this experiment times, for some fixed , for each node and take the mean rating from the nodes reached via random walk.", "We then extended this algorithm by choosing the neighbor, at each step of the random walk based on a weighted random incorporating the Jaccard index.", "The performance of both MC algorithms, as compared with the others can be seen in Table 4 .", "We observe that the Monte Carlo algorithm outperforms the previous models.", "Interestingly, the Jaccard weighted version performed better.", "This implies that Jaccard indexes seem to hold more valuable information than mere trust connections.", "" ], "target_context_ids": [ 6, 7, 8, 9 ], "selected_paragraphs": [ "[paragraph id = 6] The performance of both MC algorithms, as compared with the others can be seen in Table 4 .", "[paragraph id = 7] We observe that the Monte Carlo algorithm outperforms the previous models.", "[paragraph id = 8] Interestingly, the Jaccard weighted version performed better.", "[paragraph id = 9] This implies that Jaccard indexes seem to hold more valuable information than mere trust connections." ], "table_html": "
<table>
<caption>Table 4: Mean and Standard Deviation of MAE for Random Walk Algorithms</caption>
<tr><th>Algorithm</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>Jaccard Monte-Carlo Random Walk</td><td>1.14</td><td>0.22</td></tr>
<tr><td>Monte-Carlo Random Walk</td><td>1.16</td><td>0.23</td></tr>
<tr><td>Median of Neighbours</td><td>1.27</td><td>0.19</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td></tr>
</table>
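A sketch of how the Monte-Carlo random walk and its Jaccard-weighted variant could be implemented, using the same assumed data structures as the earlier sketch; the `step_penalty` schedule, the `runs` default, and the function name are illustrative choices, not the paper's parameters.

```python
import random

def mc_random_walk(trust_graph, ratings, user, item, jaccard=None,
                   runs=20, step_penalty=0.15, scale=(1, 5)):
    """Average the ratings reached by `runs` random walks from `user`."""
    def walk():
        node, depth = user, 0
        while True:
            if depth > 0 and (node, item) in ratings:
                return ratings[(node, item)]     # reached a node that rated the item
            # The further the walk is from the start node, the more likely it is
            # to give up and return a random rating instead.
            if random.random() < min(1.0, step_penalty * depth):
                return random.uniform(*scale)
            neighbours = list(trust_graph.get(node, ()))
            if not neighbours:
                return random.uniform(*scale)
            if jaccard is not None:              # Jaccard-weighted neighbour choice
                weights = [jaccard(node, n) + 1e-9 for n in neighbours]
                node = random.choices(neighbours, weights=weights, k=1)[0]
            else:                                # plain uniform random walk
                node = random.choice(neighbours)
            depth += 1
    return sum(walk() for _ in range(runs)) / runs
```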
", "perturb_sentence_id": [ 6, 7 ], "output": { "perturbed_statement": "[paragraph id = 6] The performance of both MC algorithms, as compared with the others can be seen in Table 4. We observe that the Monte Carlo algorithm underperforms compared to the previous models.", "perturbed_explanation": "The original explanation: The statement claimed the Monte Carlo algorithm outperforms the previous models. 1. The Monte Carlo algorithm is noted for its effective performance in the context given. 2. The statement incorrectly claims that the Monte Carlo algorithm underperforms, which contradicts the context suggesting otherwise. This change implies that the algorithm is less effective, while the context suggests it actually performs better, thus making the statement incorrect." } }, { "path": "table_paper/2407.00062v1.json", "table_id": "5", "section": "5.1.4", "all_context": [ "These recommenders work by considering every other node, which has rated item i, in the graph.", "We then perform a summation of the Jaccard values for every node corresponding to each rating value.", "In the Majority of Majorities version, we then take the rating that has received the highest contribution of Jaccard scores.", "However, in the Weighted Average version, we instead take the weighted average of these ratings, weighted by the contribution of Jaccard scores.", "We will call these techniques JMoM and WA moving forward.", "(These methods are inspired by the popular majority based opinion diffusion models, cf.", "[38 , 39 , 40 ].)", "As can be observed from Table 5 , the JMoM performed reasonably well.", "Its mean accuracy is better than the random walk.", "However, it was less robust.", "The WA version performed very well significantly beating both the JMoM version and the random walk.", "By overcoming the constraint of only considering direct neighbors, the JMoM and Jaccard WA models can produce decent results, with a very simple algorithm.", "Note that these approaches can be used with any user-to-user similarity metric.", "" ], "target_context_ids": [ 7, 8, 9, 10, 11 ], "selected_paragraphs": [ "[paragraph id = 7] As can be observed from Table 5 , the JMoM performed reasonably well.", "[paragraph id = 8] Its mean accuracy is better than the random walk.", "[paragraph id = 9] However, it was less robust.", "[paragraph id = 10] The WA version performed very well significantly beating both the JMoM version and the random walk.", "[paragraph id = 11] By overcoming the constraint of only considering direct neighbors, the JMoM and Jaccard WA models can produce decent results, with a very simple algorithm." ], "table_html": "
<table>
<caption>Table 5: Mean and Standard Deviation of MAE for Full-graph Jaccard Algorithms</caption>
<tr><th>Algorithm</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>Jaccard WA</td><td>1.05</td><td>0.25</td></tr>
<tr><td>Jaccard Monte-Carlo Random Walk</td><td>1.14</td><td>0.22</td></tr>
<tr><td>Jaccard MoM</td><td>1.13</td><td>0.30</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td></tr>
</table>
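The full-graph Weighted Average (WA) and Majority of Majorities (JMoM) recommenders reduce to a few lines once a user-to-user similarity is available. In this sketch `sim(u, v)` is any similarity function (for example the Jaccard index over trust neighbours) and `item_ratings` maps an item to a dict of `{user: rating}`; both names are assumptions for illustration.

```python
def weighted_average(sim, item_ratings, user, item):
    # WA: every user who rated `item` contributes their rating,
    # weighted by their similarity to `user`.
    num = den = 0.0
    for other, rating in item_ratings.get(item, {}).items():
        w = sim(user, other)
        num += w * rating
        den += w
    return num / den if den > 0 else None

def majority_of_majorities(sim, item_ratings, user, item):
    # JMoM: sum the similarity contributed to each discrete rating value
    # and return the rating value with the largest total.
    totals = {}
    for other, rating in item_ratings.get(item, {}).items():
        totals[rating] = totals.get(rating, 0.0) + sim(user, other)
    return max(totals, key=totals.get) if totals else None
```

Note how WA can return non-integer predictions while JMoM is forced onto the discrete rating values, which matches the explanation of their performance gap given later in the paper.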
", "perturb_sentence_id": [ 7, 8 ], "output": { "perturbed_statement": "[paragraph id = 7] As can be observed from Table 5, the JMoM performed poorly, with its mean accuracy being worse than the random walk.", "perturbed_explanation": "1. The original statement asserts that the JMoM performed reasonably well and had a better mean accuracy than the random walk. 2. The statement is incorrect because it claims that the JMoM performed poorly, with its mean accuracy being worse than the random walk, which contradicts paragraph 11 that indicates the JMoM version, along with the Jaccard WA model, can produce decent results, thereby implying its performance was better than or comparable to the random walk." } }, { "path": "table_paper/2407.00062v1.json", "table_id": "6", "section": "5.2", "all_context": [ "In the Item Jaccard model, we used the Jaccard-Item Index which is based on items rated.", "We also used the weighted average framework approach but with the Item-Rating Difference.", "The performance of these alternative WA models can be seen in Table 6 .", "We can see that the Item-Jaccard form of WA outperforms the previous Jaccard WA.", "Item-Rating Difference doesn t seem to perform very well and is quite inconsistent in its performance.", "A potential reason for the Item-Jaccard WA model performing so well is that a user s social circle may not be the best predictor for who is similar to them, rather relating people by their items themselves does a better job.", "" ], "target_context_ids": [ 2, 4, 5 ], "selected_paragraphs": [ "[paragraph id = 2] The performance of these alternative WA models can be seen in Table 6 .", "[paragraph id = 4] Item-Rating Difference doesn t seem to perform very well and is quite inconsistent in its performance.", "[paragraph id = 5] A potential reason for the Item-Jaccard WA model performing so well is that a user s social circle may not be the best predictor for who is similar to them, rather relating people by their items themselves does a better job." ], "table_html": "
<table>
<caption>Table 6: Mean and Standard Deviation of MAE for Item Rating Algorithms</caption>
<tr><th>Algorithm</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>Item-Jaccard WA</td><td>1.02</td><td>0.23</td></tr>
<tr><td>Jaccard WA</td><td>1.05</td><td>0.25</td></tr>
<tr><td>Item-Rating Difference WA</td><td>1.17</td><td>0.49</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td></tr>
</table>
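Hedged sketches of the two user-to-user similarities used in this section: the Item-Jaccard index over the sets of items each user has rated, and an item-rating-difference similarity. The exact normalisation of the rating-difference score is an assumption (a 1-5 scale, hence a maximum difference of 4), as are the container names; either function can be passed as `sim` to the `weighted_average` sketch shown earlier.

```python
def item_jaccard(items_rated, u, v):
    # Jaccard index over the sets of items each user has rated.
    # items_rated: dict mapping a user to the set of item ids they rated.
    a, b = items_rated.get(u, set()), items_rated.get(v, set())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def item_rating_difference(user_ratings, u, v, max_diff=4.0):
    # Similarity from how closely two users rated the items they share.
    # user_ratings: dict mapping a user to a dict of {item: rating}.
    shared = set(user_ratings.get(u, {})) & set(user_ratings.get(v, {}))
    if not shared:
        return 0.0
    diffs = [abs(user_ratings[u][i] - user_ratings[v][i]) for i in shared]
    return 1.0 - (sum(diffs) / len(diffs)) / max_diff
```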
", "perturb_sentence_id": [ 2, 3 ], "output": { "perturbed_statement": "[paragraph id = 2] The performance of these alternative WA models can be seen in Table 6. We can see that the Item-Rating Difference form of WA outperforms the previous Jaccard WA.", "perturbed_explanation": "1. Item-Jaccard WA is mentioned to perform well because it relates people by their items rather than their social circles, which is suggested to be a better method for determining similarity. \n2. The statement is incorrect because it claims that the Item-Rating Difference form of WA outperforms the previous Jaccard WA. However, the context notes that the Item-Rating Difference model does not perform well and is inconsistent, contrary to the claim of outperforming other models." } }, { "path": "table_paper/2407.00062v1.json", "table_id": "7", "section": "5.3", "all_context": [ "The Intra-Item Information concerns itself with the relationships between items themselves.", "In a similar fashion to the previous algorithms, we manipulated the WA approach to work with Intra-Item similarity data.", "This is slightly different from the WA framework as we iterate through each user s items rather than using a user-user similarity metric.", "The performance of the algorithms, one based on and one based on the Pearson , can be seen in Table 7 .", "It is evident both of the intra-item-based models underperform our previous models.", "The intra-item similarity using the Jaccard approach outperforms the Pearson correlation approach.", "This is a very interesting result, potentially this could be a better metric for determining the similarity of items and could be applied in [16 ].", "It is evident the intra-item similarity conveys some information as we perform better than random, hence this could be useful in combination models in the coming sections.", "" ], "target_context_ids": [ 3, 4, 5, 6, 7 ], "selected_paragraphs": [ "[paragraph id = 3] The performance of the algorithms, one based on and one based on the Pearson , can be seen in Table 7 .", "[paragraph id = 4] It is evident both of the intra-item-based models underperform our previous models.", "[paragraph id = 5] The intra-item similarity using the Jaccard approach outperforms the Pearson correlation approach.", "[paragraph id = 6] This is a very interesting result, potentially this could be a better metric for determining the similarity of items and could be applied in [16 ].", "[paragraph id = 7] It is evident the intra-item similarity conveys some information as we perform better than random, hence this could be useful in combination models in the coming sections." ], "table_html": "
<table>
<caption>Table 7: Mean and Standard Deviation of MAE for Intra-Item Algorithms</caption>
<tr><th>Algorithm</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>Item-Jaccard WA</td><td>1.02</td><td>0.23</td></tr>
<tr><td>Intra-Item WA</td><td>1.24</td><td>0.14</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td></tr>
<tr><td>Intra-Item WA (Pearson)</td><td>1.34</td><td>0.13</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td></tr>
</table>
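One possible implementation of the intra-item variant, as a sketch under assumptions: item-to-item similarity as a Jaccard index over the sets of users who rated each item (`raters`), and a WA-style prediction that iterates over the target user's own ratings (`user_ratings` maps a user to `{item: rating}`). Both container names and the function names are hypothetical.

```python
def intra_item_jaccard(raters, i, j):
    # Jaccard index over the sets of users who rated items i and j.
    a, b = raters.get(i, set()), raters.get(j, set())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def intra_item_wa(raters, user_ratings, user, item):
    # Iterate over the target user's own rated items and weight each of their
    # ratings by how similar that item is to the item being predicted.
    num = den = 0.0
    for other_item, rating in user_ratings.get(user, {}).items():
        if other_item == item:
            continue
        w = intra_item_jaccard(raters, item, other_item)
        num += w * rating
        den += w
    return num / den if den > 0 else None
```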
", "perturb_sentence_id": [ 6, 7 ], "output": { "perturbed_statement": "[paragraph id = 6] This is a very interesting result, potentially this could be a better metric for determining the correlation of items and could be applied in [16 ]. It is evident the intra-item similarity conveys no useful information as we perform worse than random, hence this could be useful in combination models in the coming sections.", "perturbed_explanation": "1. The original statement suggests that the intra-item similarity could be a better metric for determining the similarity of items, indicating that it provides valuable information beyond random results. \n2. The statement is incorrect because it claims that the intra-item similarity conveys no useful information and performs worse than random. However, the context states that intra-item similarity performs better than random, demonstrating its potential usefulness, and specifically mentions that the Jaccard approach outperforms the Pearson correlation approach." } }, { "path": "table_paper/2407.00062v1.json", "table_id": "9", "section": "6.2", "all_context": [ "This model combines the Intra-Item graph with the Item graph.", "Thus combing information about how nodes rated individual items and how the items are interrelated.", "The general idea was to slightly alter the item rating similarity metric, , to provide a bias taking into account the similarity of items to the item we are recommending.", "Note for intra-item similarity we use rather than as we found it was more performant in the previous section.", "The results can be seen in Table 9 : Though the WIRD model performed quite well still not better than the Item-Jaccard WA model.", "However, it performs better than just the intra-item information or item-rating difference WA alone.", "Hence, we have successfully combined the intra-item and item-rating information to achieve a better result.", "" ], "target_context_ids": [ 4, 5, 6 ], "selected_paragraphs": [ "[paragraph id = 4] The results can be seen in Table 9 : Though the WIRD model performed quite well still not better than the Item-Jaccard WA model.", "[paragraph id = 5] However, it performs better than just the intra-item information or item-rating difference WA alone.", "[paragraph id = 6] Hence, we have successfully combined the intra-item and item-rating information to achieve a better result." ], "table_html": "
<table>
<caption>Table 9: Mean and Standard Deviation of MAE for WIRD WA Algorithm</caption>
<tr><th>Algorithm</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>WIRD WA</td><td>1.05</td><td>0.28</td></tr>
<tr><td>Item-Jaccard WA</td><td>1.02</td><td>0.23</td></tr>
<tr><td>Intra-Item WA</td><td>1.34</td><td>0.13</td></tr>
<tr><td>Item-Rating Difference WA</td><td>1.17</td><td>0.49</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td></tr>
</table>
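The exact form of the WIRD bias is not spelled out above, so the following is only a guess at one way the two signals could be combined: each contributing user is weighted by their rating agreement with the target user, with every shared item's contribution scaled by its intra-item Jaccard similarity to the item being recommended. All names, the 1-5 scale, and the weighting formula are assumptions.

```python
def wird_wa(item_ratings, user_ratings, raters, user, item, max_diff=4.0):
    # Intra-item Jaccard similarity, inlined so the sketch is self-contained.
    def intra_item_jaccard(i, j):
        a, b = raters.get(i, set()), raters.get(j, set())
        return len(a & b) / len(a | b) if (a | b) else 0.0

    num = den = 0.0
    for other, rating in item_ratings.get(item, {}).items():
        if other == user:
            continue
        shared = set(user_ratings.get(user, {})) & set(user_ratings.get(other, {}))
        w = 0.0
        for j in shared:
            # Rating agreement on item j, biased by j's similarity to the target item.
            closeness = 1.0 - abs(user_ratings[user][j] - user_ratings[other][j]) / max_diff
            w += closeness * intra_item_jaccard(item, j)
        if w > 0:
            num += w * rating
            den += w
    return num / den if den > 0 else None
```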
", "perturb_sentence_id": [ 4, 5 ], "output": { "perturbed_statement": "[paragraph id = 4] The results can be seen in Table 9 : The WIRD model performed worse than both the intra-item information and item-rating difference WA alone. However, it performs better than the Item-Jaccard WA model.", "perturbed_explanation": "1. The original explanation suggests comparing the performance of different models, indicating that the WIRD model, despite performing well, did better than models using just intra-item information or item-rating difference WA. 2. The statement incorrectly claims that the WIRD model performed worse than both intra-item information and item-rating difference WA, which contradicts the fact that the combination of intra-item and item-rating info achieved a better result. This alteration misrepresents the comparative performance outcomes of the WIRD model." } }, { "path": "table_paper/2407.00062v1.json", "table_id": "11", "section": "9", "all_context": [ "We successfully broke down the concept of a recommender system to its core principles, starting with the information it ingests.", "Through breaking down the different information forms, we noticed that the most important feature when assigning the similarity between two users for their preferences is the way they ve rated other items and which items they have engaged with.", "The trust and intra-item-based algorithms led to recommenders with higher MAE as seen in Table 11 .", "All other models included the item-rating information in some way, which had significantly lower MAE.", "This result makes logical sense as it s more likely we are similar in opinion to a person who we share interests with than a person who we share friends with.", "However, the problem with item-rating information is the information that we lack when cold-starting a user.", "In the challenging case of cold start users, the algorithms that only use the Trust and Intra-Item information would be very beneficial, but these algorithms, unsuperisingly, perform not as well.", "Among these, we found that the Jaccard WA approach provided the best performance, outperforming random walk and intra-item-based approaches.", "A further idea that could be explored is simply using an approach like the Universal Random, which was one of our baseline recommenders.", "This recommender performed reasonably when averaged across our data sets and predicts preferences based on the distribution of current ratings for said item.", "While combining information, we found it wasn t easy to outperform the Item-Jaccard-based recommender.", "The one case where we managed to improve upon the base Item-Jaccard model was when we performed addition with the Jaccard Index.", "So the similarity between the two users is based on the sum of the Jaccard Index and Item-Jaccard Index.", "We think that the performance of the recommender is increased as the Jaccard provides some marginal information which can impact the ratings in the case when a user trusts a celebrity.", "That way the user has a non-zero Jaccard index with many other nodes who also trust said celebrity, hence, slightly shifting the rating prediction in the direction of these users.", "We attempted combination models of various forms, combining the similarity scores with multiplication, summation, and a maximum function.", "However, we discovered that scaling the distributions had minimal impact on the resulting performance.", "We also concluded that simply using an equally weighted summation of the two scores leads to the 
simplest and most effective model.", "Hence, the highest-performing recommendation model combined trust information and item rating information.", "Though, it is not suitable for cold-start users.", "We also found that recommenders based on the WA framework outperformed random-walk methods.", "Furthermore, the WA framework outperforms a similar approach called Majority of Majorities.", "We believe this approach to collaborative filtering is effective because it is not constrained by the structure of the graph (as is a random walk) and can produce fine-grained recommendations that can be non-integer values.", "Whereas, the MoM approach forces a specific integer rating to be assigned and hence is not as representative of the user s preference.", "When it came to the intra-item information, we found it performed quite poorly from the perspective of .", "However, an interesting observation, as per Table 11 , was that the algorithms that included intra-item information were the most consistent in their performance—featuring the lowest across all data sets.", "Thus, it can be deduced that the intra-item information is additive from a stability perspective, making a recommender perform with similar accuracy for all users.", "This idea was reinforced when we created the fully combined model, which utilized all information types.", "This was the “Jaccard Item-Jaccard JII Combination WA” model.", "We found this model performed worse than the Jaccard Item-Jaccard WA model in , however, it performed better in mean .", "Hence, in a scenario where providing consistently good predictions for all users is of importance, the introduction of intra-item information could facilitate this.", "Through our experimentation with opposing intra-item similarity metrics, we determined that the Intra-Item Jaccard approach outperformed the Pearson Correlation metric on downstream tasks.", "This was determined by building models upon these metrics and comparing the resulting accuracy.", "When further testing with the Intra-Item Jaccard similarity metric was undertaken, the resulting models performed better than random, and in combination led to stable models as described earlier.", "A recurring theme in these experiments was the very impressive performance of the Jaccard Index when used in a variety of applications.", "It was the go-to for drawing similarity scores using all information forms and is a simple but logical way to reason about the similarity of sets.", "Testing the performance of our algorithms on an adversarial network provided some interesting insights.", "The most obvious of which is that item rating information is highly susceptible to an attack with fake accounts.", "The trust graph is relatively robust to adversaries, as one must influence individuals to shift the dynamics of this information.", "This is represented in the data by the recommendation systems involving the item rating information suffering from the most severe reductions in their MAE.", "We found that the least impacted algorithm was the MC Random Walk.", "As the algorithm propagates from a user, through the network to derive the rating, it will arrive at a legitimate node with a rating before venturing into the adversaries.", "The only case where the adversaries have an impact is when no ratings are found and the randomness means a rating is taken uniformly from the network.", "In these cases, the volume of ratings from adversaries impacts results.", "The next best algorithm for being robust in the face of adversaries was Jaccard WA.", 
"This is a Weighted Average based entirely on the Jaccard Index.", "This algorithm performs better due to the value it places on the trust graph, very rarely do nodes have a non-zero Jaccard similarity with the bad actors, hence the high performance.", "Future Work: In investigating intra-item information, we found that the ”Intra-Item Jaccard” metric was more effective than Pearson Correlation in determining item similarity.", "This result suggests further exploration of the Trust-Walker [16 ], which may lead to improved performance.", "Further research is needed to improve the computational efficiency of the WA approach.", "Unlike, random-walk methods, the WA-based recommenders require iterating over all nodes in the graph.", "Optimizations such as vectorization or caching similarities between users could be explored.", "Furthermore, this work focused on fake accounts and fake ratings, but other attacks such as bribing popular nodes or adversarial censorship should be investigated in future studies.", "" ], "target_context_ids": [ 2, 3, 7, 25, 26, 27, 28, 29, 30 ], "selected_paragraphs": [ "[paragraph id = 2] The trust and intra-item-based algorithms led to recommenders with higher MAE as seen in Table 11 .", "[paragraph id = 3] All other models included the item-rating information in some way, which had significantly lower MAE.", "[paragraph id = 7] Among these, we found that the Jaccard WA approach provided the best performance, outperforming random walk and intra-item-based approaches.", "[paragraph id = 25] However, an interesting observation, as per Table 11 , was that the algorithms that included intra-item information were the most consistent in their performance—featuring the lowest across all data sets.", "[paragraph id = 26] Thus, it can be deduced that the intra-item information is additive from a stability perspective, making a recommender perform with similar accuracy for all users.", "[paragraph id = 27] This idea was reinforced when we created the fully combined model, which utilized all information types.", "[paragraph id = 28] This was the “Jaccard Item-Jaccard JII Combination WA” model.", "[paragraph id = 29] We found this model performed worse than the Jaccard Item-Jaccard WA model in , however, it performed better in mean .", "[paragraph id = 30] Hence, in a scenario where providing consistently good predictions for all users is of importance, the introduction of intra-item information could facilitate this." ], "table_html": "
<table>
<caption>Table 11: Mean and Standard Deviation of MAE for All Algorithms on Different Data Sets</caption>
<tr><th rowspan="2">Algorithm</th><th colspan="2">Epinions</th><th colspan="2">FilmTrust</th><th colspan="2">CiaoDVD</th></tr>
<tr><th>Mean MAE</th><th>Std. Dev.</th><th>Mean MAE</th><th>Std. Dev.</th><th>Mean MAE</th><th>Std. Dev.</th></tr>
<tr><td>Jaccard Item-Jaccard WA</td><td>1.00</td><td>0.26</td><td>0.66</td><td>0.08</td><td>0.53</td><td>0.28</td></tr>
<tr><td>Item-Jaccard WA</td><td>1.02</td><td>0.23</td><td>0.67</td><td>0.08</td><td>0.58</td><td>0.32</td></tr>
<tr><td>Jaccard WA</td><td>1.05</td><td>0.25</td><td>1.14</td><td>0.08</td><td>1.73</td><td>0.36</td></tr>
<tr><td>WIRD WA</td><td>1.05</td><td>0.28</td><td>0.67</td><td>0.04</td><td>0.72</td><td>0.37</td></tr>
<tr><td>Jaccard Item-Jaccard JII Combination WA</td><td>1.07</td><td>0.22</td><td>0.69</td><td>0.06</td><td>0.63</td><td>0.30</td></tr>
<tr><td>JWIRD WA</td><td>1.09</td><td>0.27</td><td>0.67</td><td>0.07</td><td>0.72</td><td>0.40</td></tr>
<tr><td>Jaccard MoM</td><td>1.13</td><td>0.30</td><td>1.19</td><td>0.09</td><td>1.77</td><td>0.37</td></tr>
<tr><td>Jaccard Monte-Carlo Random Walk</td><td>1.14</td><td>0.22</td><td>1.20</td><td>0.08</td><td>1.81</td><td>0.49</td></tr>
<tr><td>Monte-Carlo Random Walk</td><td>1.16</td><td>0.23</td><td>1.20</td><td>0.08</td><td>1.82</td><td>0.35</td></tr>
<tr><td>Item-Rating Difference WA</td><td>1.17</td><td>0.49</td><td>0.67</td><td>0.08</td><td>0.79</td><td>0.41</td></tr>
<tr><td>Jaccard Intra-Item WA</td><td>1.20</td><td>0.13</td><td>1.07</td><td>0.06</td><td>1.75</td><td>0.32</td></tr>
<tr><td>Intra-Item WA</td><td>1.24</td><td>0.14</td><td>1.18</td><td>0.08</td><td>1.70</td><td>0.45</td></tr>
<tr><td>Median of Neighbours</td><td>1.27</td><td>0.19</td><td>1.26</td><td>0.08</td><td>1.76</td><td>0.28</td></tr>
<tr><td>Mean of Neighbours</td><td>1.27</td><td>0.25</td><td>1.25</td><td>0.07</td><td>1.67</td><td>0.43</td></tr>
<tr><td>Universal Random</td><td>1.30</td><td>0.38</td><td>0.89</td><td>0.11</td><td>0.72</td><td>0.58</td></tr>
<tr><td>Jaccard Weighted Neighbours</td><td>1.31</td><td>0.32</td><td>1.22</td><td>0.10</td><td>1.66</td><td>0.44</td></tr>
<tr><td>Mode of Neighbours</td><td>1.32</td><td>0.23</td><td>1.23</td><td>0.09</td><td>1.67</td><td>0.43</td></tr>
<tr><td>Intra-Item WA (Pearson)</td><td>1.34</td><td>0.13</td><td>1.21</td><td>0.07</td><td>1.67</td><td>0.50</td></tr>
<tr><td>Random</td><td>1.67</td><td>0.21</td><td>1.34</td><td>0.07</td><td>1.82</td><td>0.31</td></tr>
</table>
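The best-performing similarity discussed in Section 9, the equally weighted sum of the trust-graph Jaccard index and the Item-Jaccard index, could look like the following sketch; the container names follow the earlier examples and are assumptions, and the result would be passed as `sim` to the weighted-average recommender.

```python
def combined_similarity(trust_graph, items_rated, u, v):
    # Equally weighted sum of the trust-graph Jaccard index and the
    # Item-Jaccard index over the sets of rated items.
    def jac(a, b):
        return len(a & b) / len(a | b) if (a | b) else 0.0
    trust_sim = jac(trust_graph.get(u, set()), trust_graph.get(v, set()))
    item_sim = jac(items_rated.get(u, set()), items_rated.get(v, set()))
    return trust_sim + item_sim
```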
", "perturb_sentence_id": [ 25, 26 ], "output": { "perturbed_statement": "[paragraph id = 25] However, an interesting observation, as per Table 11, was that the algorithms that included intra-item information were the least consistent in their performance—featuring the highest MAE across all data sets. Thus, it can be deduced that the intra-item information is detrimental from a stability perspective, making a recommender perform with varying accuracy for all users.", "perturbed_explanation": "1. The original explanation mentions that the algorithms that included intra-item information were the most consistent in their performance, featuring the lowest MAE across all data sets. This suggests that intra-item information contributes positively to stability and accuracy for all users. 2. The statement incorrectly claims that the algorithms featuring intra-item information were the least consistent, with the highest MAE across all data sets, and that the intra-item information is detrimental. This directly contradicts the actual observation made in the context, which states that intra-item information is beneficial for stability and accuracy in recommendations." } } ]