[ { "path": "table_paper/2407.00104v1.json", "table_id": "1", "section": "2.1", "all_context": [ "The entire database was provided by the Dermatology Unit of the “Hospital Universitario Virgen Macarena” and were sent over 2 years from 60 primary care centers.", "The dataset comprises 1559 dermoscopic images divided into 3 subsets.", "Four dermatologists provided different types of annotation according to the subsets.", "Specifically: The first subset consisted of 1089 dermoscopic images.", "Initially, the labeling annotations for these images were the presence or absence of each of the dermoscopic features involved in the diagnosis of BCC.", "A second subset of 334 images is additionally enriched with dermatologist delineations of BCC dermoscopic patterns within each image.", "More than one segmented area may appear on an image if there are multiple patterns in the BCC lesion.", "In the Figure 3 an example is shown.", "The third subset is made up of 136 non-BCC images, mostly consisting of nevus lesions, from the ISIC archive [8 ].", "Table 1 summarizes the distribution of labels in the database.", "As can be seen in this table, the database has a significant class imbalance, with SW and MG underrepresented.", "Several techniques have been used to address this problem.", "" ], "target_context_ids": [ 9, 10 ], "selected_paragraphs": [ "[paragraph id = 9] Table 1 summarizes the distribution of labels in the database.", "[paragraph id = 10] As can be seen in this table, the database has a significant class imbalance, with SW and MG underrepresented." ], "table_html": "
\n
Table 1: Sample distribution for binary and multilabel codification.
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Binary codification | Multi-label codification
BCC | Non-BCC | Pigment Network | Ulceration | Ovoid Nests | Multiglobules | Maple Leaf-like | Spoke Wheel | Arborizing Telangiectasia
775 | 784 | 557 | 385 | 333 | 191 | 244 | 178 | 455
\n
\n
", "perturb_sentence_id": 9, "output": { "perturbed_statement": "[paragraph id = 9] Table 1 summarizes the distribution of labels in the database. As can be seen in this table, the database has a balanced class distribution, with no labels significantly underrepresented.", "perturbed_explanation": "The original explanation: Table 1 highlights the label distribution within the database, showing a significant imbalance with SW and MG labels being underrepresented. The altered statement is incorrect due to the following reasons: 1. It inaccurately claims a balanced class distribution, directly contradicting the information from Table 1. 2. Such misrepresentation could lead to misunderstandings about the data characteristics and their implications." } }, { "path": "table_paper/2407.00104v1.json", "table_id": "2", "section": "2.1.1", "all_context": [ "Each image may contain multiple dermoscopic patterns.", "Therefore, a one-hot coding scheme was used to encode the labels during image annotation and subsequently to process the dermatologists annotations.", "Each image label is a binary word and each BCC dermoscopic pattern is a digit, where means presence and means absence.", "The seven patterns that can appear in a BCC lesion are[4 , 18 , 17 ]: Pigment Network (PN) (negative criterion), Ulceration (U), Ovoid Nests (ON), Multiglobules (MG), Maple Leaf-like (ML), Spoke Wheel (SW), Arborizing Telangiectasia (AT) (Figure 2 ).", "Thus, each label is a vector of dimensions .", "In Table 2 there are some examples of this process.", "" ], "target_context_ids": [ 2, 4, 5 ], "selected_paragraphs": [ "[paragraph id = 2] Each image label is a binary word and each BCC dermoscopic pattern is a digit, where means presence and means absence.", "[paragraph id = 4] Thus, each label is a vector of dimensions .", "[paragraph id = 5] In Table 2 there are some examples of this process." ], "table_html": "
\n
Table 2: Example of multilabel and binary encoding for BCC diagnosis
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Codification | Multi-label | Binary | Diagnostic
Example 1 | [0 1 0 1 1 0 1] | 1 | Presence of BCC
Example 2 | [1 0 0 0 0 0 0] | 0 | Absence of BCC
Example 3 | [0 0 0 0 0 0 0] | 0 | Absence of BCC
\n
\n
", "perturb_sentence_id": 5, "output": { "perturbed_statement": "[paragraph id = 5] In Table 5 there are some examples of this process.", "perturbed_explanation": "Original Explanation: Table 2 is identified as containing examples of the discussed process. 2. The statement mistakenly references Table 5 instead, which is not mentioned in the provided context as containing examples." } }, { "path": "table_paper/2407.00104v1.json", "table_id": "3", "section": "3.2", "all_context": [ "This section analyzes the performance of the AI tool for BCC detection in conjunction with the labels provided to explain this classification.", "Table 3 presents metrics that summarize this performance.", "The metrics are averaged over all folds.", "This table has three parts.", "The first part shows the performance of the AI tool in the binary classification.", "The second part shows its performance in detecting BCC dermoscopic patterns.", "Finally, the third part represents the accuracy of the labels that provide the clinical explanation.", "Overall, the BCC/non-BCC diagnostic performance is high, around for all metrics.", "However, the BCC pattern detection performance has to be analysed with a deeper insight.", "Minority classes tend to attain low recall because the AI tool trained with unbalanced databases tends to favor majority classes.", "As shown in Sect.", "2.1 , SW, MG and ML are underrepresented classes.", "Strategies such as data augmentation and advanced sampling, a one-vs-all strategy combined with stratified k-fold cross-validation helped to achieve a more balanced classification across patterns, thereby improving overall model performance.", "However, the metrics achieved should not be analyzed in the same way as BCC/non-BCC performance.", "They should only be evaluated to the extent that they provide a correct explanation for the binary classification.", "It is not relevant if the AI tool misses a specific BCC pattern, but if it misses any BCC pattern, as clinicians 
diagnose skin lesions in the same way.", "This further evaluation is summarized in the third part of Table 3 .", "As shown in this table, 73 percent of non-BCC lesions without any BCC pattern, 95 percent of non-BCC lesions with PN, and 99 percent of BCC lesions with some BCC pattern are correctly labeled as such.", "" ], "target_context_ids": [ 1, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17 ], "selected_paragraphs": [ "[paragraph id = 1] Table 3 presents metrics that summarize this performance.", "[paragraph id = 4] The first part shows the performance of the AI tool in the binary classification.", "[paragraph id = 5] The second part shows its performance in detecting BCC dermoscopic patterns.", "[paragraph id = 6] Finally, the third part represents the accuracy of the labels that provide the clinical explanation.", "[paragraph id = 7] Overall, the BCC/non-BCC diagnostic performance is high, around for all metrics.", "[paragraph id = 8] However, the BCC pattern detection performance has to be analysed with a deeper insight.", "[paragraph id = 9] Minority classes tend to attain low recall because the AI tool trained with unbalanced databases tends to favor majority classes.", "[paragraph id = 12] Strategies such as data augmentation and advanced sampling, a one-vs-all strategy combined with stratified k-fold cross-validation helped to achieve a more balanced classification across patterns, thereby improving overall model performance.", "[paragraph id = 13] However, the metrics achieved should not be analyzed in the same way as BCC/non-BCC performance.", "[paragraph id = 14] They should only be evaluated to the extent that they provide a correct explanation for the binary classification.", "[paragraph id = 15] It is not relevant if the AI tool misses a specific BCC pattern, but if it misses any BCC pattern, as clinicians diagnose skin lesions in the same way.", "[paragraph id = 16] This further evaluation is summarized in the third part of Table 3 .", "[paragraph id = 
17] As shown in this table, 73 percent of non-BCC lesions without any BCC pattern, 95 percent of non-BCC lesions with PN, and 99 percent of BCC lesions with some BCC pattern are correctly labeled as such." ], "table_html": "
\n
Table 3: Evaluation using binary and multilabel classification metrics, fine-tuned binary classifier, and physician-guided analysis.
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
 | Recall | Specificity | Precision | Accuracy
BCC/Non-BCC
 | 0.89 | 0.89 | 0.90 | 0.90
Pattern detection
Pigment Network | 0.94 | 0.96 | 0.97 | 0.95
Ulceration | 0.81 | 0.75 | 0.52 | 0.77
Ovoid Nests | 0.65 | 0.84 | 0.53 | 0.84
Multiglobules | 0.61 | 0.81 | 0.32 | 0.80
Maple Leaf-like | 0.50 | 0.82 | 0.34 | 0.77
Spoke Wheel | 0.60 | 0.87 | 0.37 | 0.84
Arborizing Telangiectasia | 0.89 | 0.76 | 0.61 | 0.80
Clinical-inspired XAI
All 0’s | - | - | - | 0.73
Pigment Network | 0.94 | 0.96 | 0.97 | 0.95
BCC pattern detection | 0.84 | 0.88 | 0.71 | 0.99
\n
\n
", "perturb_sentence_id": 17, "output": { "perturbed_statement": "[paragraph id = 17] As shown in this table, 73 percent of BCC lesions without any BCC pattern, 95 percent of non-BCC lesions with PN, and 99 percent of non-BCC lesions with some BCC pattern are correctly labeled as such.", "perturbed_explanation": "1. Analyzing the given information, it is stated that 73 percent of non-BCC lesions without any BCC pattern, 95 percent of non-BCC lesions with PN, and 99 percent of BCC lesions with some BCC pattern are accurately labeled. 2. The statement incorrectly suggests that 73 percent of BCC lesions without BCC patterns and 99 percent of non-BCC lesions exhibiting BCC patterns are correctly labeled, which contradicts the context since it specifically mentions the performance metrics for non-BCC lesions without patterns and BCC lesions with patterns." } }, { "path": "table_paper/2407.00104v1.json", "table_id": "4", "section": "3.3", "all_context": [ "This section aims to quantify the accuracy of the AI tool in focusing on the correct part of the lesion, specifically the BCC dermoscopic patterns identified by clinicians.", "To this end, BCC pattern areas delineated by dermatologists will be compared with model activated areas.", "This will provide a quantitative measure of the model s agreement with human diagnostic criteria and demonstrate its ability to accurately identify critical features of BCC lesions.", "To quantify the accuracy of the model activation areas with respect to the areas of clinical interest the conditional probability density functions of the normalized GradCAM values within and outside the area segmented by dermatologist were estimated.", "Let the GradCAM value at position .", "Let denote Fg the area segmented by the dermatologist and Bg the background.", "is the probability density function of GradCAM values for pixels and w is the probability density function of GradCAM values for pixels .", "Figure 4 illustrates this analysis.", "Figure 4(a) 
shows the original BCC lesion.", "Figure 4(b) shows the Grad-CAM map.", "Figure 4(c) shows the dermatologist s segmentation overlaid on the Grad-CAM map.", "Figure 4(d) shows an example of the two conditional probability density functions.", "The orange curve represents , and the blue curve represents .", "The orange curve is centered near 0, indicating low activation outside the mask, while the blue curve shows significant Grad-CAM information within the clinical segmentation, indicating that the model extracts features from the same region as the specialist.", "Table 4 summarizes the information extracted from these probability density function.", "Specifically, mean, standard deviation of for and respectively, and the intersection area between and are shown.", "This table shows that correctly predicted samples have a larger mean standard deviation than incorrectly predicted samples.", "In addition, the intersection area is larger in these cases.", "These facts prove that the model is not able to pay attention to the areas of clinical interest in the incorrect predictions.", "" ], "target_context_ids": [ 3, 14, 15, 16, 17 ], "selected_paragraphs": [ "[paragraph id = 3] To quantify the accuracy of the model activation areas with respect to the areas of clinical interest the conditional probability density functions of the normalized GradCAM values within and outside the area segmented by dermatologist were estimated.", "[paragraph id = 14] Table 4 summarizes the information extracted from these probability density function.", "[paragraph id = 15] Specifically, mean, standard deviation of for and respectively, and the intersection area between and are shown.", "[paragraph id = 16] This table shows that correctly predicted samples have a larger mean standard deviation than incorrectly predicted samples.", "[paragraph id = 17] In addition, the intersection area is larger in these cases." ], "table_html": "
\n
Table 4: Statistics derived from estimation of conditional probability density functions of GradCAM within and outside the region of clinical interest.
\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Prediction | Intersection | Mean Fg | Mean Bg | Std Fg | Std Bg
Correct | 0.24 | 0.57 | 0.16 | 0.14 | 0.22
Incorrect | 0.32 | 0.33 | 0.14 | 0.01 | 0.21
\n
\n
", "perturb_sentence_id": 14, "output": { "perturbed_statement": "[paragraph id = 14] Table 4 summarizes the information extracted from these probability density function. Specifically, the median and variance of for and respectively, and the intersection area between and are shown.", "perturbed_explanation": "1. The original statement describes that Table 4 provides a summary including the mean, standard deviation, and intersection area related to the probability density functions. 2. The statement is incorrect because it inaccurately states that the table contains the median and variance instead of the actual metrics, mean and standard deviation, as outlined in the summary." } } ]