[ { "path": "table_paper/2407.00104v1.json", "table_id": "1", "section": "2.1", "all_context": [ "The entire database was provided by the Dermatology Unit of the “Hospital Universitario Virgen Macarena” and were sent over 2 years from 60 primary care centers.", "The dataset comprises 1559 dermoscopic images divided into 3 subsets.", "Four dermatologists provided different types of annotation according to the subsets.", "Specifically: The first subset consisted of 1089 dermoscopic images.", "Initially, the labeling annotations for these images were the presence or absence of each of the dermoscopic features involved in the diagnosis of BCC.", "A second subset of 334 images is additionally enriched with dermatologist delineations of BCC dermoscopic patterns within each image.", "More than one segmented area may appear on an image if there are multiple patterns in the BCC lesion.", "In the Figure 3 an example is shown.", "The third subset is made up of 136 non-BCC images, mostly consisting of nevus lesions, from the ISIC archive [8 ].", "Table 1 summarizes the distribution of labels in the database.", "As can be seen in this table, the database has a significant class imbalance, with SW and MG underrepresented.", "Several techniques have been used to address this problem.", "" ], "target_context_ids": [ 9, 10 ], "selected_paragraphs": [ "[paragraph id = 9] Table 1 summarizes the distribution of labels in the database.", "[paragraph id = 10] As can be seen in this table, the database has a significant class imbalance, with SW and MG underrepresented." ], "table_html": "
\n
Table 1: Sample distribution for binary and multilabel codification.
\n
| Binary codification | | Multi-label codification | | | | | | |
| BCC | Non-BCC | Pigment Network | Ulceration | Ovoid Nests | Multiglobules | Maple Leaf-like | Spoke Wheel | Arborizing Telangiectasia |
| 775 | 784 | 557 | 385 | 333 | 191 | 244 | 178 | 455 |
\n
\n
", "perturb_sentence_id": [ 9, 10 ], "output": { "perturbed_statement": "[paragraph id = 9] Table 1 summarizes the distribution of labels in the database. As can be seen in this table, the database has a significant class imbalance, with SW and LT underrepresented.", "perturbed_explanation": "The original explanation would indicate that the class imbalance involves labels SW and MG being underrepresented. 1. The statement incorrectly claims that the labels SW and LT are underrepresented, rather than SW and MG. Since LT is not mentioned as underrepresented in the actual class imbalance issue, stating this creates an inaccuracy about which labels have fewer instances." } }, { "path": "table_paper/2407.00104v1.json", "table_id": "2", "section": "2.1.1", "all_context": [ "Each image may contain multiple dermoscopic patterns.", "Therefore, a one-hot coding scheme was used to encode the labels during image annotation and subsequently to process the dermatologists annotations.", "Each image label is a binary word and each BCC dermoscopic pattern is a digit, where means presence and means absence.", "The seven patterns that can appear in a BCC lesion are[4 , 18 , 17 ]: Pigment Network (PN) (negative criterion), Ulceration (U), Ovoid Nests (ON), Multiglobules (MG), Maple Leaf-like (ML), Spoke Wheel (SW), Arborizing Telangiectasia (AT) (Figure 2 ).", "Thus, each label is a vector of dimensions .", "In Table 2 there are some examples of this process.", "" ], "target_context_ids": [ 2, 4, 5 ], "selected_paragraphs": [ "[paragraph id = 2] Each image label is a binary word and each BCC dermoscopic pattern is a digit, where means presence and means absence.", "[paragraph id = 4] Thus, each label is a vector of dimensions .", "[paragraph id = 5] In Table 2 there are some examples of this process." ], "table_html": "
\n
Table 2: Example of multilabel and binary encoding for BCC diagnosis
\n
| Codification | Multi-label | Binary | Diagnostic |
| Example 1 | [0 1 0 1 1 0 1] | 1 | Presence of BCC |
| Example 2 | [1 0 0 0 0 0 0] | 0 | Absence of BCC |
| Example 3 | [0 0 0 0 0 0 0] | 0 | Absence of BCC |
\n
\n
", "perturb_sentence_id": [ 5, 6 ], "output": { "perturbed_statement": "[paragraph id = 5] In Figure 3 there are some examples of this process.", "perturbed_explanation": "The original explanation would be: 1. The statement suggests that examples of a process are found in Table 2. Here is why the statement is incorrect: 2. The statement is incorrect because it refers to Figure 3, which is not mentioned in the context. The context mentions dimensions and labels but does not indicate the presence of a figure or specify any particular figure number." } }, { "path": "table_paper/2407.00104v1.json", "table_id": "3", "section": "3.2", "all_context": [ "This section analyzes the performance of the AI tool for BCC detection in conjunction with the labels provided to explain this classification.", "Table 3 presents metrics that summarize this performance.", "The metrics are averaged over all folds.", "This table has three parts.", "The first part shows the performance of the AI tool in the binary classification.", "The second part shows its performance in detecting BCC dermoscopic patterns.", "Finally, the third part represents the accuracy of the labels that provide the clinical explanation.", "Overall, the BCC/non-BCC diagnostic performance is high, around for all metrics.", "However, the BCC pattern detection performance has to be analysed with a deeper insight.", "Minority classes tend to attain low recall because the AI tool trained with unbalanced databases tends to favor majority classes.", "As shown in Sect.", "2.1 , SW, MG and ML are underrepresented classes.", "Strategies such as data augmentation and advanced sampling, a one-vs-all strategy combined with stratified k-fold cross-validation helped to achieve a more balanced classification across patterns, thereby improving overall model performance.", "However, the metrics achieved should not be analyzed in the same way as BCC/non-BCC performance.", "They should only be evaluated to the extent that they provide a 
correct explanation for the binary classification.", "It is not relevant if the AI tool misses a specific BCC pattern, but if it misses any BCC pattern, as clinicians diagnose skin lesions in the same way.", "This further evaluation is summarized in the third part of Table 3 .", "As shown in this table, 73 percent of non-BCC lesions without any BCC pattern, 95 percent of non-BCC lesions with PN, and 99 percent of BCC lesions with some BCC pattern are correctly labeled as such.", "" ], "target_context_ids": [ 1, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17 ], "selected_paragraphs": [ "[paragraph id = 1] Table 3 presents metrics that summarize this performance.", "[paragraph id = 4] The first part shows the performance of the AI tool in the binary classification.", "[paragraph id = 5] The second part shows its performance in detecting BCC dermoscopic patterns.", "[paragraph id = 6] Finally, the third part represents the accuracy of the labels that provide the clinical explanation.", "[paragraph id = 7] Overall, the BCC/non-BCC diagnostic performance is high, around for all metrics.", "[paragraph id = 8] However, the BCC pattern detection performance has to be analysed with a deeper insight.", "[paragraph id = 9] Minority classes tend to attain low recall because the AI tool trained with unbalanced databases tends to favor majority classes.", "[paragraph id = 12] Strategies such as data augmentation and advanced sampling, a one-vs-all strategy combined with stratified k-fold cross-validation helped to achieve a more balanced classification across patterns, thereby improving overall model performance.", "[paragraph id = 13] However, the metrics achieved should not be analyzed in the same way as BCC/non-BCC performance.", "[paragraph id = 14] They should only be evaluated to the extent that they provide a correct explanation for the binary classification.", "[paragraph id = 15] It is not relevant if the AI tool misses a specific BCC pattern, but if it misses any BCC 
pattern, as clinicians diagnose skin lesions in the same way.", "[paragraph id = 16] This further evaluation is summarized in the third part of Table 3 .", "[paragraph id = 17] As shown in this table, 73 percent of non-BCC lesions without any BCC pattern, 95 percent of non-BCC lesions with PN, and 99 percent of BCC lesions with some BCC pattern are correctly labeled as such." ], "table_html": "
\n
Table 3: Evaluation using binary and multilabel classification metrics, fine-tuned binary classifier, and physician-guided analysis.
\n
| | Recall | Specificity | Precision | Accuracy |
| BCC/Non-BCC | | | | |
| | 0.89 | 0.89 | 0.90 | 0.90 |
| Pattern detection | | | | |
| Pigment Network | 0.94 | 0.96 | 0.97 | 0.95 |
| Ulceration | 0.81 | 0.75 | 0.52 | 0.77 |
| Ovoid Nests | 0.65 | 0.84 | 0.53 | 0.84 |
| Multiglobules | 0.61 | 0.81 | 0.32 | 0.80 |
| Maple Leaf-like | 0.50 | 0.82 | 0.34 | 0.77 |
| Spoke Wheel | 0.60 | 0.87 | 0.37 | 0.84 |
| Arborizing Telangiectasia | 0.89 | 0.76 | 0.61 | 0.80 |
| Clinical-inspired XAI | | | | |
| All 0’s | - | - | - | 0.73 |
| Pigment Network | 0.94 | 0.96 | 0.97 | 0.95 |
| BCC pattern detection | 0.84 | 0.88 | 0.71 | 0.99 |
\n
\n
", "perturb_sentence_id": [ 17, 18 ], "output": { "perturbed_statement": "[paragraph id = 17] As shown in this table, 73 percent of non-BCC lesions with any BCC pattern, 95 percent of non-BCC lesions without PN, and 99 percent of BCC lesions without any BCC pattern are correctly labeled as such.", "perturbed_explanation": "The original statement specifies the correct labeling percentages for various categories: 1. '73 percent of non-BCC lesions without any BCC pattern', 2. '95 percent of non-BCC lesions with PN', and 3. '99 percent of BCC lesions with some BCC pattern'. \n1. '73 percent of non-BCC lesions with any BCC pattern': This is incorrect because it falsely attributes a percentage of correct labeling to non-BCC lesions with the presence of any BCC pattern, which contradicts the context that states it is meant for lesions without any BCC pattern.\n2. '95 percent of non-BCC lesions without PN': This statement reverses the presence of PN, which should be included to match the context correctly.\n3. '99 percent of BCC lesions without any BCC pattern': This changes 'with some BCC pattern' to 'without any BCC pattern', thus making it incorrect by stating the opposite condition." 
} }, { "path": "table_paper/2407.00104v1.json", "table_id": "4", "section": "3.3", "all_context": [ "This section aims to quantify the accuracy of the AI tool in focusing on the correct part of the lesion, specifically the BCC dermoscopic patterns identified by clinicians.", "To this end, BCC pattern areas delineated by dermatologists will be compared with model activated areas.", "This will provide a quantitative measure of the model s agreement with human diagnostic criteria and demonstrate its ability to accurately identify critical features of BCC lesions.", "To quantify the accuracy of the model activation areas with respect to the areas of clinical interest the conditional probability density functions of the normalized GradCAM values within and outside the area segmented by dermatologist were estimated.", "Let the GradCAM value at position .", "Let denote Fg the area segmented by the dermatologist and Bg the background.", "is the probability density function of GradCAM values for pixels and w is the probability density function of GradCAM values for pixels .", "Figure 4 illustrates this analysis.", "Figure 4(a) shows the original BCC lesion.", "Figure 4(b) shows the Grad-CAM map.", "Figure 4(c) shows the dermatologist s segmentation overlaid on the Grad-CAM map.", "Figure 4(d) shows an example of the two conditional probability density functions.", "The orange curve represents , and the blue curve represents .", "The orange curve is centered near 0, indicating low activation outside the mask, while the blue curve shows significant Grad-CAM information within the clinical segmentation, indicating that the model extracts features from the same region as the specialist.", "Table 4 summarizes the information extracted from these probability density function.", "Specifically, mean, standard deviation of for and respectively, and the intersection area between and are shown.", "This table shows that correctly predicted samples have a larger mean standard deviation 
than incorrectly predicted samples.", "In addition, the intersection area is larger in these cases.", "These facts prove that the model is not able to pay attention to the areas of clinical interest in the incorrect predictions.", "" ], "target_context_ids": [ 3, 14, 15, 16, 17 ], "selected_paragraphs": [ "[paragraph id = 3] To quantify the accuracy of the model activation areas with respect to the areas of clinical interest the conditional probability density functions of the normalized GradCAM values within and outside the area segmented by dermatologist were estimated.", "[paragraph id = 14] Table 4 summarizes the information extracted from these probability density function.", "[paragraph id = 15] Specifically, mean, standard deviation of for and respectively, and the intersection area between and are shown.", "[paragraph id = 16] This table shows that correctly predicted samples have a larger mean standard deviation than incorrectly predicted samples.", "[paragraph id = 17] In addition, the intersection area is larger in these cases." ], "table_html": "
\n
Table 4: Statistics derived from estimation of conditional probability density functions of GradCAM within and outside the region of clinical interest.
\n
| Prediction | Intersection | Mean Fg | Mean Bg | Std Fg | Std Bg |
| Correct | 0.24 | 0.57 | 0.16 | 0.14 | 0.22 |
| Incorrect | 0.32 | 0.33 | 0.14 | 0.01 | 0.21 |
\n
\n
", "perturb_sentence_id": [ 14, 15 ], "output": { "perturbed_statement": "[paragraph id = 14] Table 4 summarizes the information extracted from these probability density function. Specifically, only the standard deviation of for and respectively is shown.", "perturbed_explanation": "Original Explanation: 1. The statement originally mentions that the table summarizes mean, standard deviation, and intersection area. 2. This indicates that Table 4 contains multiple statistical measures, providing a comprehensive summary of the data. Perturbed Explanation: 3. The altered statement incorrectly claims that Table 4 only shows the standard deviation, omitting the mean and intersection area. This is inaccurate since the context mentions that both mean and intersection area are part of the summarized information." } } ]