[
{
"path": "table_paper/2407.00111v1.json",
"table_id": "2",
"section": "4.1",
"all_context": [
"We explored the performance of statistical machine learning (ML) models on our LPI affinity prediction task.",
"A training set of 100,000 LPI examples, and their corresponding ordinal affinity values, were drawn from the LPI-1.5M data set.",
"The ligand SMILES strings were converted into both MACCS (Molecular ACCess System) fingerprint sparse embeddings Durant et al.",
"(2002 ) and extended-connectivity \"circular\" fingerprint (ECFP) sparse embeddings Rogers & Hahn (2010 ).",
"The protein amino acid sequences were converted into dense embeddings with the ESM2-3B (Evolutionary Scale Modeling 2) model Lin et al.",
"(2023 ).",
"These ligand and protein embedding techniques were selected due to their prevalence and performance in LPI binary affinity classification prior art Kimber et al.",
"(2021 ).",
"The ligand and protein embeddings were concatenated, then -normalized.",
"The same process was applied to a 10,000-example test set from the LPI-1.5M data set.",
"The train and test data sets were unique with no overlap.",
"A support vector machines (SVM) machine learning model was selected for this analysis given its strong performance on imbalanced data sets Chakrabarti & Fauber (2022 ), which are often present in multinomial classification tasks such as ours (Figure 5).333https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC (accessed 11June2024) A one-versus-rest (OvR) instance of a linear kernel SVM was employed, thus enabling our multinomial classification task.444https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html (accessed 11June2024) Additional details for our data embedding and ML methods are described in the Appendix.",
"The OvR instances of linear SVM models demonstrated 7% overall accuracy and 7% overall exact matches on our multinomial classification task for both ligand embedding techniques (Table 2).",
"Additionally, both model instances produced 0% exact matches for the A and B ordinal affinity values, and 1%, 15%, and 9% exact matches for the ordinal affinity values C, D, and E, respectively.",
"These results resemble the distribution of the parent LPI-1.5M data (Figure 5), yet lack sufficient utility in prioritizing ligands for progression in a drug discovery campaign.",
""
],
"target_context_ids": [
10,
11,
12,
13
],
"selected_paragraphs": [
"[paragraph id = 10] The train and test data sets were unique with no overlap.",
"[paragraph id = 11] A support vector machines (SVM) machine learning model was selected for this analysis given its strong performance on imbalanced data sets Chakrabarti & Fauber (2022 ), which are often present in multinomial classification tasks such as ours (Figure 5).333https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC (accessed 11June2024) A one-versus-rest (OvR) instance of a linear kernel SVM was employed, thus enabling our multinomial classification task.444https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html (accessed 11June2024) Additional details for our data embedding and ML methods are described in the Appendix.",
"[paragraph id = 12] The OvR instances of linear SVM models demonstrated 7% overall accuracy and 7% overall exact matches on our multinomial classification task for both ligand embedding techniques (Table 2).",
"[paragraph id = 13] Additionally, both model instances produced 0% exact matches for the A and B ordinal affinity values, and 1%, 15%, and 9% exact matches for the ordinal affinity values C, D, and E, respectively."
],
"table_html": "<figure class=\"ltx_table\" id=\"S4.T2\">\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S4.T2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S4.T2.1.1.1.1\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.1.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.1.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.1.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.1.1.1.1.1\" style=\"font-size:90%;\">Machine Learning</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.1.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.1.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.1.1.2.1.1\" style=\"font-size:90%;\">Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.2\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.2.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.2.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.2.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.2.1.1.1.1\" style=\"font-size:90%;\">Ligand</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.2.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.2.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.2.1.2.1.1\" style=\"font-size:90%;\">Embedding</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.2.1.3\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.2.1.3.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.2.1.3.1.1\" style=\"font-size:90%;\">Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.3\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.3.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.3.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.3.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.3.1.1.1.1\" style=\"font-size:90%;\">Protein</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.3.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.3.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.3.1.2.1.1\" style=\"font-size:90%;\">Embedding</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.3.1.3\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.3.1.3.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.3.1.3.1.1\" style=\"font-size:90%;\">Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.4\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.4.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.4.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.4.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.4.1.1.1.1\" style=\"font-size:90%;\">Dimension of</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.4.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.4.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.4.1.2.1.1\" style=\"font-size:90%;\">Ligand + Protein</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.4.1.3\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.4.1.3.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.4.1.3.1.1\" style=\"font-size:90%;\">Embedding</span></td>\n</tr>\n</table>\n</th>\n<th 
class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.5\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.5.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.5.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.5.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.5.1.1.1.1\" style=\"font-size:90%;\">% Accuracy</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.6\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.6.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.6.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.6.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.6.1.1.1.1\" style=\"font-size:90%;\">% Exact</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.6.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.6.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.6.1.2.1.1\" style=\"font-size:90%;\">Matches</span></td>\n</tr>\n</table>\n</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.2.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S4.T2.1.2.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.1.1\" style=\"font-size:90%;\">OvR(LinearSVM)</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.2\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.2.1\" style=\"font-size:90%;\">ECFP</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.3\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.3.1\" style=\"font-size:90%;\">ESM2-3B</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.4\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.4.1\" style=\"font-size:90%;\">4,608</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.5\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.5.1\" style=\"font-size:90%;\">7%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.6\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.6.1\" style=\"font-size:90%;\">7%</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.3.2\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S4.T2.1.3.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.1.1\" style=\"font-size:90%;\">OvR(LinearSVM)</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.2\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.2.1\" style=\"font-size:90%;\">MACCS</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.3\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.3.1\" style=\"font-size:90%;\">ESM2-3B</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.4\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.4.1\" style=\"font-size:90%;\">2,727</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.5\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.5.1\" style=\"font-size:90%;\">7%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.6\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.6.1\" style=\"font-size:90%;\">7%</span></td>\n</tr>\n</tbody>\n</table>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\">Table 2: </span>Performance of ML models in the conversion of 10,000 test instances of ligand embeddings and protein amino acid sequence embeddings into their corresponding predicted LPI ordinal affinity values from the 
LPI-1.5M data set. The ML model outputs were compared to their ground truth values for scoring.</figcaption>\n</figure>",
"perturb_sentence_id": [
11,
12
],
"output": {
"perturbed_statement": "A support vector machines (SVM) machine learning model was selected for this analysis given its strong performance on imbalanced data sets Chakrabarti & Fauber (2022 ), which are often present in multinomial classification tasks such as ours (Figure 5).333https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC (accessed 11June2024) A one-versus-rest (OvR) instance of a linear kernel SVM was employed, thus enabling our multinomial classification task.444https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html (accessed 11June2024) Additional details for our data embedding and ML methods are described in the Appendix.The OvR instances of linear SVM models demonstrated 10% overall accuracy and 2% overall exact matches on our multinomial classification task for both ligand embedding techniques (Table 2).",
"perturbed_explanation": "Original Explanation: The statement describes that a support vector machines (SVM) model was chosen for its strong performance on imbalanced data sets, as mentioned by Chakrabarti & Fauber (2022), which is relevant for multinomial classification tasks. A one-versus-rest (OvR) strategy was employed with a linear kernel SVM to facilitate the multinomial classification. It also notes that the OvR models achieved 7% overall accuracy and 7% exact matches in the task for both ligand embedding techniques. 1. The statement mentions '10% overall accuracy and 2% overall exact matches.' This is incorrect because it differs from the original overall accuracy and overall exact matches reported for the SVM model, which were both 7%."
}
},
{
"path": "table_paper/2407.00111v1.json",
"table_id": "2",
"section": "4.3",
"all_context": [
"The OPT-125M pretrained small language model was instruction fine-tuned on 100,000 training examples drawn from the LPI-1.5M data set.",
"We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"Notably, our fine-tuned SLM achieved 14%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6).",
"These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"Relaxing the scoring criteria to a predicted ordinal affinity value equal to or value relative to the ground truth, as is regularly employed in the FEP+ method Schrodinger (2023 ); Ross et al.",
"(2023 ), resulted in impressive outcomes with our method.",
"With the relaxed \"near match\" criteria, we achieved an 77% overall accuracy and all ordinal affinity values achieved 19-94% near matches relative the the ground truth with our method (Figure 6).",
"The relaxed criteria of a near match is reasonable for the prioritization of ligands in virtual screening, and is likely why this practice was introduced by FEP+ practitioners.",
""
],
"target_context_ids": [
1,
2,
4,
5,
7
],
"selected_paragraphs": [
"[paragraph id = 1] We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"[paragraph id = 2] Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"[paragraph id = 4] These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"[paragraph id = 5] Relaxing the scoring criteria to a predicted ordinal affinity value equal to or value relative to the ground truth, as is regularly employed in the FEP+ method Schrodinger (2023 ); Ross et al.",
"[paragraph id = 7] With the relaxed \"near match\" criteria, we achieved an 77% overall accuracy and all ordinal affinity values achieved 19-94% near matches relative the the ground truth with our method (Figure 6)."
],
"table_html": "<figure class=\"ltx_table\" id=\"S4.T2\">\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S4.T2.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S4.T2.1.1.1.1\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.1.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.1.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.1.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.1.1.1.1.1\" style=\"font-size:90%;\">Machine Learning</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.1.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.1.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.1.1.2.1.1\" style=\"font-size:90%;\">Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.2\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.2.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.2.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.2.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.2.1.1.1.1\" style=\"font-size:90%;\">Ligand</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.2.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.2.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.2.1.2.1.1\" style=\"font-size:90%;\">Embedding</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.2.1.3\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.2.1.3.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.2.1.3.1.1\" style=\"font-size:90%;\">Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.3\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.3.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.3.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.3.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.3.1.1.1.1\" style=\"font-size:90%;\">Protein</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.3.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.3.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.3.1.2.1.1\" style=\"font-size:90%;\">Embedding</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.3.1.3\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.3.1.3.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.3.1.3.1.1\" style=\"font-size:90%;\">Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.4\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.4.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.4.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.4.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.4.1.1.1.1\" style=\"font-size:90%;\">Dimension of</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.4.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.4.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.4.1.2.1.1\" style=\"font-size:90%;\">Ligand + Protein</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.4.1.3\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.4.1.3.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.4.1.3.1.1\" style=\"font-size:90%;\">Embedding</span></td>\n</tr>\n</table>\n</th>\n<th 
class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.5\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.5.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.5.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.5.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.5.1.1.1.1\" style=\"font-size:90%;\">% Accuracy</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T2.1.1.1.6\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T2.1.1.1.6.1\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.6.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.6.1.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.6.1.1.1.1\" style=\"font-size:90%;\">% Exact</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.1.1.6.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T2.1.1.1.6.1.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.1.1.6.1.2.1.1\" style=\"font-size:90%;\">Matches</span></td>\n</tr>\n</table>\n</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S4.T2.1.2.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S4.T2.1.2.1.1\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.1.1\" style=\"font-size:90%;\">OvR(LinearSVM)</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.2\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.2.1\" style=\"font-size:90%;\">ECFP</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.3\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.3.1\" style=\"font-size:90%;\">ESM2-3B</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.4\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.4.1\" style=\"font-size:90%;\">4,608</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.5\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.5.1\" style=\"font-size:90%;\">7%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T2.1.2.1.6\"><span class=\"ltx_text\" id=\"S4.T2.1.2.1.6.1\" style=\"font-size:90%;\">7%</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T2.1.3.2\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S4.T2.1.3.2.1\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.1.1\" style=\"font-size:90%;\">OvR(LinearSVM)</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.2\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.2.1\" style=\"font-size:90%;\">MACCS</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.3\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.3.1\" style=\"font-size:90%;\">ESM2-3B</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.4\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.4.1\" style=\"font-size:90%;\">2,727</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.5\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.5.1\" style=\"font-size:90%;\">7%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T2.1.3.2.6\"><span class=\"ltx_text\" id=\"S4.T2.1.3.2.6.1\" style=\"font-size:90%;\">7%</span></td>\n</tr>\n</tbody>\n</table>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\">Table 2: </span>Performance of ML models in the conversion of 10,000 test instances of ligand embeddings and protein amino acid sequence embeddings into their corresponding predicted LPI ordinal affinity values from the 
LPI-1.5M data set. The ML model outputs were compared to their ground truth values for scoring.</figcaption>\n</figure>",
"perturb_sentence_id": [
1,
2
],
"output": {
"perturbed_statement": "[paragraph id = 1] We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the ML results on a test set of 10,000 examples from the LPI-1.5M data set. Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"perturbed_explanation": "The original explanation would mention that the statement highlights a significant improvement in the performance of the fine-tuned SLM when compared to the baseline language model results, specifically noting its accuracy and exact matches. 1. The statement claims a significant improvement of the fine-tuned SLM compared to the ML results, while the context highlights the significant improvement over the baseline language model results. This change introduces a factual error because the context distinguishes between improvements upon baseline language models and ML results separately. Therefore, the statement incorrectly compares the fine-tuned SLM to ML results instead of the intended comparison with the baseline language model results."
}
},
{
"path": "table_paper/2407.00111v1.json",
"table_id": "3",
"section": "4.3",
"all_context": [
"The OPT-125M pretrained small language model was instruction fine-tuned on 100,000 training examples drawn from the LPI-1.5M data set.",
"We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"Notably, our fine-tuned SLM achieved 14%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6).",
"These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"Relaxing the scoring criteria to a predicted ordinal affinity value equal to or value relative to the ground truth, as is regularly employed in the FEP+ method Schrodinger (2023 ); Ross et al.",
"(2023 ), resulted in impressive outcomes with our method.",
"With the relaxed \"near match\" criteria, we achieved an 77% overall accuracy and all ordinal affinity values achieved 19-94% near matches relative the the ground truth with our method (Figure 6).",
"The relaxed criteria of a near match is reasonable for the prioritization of ligands in virtual screening, and is likely why this practice was introduced by FEP+ practitioners.",
""
],
"target_context_ids": [
1,
2,
3,
4
],
"selected_paragraphs": [
"[paragraph id = 1] We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"[paragraph id = 2] Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"[paragraph id = 3] Notably, our fine-tuned SLM achieved 14%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6).",
"[paragraph id = 4] These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets."
],
"table_html": "<figure class=\"ltx_table\" id=\"S4.T3\">\n<table class=\"ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle\" id=\"S4.T3.1\">\n<thead class=\"ltx_thead\">\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt\" id=\"S4.T3.1.1.1.1\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T3.1.1.1.1.1\">\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.1.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.1.1.1.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.1.1.1.1.1\" style=\"font-size:90%;\">Pretrained Foundational</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.1.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.1.1.2.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.1.1.2.1.1\" style=\"font-size:90%;\">Language Model</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T3.1.1.1.2\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T3.1.1.1.2.1\">\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.2.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.2.1.1.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.2.1.1.1.1\" style=\"font-size:90%;\">Language Model</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.2.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.2.1.2.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.2.1.2.1.1\" style=\"font-size:90%;\">Parameter Count</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T3.1.1.1.3\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T3.1.1.1.3.1\">\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.3.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.3.1.1.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.3.1.1.1.1\" style=\"font-size:90%;\">% Accuracy</span></td>\n</tr>\n</table>\n</th>\n<th class=\"ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt\" id=\"S4.T3.1.1.1.4\">\n<table class=\"ltx_tabular ltx_align_middle\" id=\"S4.T3.1.1.1.4.1\">\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.4.1.1\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.4.1.1.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.4.1.1.1.1\" style=\"font-size:90%;\">% Exact</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T3.1.1.1.4.1.2\">\n<td class=\"ltx_td ltx_nopad_r ltx_align_center\" id=\"S4.T3.1.1.1.4.1.2.1\"><span class=\"ltx_text\" id=\"S4.T3.1.1.1.4.1.2.1.1\" style=\"font-size:90%;\">Matches</span></td>\n</tr>\n</table>\n</th>\n</tr>\n</thead>\n<tbody class=\"ltx_tbody\">\n<tr class=\"ltx_tr\" id=\"S4.T3.1.2.1\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t\" id=\"S4.T3.1.2.1.1\"><span class=\"ltx_text\" id=\"S4.T3.1.2.1.1.1\" style=\"font-size:90%;\">roneneldan/TinyStories-28M</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T3.1.2.1.2\"><span class=\"ltx_text\" id=\"S4.T3.1.2.1.2.1\" style=\"font-size:90%;\">28M</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T3.1.2.1.3\"><span class=\"ltx_text\" id=\"S4.T3.1.2.1.3.1\" style=\"font-size:90%;\">0%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_t\" id=\"S4.T3.1.2.1.4\"><span class=\"ltx_text\" id=\"S4.T3.1.2.1.4.1\" style=\"font-size:90%;\">0%</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T3.1.3.2\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row\" id=\"S4.T3.1.3.2.1\"><span 
class=\"ltx_text\" id=\"S4.T3.1.3.2.1.1\" style=\"font-size:90%;\">facebook/opt-125m</span></th>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T3.1.3.2.2\"><span class=\"ltx_text\" id=\"S4.T3.1.3.2.2.1\" style=\"font-size:90%;\">125M</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T3.1.3.2.3\"><span class=\"ltx_text\" id=\"S4.T3.1.3.2.3.1\" style=\"font-size:90%;\">0%</span></td>\n<td class=\"ltx_td ltx_align_center\" id=\"S4.T3.1.3.2.4\"><span class=\"ltx_text\" id=\"S4.T3.1.3.2.4.1\" style=\"font-size:90%;\">0%</span></td>\n</tr>\n<tr class=\"ltx_tr\" id=\"S4.T3.1.4.3\">\n<th class=\"ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb\" id=\"S4.T3.1.4.3.1\"><span class=\"ltx_text\" id=\"S4.T3.1.4.3.1.1\" style=\"font-size:90%;\">EleutherAI/gpt-neo-125m</span></th>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T3.1.4.3.2\"><span class=\"ltx_text\" id=\"S4.T3.1.4.3.2.1\" style=\"font-size:90%;\">125M</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T3.1.4.3.3\"><span class=\"ltx_text\" id=\"S4.T3.1.4.3.3.1\" style=\"font-size:90%;\">0%</span></td>\n<td class=\"ltx_td ltx_align_center ltx_border_bb\" id=\"S4.T3.1.4.3.4\"><span class=\"ltx_text\" id=\"S4.T3.1.4.3.4.1\" style=\"font-size:90%;\">0%</span></td>\n</tr>\n</tbody>\n</table>\n<figcaption class=\"ltx_caption ltx_centering\"><span class=\"ltx_tag ltx_tag_table\">Table 3: </span>Baseline performance of pretrained foundational small language models in the conversion of 10,000 test instances of ligand SMILES strings and protein amino acid sequences into their corresponding predicted LPI ordinal affinity values from the LPI-1.5M data set. The model outputs were compared to their ground truth values for scoring. The language models are described by their <span class=\"ltx_text ltx_font_typewriter\" id=\"S4.T3.3.1\">HuggingFace.co</span> repo names (accessed 30May2024).</figcaption>\n</figure>",
"perturb_sentence_id": [
3,
4
],
"output": {
"perturbed_statement": "[paragraph id = 3] Notably, our fine-tuned SLM achieved 54%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6). These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"perturbed_explanation": "1. The original explanation defines specific exact match percentages for different ordinal affinity values achieved by the fine-tuned SLM, as well as a comparison with other models. 2. The statement is incorrect because it falsely claims that the fine-tuned SLM achieved 54% exact matches for the ordinal affinity value B, while no such detail is provided in the context. Instead, the context specifies that the overall exact match was 37%, without breaking it down by ordinal values like B, C, D, and E."
}
}
]