[
{
"path": "table_paper/2407.00111v1.json",
"table_id": "2",
"section": "4.1",
"all_context": [
"We explored the performance of statistical machine learning (ML) models on our LPI affinity prediction task.",
"A training set of 100,000 LPI examples and their corresponding ordinal affinity values was drawn from the LPI-1.5M data set.",
"The ligand SMILES strings were converted into both MACCS (Molecular ACCess System) fingerprint sparse embeddings Durant et al.",
"(2002) and extended-connectivity \"circular\" fingerprint (ECFP) sparse embeddings Rogers & Hahn (2010).",
"The protein amino acid sequences were converted into dense embeddings with the ESM2-3B (Evolutionary Scale Modeling 2) model Lin et al.",
"(2023).",
"These ligand and protein embedding techniques were selected due to their prevalence and performance in LPI binary affinity classification prior art Kimber et al.",
"(2021).",
"The ligand and protein embeddings were concatenated, then -normalized.",
"The same process was applied to a 10,000-example test set from the LPI-1.5M data set.",
"The train and test data sets were unique with no overlap.",
"A support vector machine (SVM) model was selected for this analysis given its strong performance on imbalanced data sets Chakrabarti & Fauber (2022), which are often present in multinomial classification tasks such as ours (Figure 5) (footnote 3: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC, accessed 11 June 2024). A one-versus-rest (OvR) instance of a linear kernel SVM was employed, thus enabling our multinomial classification task (footnote 4: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html, accessed 11 June 2024). Additional details for our data embedding and ML methods are described in the Appendix.",
"The OvR instances of linear SVM models demonstrated 7% overall accuracy and 7% overall exact matches on our multinomial classification task for both ligand embedding techniques (Table 2).",
"Additionally, both model instances produced 0% exact matches for the A and B ordinal affinity values, and 1%, 15%, and 9% exact matches for the ordinal affinity values C, D, and E, respectively.",
"These results resemble the distribution of the parent LPI-1.5M data (Figure 5), yet lack sufficient utility in prioritizing ligands for progression in a drug discovery campaign.",
""
],
"target_context_ids": [
10,
11,
12,
13
],
"selected_paragraphs": [
"[paragraph id = 10] The train and test data sets were unique with no overlap.",
"[paragraph id = 11] A support vector machine (SVM) model was selected for this analysis given its strong performance on imbalanced data sets Chakrabarti & Fauber (2022), which are often present in multinomial classification tasks such as ours (Figure 5) (footnote 3: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC, accessed 11 June 2024). A one-versus-rest (OvR) instance of a linear kernel SVM was employed, thus enabling our multinomial classification task (footnote 4: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html, accessed 11 June 2024). Additional details for our data embedding and ML methods are described in the Appendix.",
"[paragraph id = 12] The OvR instances of linear SVM models demonstrated 7% overall accuracy and 7% overall exact matches on our multinomial classification task for both ligand embedding techniques (Table 2).",
"[paragraph id = 13] Additionally, both model instances produced 0% exact matches for the A and B ordinal affinity values, and 1%, 15%, and 9% exact matches for the ordinal affinity values C, D, and E, respectively."
],
"table_html": "
<table>
<caption>Table 2: Performance of ML models in the conversion of 10,000 test instances of ligand embeddings and protein amino acid sequence embeddings into their corresponding predicted LPI ordinal affinity values from the LPI-1.5M data set. The ML model outputs were compared to their ground truth values for scoring.</caption>
<thead>
<tr><th>Machine Learning Model</th><th>Ligand Embedding Model</th><th>Protein Embedding Model</th><th>Dimension of Ligand + Protein Embedding</th><th>% Accuracy</th><th>% Exact Matches</th></tr>
</thead>
<tbody>
<tr><td>OvR(LinearSVM)</td><td>ECFP</td><td>ESM2-3B</td><td>4,608</td><td>7%</td><td>7%</td></tr>
<tr><td>OvR(LinearSVM)</td><td>MACCS</td><td>ESM2-3B</td><td>2,727</td><td>7%</td><td>7%</td></tr>
</tbody>
</table>
",
"perturb_sentence_id": 11,
"output": {
"perturbed_statement": "[paragraph id = 11] A support vector machine (SVM) model was selected for this analysis given its strong performance on imbalanced data sets Chakrabarti & Fauber (2022), which are statistically rare in multinomial classification tasks such as ours (Figure 5) (footnote 3: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC, accessed 11 June 2024). A one-versus-rest (OvR) instance of a nonlinear kernel SVM was employed, thus enabling our multinomial classification task (footnote 4: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html, accessed 11 June 2024). Additional details for our data embedding and ML methods are described in the Appendix. The OvR instances of linear SVM models demonstrated 7% overall accuracy and 7% overall exact matches on our multinomial classification task for both ligand embedding techniques (Table 2).",
"perturbed_explanation": "1. The original explanation mentions that a linear kernel was selected as an integral part of the SVM model used in the analysis, designed for addressing multinomial classification task, as linear kernels are computationally efficient and effective in certain scenarios of data separation. 2. The statement incorrectly claims that a nonlinear kernel was utilized, which introduces a contradiction to the detailed choice mentioned. This change misrepresents the methodological decisions made in the study and could lead to misunderstandings regarding the applied techniques. To clarify, linear SVMs differ significantly from their nonlinear counterparts in their handling of data distributions."
}
},
{
"path": "table_paper/2407.00111v1.json",
"table_id": "2",
"section": "4.3",
"all_context": [
"The OPT-125M pretrained small language model was instruction fine-tuned on 100,000 training examples drawn from the LPI-1.5M data set.",
"We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"Notably, our fine-tuned SLM achieved 14%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6).",
"These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"Relaxing the scoring criteria to a predicted ordinal affinity value equal to or value relative to the ground truth, as is regularly employed in the FEP+ method Schrodinger (2023); Ross et al.",
"(2023), resulted in impressive outcomes with our method.",
"With the relaxed \"near match\" criteria, we achieved a 77% overall accuracy and all ordinal affinity values achieved 19-94% near matches relative to the ground truth with our method (Figure 6).",
"The relaxed criterion of a near match is reasonable for the prioritization of ligands in virtual screening, and is likely why this practice was introduced by FEP+ practitioners.",
""
],
"target_context_ids": [
1,
2,
4,
5,
7
],
"selected_paragraphs": [
"[paragraph id = 1] We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"[paragraph id = 2] Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"[paragraph id = 4] These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"[paragraph id = 5] Relaxing the scoring criteria to a predicted ordinal affinity value equal to or value relative to the ground truth, as is regularly employed in the FEP+ method Schrodinger (2023); Ross et al.",
"[paragraph id = 7] With the relaxed \"near match\" criteria, we achieved a 77% overall accuracy and all ordinal affinity values achieved 19-94% near matches relative to the ground truth with our method (Figure 6)."
],
"table_html": "
<table>
<caption>Table 2: Performance of ML models in the conversion of 10,000 test instances of ligand embeddings and protein amino acid sequence embeddings into their corresponding predicted LPI ordinal affinity values from the LPI-1.5M data set. The ML model outputs were compared to their ground truth values for scoring.</caption>
<thead>
<tr><th>Machine Learning Model</th><th>Ligand Embedding Model</th><th>Protein Embedding Model</th><th>Dimension of Ligand + Protein Embedding</th><th>% Accuracy</th><th>% Exact Matches</th></tr>
</thead>
<tbody>
<tr><td>OvR(LinearSVM)</td><td>ECFP</td><td>ESM2-3B</td><td>4,608</td><td>7%</td><td>7%</td></tr>
<tr><td>OvR(LinearSVM)</td><td>MACCS</td><td>ESM2-3B</td><td>2,727</td><td>7%</td><td>7%</td></tr>
</tbody>
</table>
",
"perturb_sentence_id": 1,
"output": {
"perturbed_statement": "[paragraph id = 1] We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 500 examples from the LPI-1.5M data set. Our fine-tuned SLM achieved 37% overall accuracy and 57% overall exact matches on our task.",
"perturbed_explanation": "The original explanation highlights the key performance of the fine-tuned SLM on the test set. The statement specifies a test set size of 10,000 examples and reports both 37% accuracy and exact matches. However, the statement now incorrectly mentions that the test set size was 500 examples and that exact matches reached 57%. These altered details conflict with the reported test set size and performance figures, rendering the statement inaccurate."
}
},
{
"path": "table_paper/2407.00111v1.json",
"table_id": "3",
"section": "4.3",
"all_context": [
"The OPT-125M pretrained small language model was instruction fine-tuned on 100,000 training examples drawn from the LPI-1.5M data set.",
"We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"Notably, our fine-tuned SLM achieved 14%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6).",
"These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"Relaxing the scoring criteria to a predicted ordinal affinity value equal to or value relative to the ground truth, as is regularly employed in the FEP+ method Schrodinger (2023); Ross et al.",
"(2023), resulted in impressive outcomes with our method.",
"With the relaxed \"near match\" criteria, we achieved a 77% overall accuracy and all ordinal affinity values achieved 19-94% near matches relative to the ground truth with our method (Figure 6).",
"The relaxed criterion of a near match is reasonable for the prioritization of ligands in virtual screening, and is likely why this practice was introduced by FEP+ practitioners.",
""
],
"target_context_ids": [
1,
2,
3,
4
],
"selected_paragraphs": [
"[paragraph id = 1] We observed a significant improvement in the performance of our fine-tuned SLM on our LPI affinity prediction task versus the baseline model on a test set of 10,000 examples from the LPI-1.5M data set.",
"[paragraph id = 2] Our fine-tuned SLM achieved 37% overall accuracy and 37% overall exact matches on our task.",
"[paragraph id = 3] Notably, our fine-tuned SLM achieved 14%, 36%, 64%, and 22% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6).",
"[paragraph id = 4] These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets."
],
"table_html": "
<table>
<caption>Table 3: Baseline performance of pretrained foundational small language models in the conversion of 10,000 test instances of ligand SMILES strings and protein amino acid sequences into their corresponding predicted LPI ordinal affinity values from the LPI-1.5M data set. The model outputs were compared to their ground truth values for scoring. The language models are described by their HuggingFace.co repo names (accessed 30 May 2024).</caption>
<thead>
<tr><th>Pretrained Foundational Language Model</th><th>Language Model Parameter Count</th><th>% Accuracy</th><th>% Exact Matches</th></tr>
</thead>
<tbody>
<tr><td>roneneldan/TinyStories-28M</td><td>28M</td><td>0%</td><td>0%</td></tr>
<tr><td>facebook/opt-125m</td><td>125M</td><td>0%</td><td>0%</td></tr>
<tr><td>EleutherAI/gpt-neo-125m</td><td>125M</td><td>0%</td><td>0%</td></tr>
</tbody>
</table>
",
"perturb_sentence_id": 3,
"output": {
"perturbed_statement": "[paragraph id = 3] Notably, our fine-tuned SLM achieved 44%, 26%, 34%, and 12% exact matches for the ordinal affinity values B, C, D, and E, respectively (Figure 6). These results were significantly better than the ML results (Table 2) and baseline language model results (Table 3) on the same train/test data sets.",
"perturbed_explanation": "The original explanation: This data point indicates that the fine-tuned SLM model's performance significantly improved over the results from previous models, as evidenced by the higher exact match percentages for each ordinal affinity value. 1. In the statement, the exact match percentages for B, C, D, and E should be 14%, 36%, 64%, and 22%, respectively; this discrepancy alters the correct reported performance numbers and suggests a different performance overview. 2. Consequently, the provided numbers no longer confirm the stated superior performance without factual alignment with verified results."
}
}
]