Push model using huggingface_hub.

README.md +11 -9

@@ -21,7 +21,7 @@ tags:
 ## Model Description
-`biomed.sm.mv-te-84m` is a biomedical foundation model for small molecules created using MELLON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of molecules in a foundation model setting. While models based on a single-view representation typically perform well on some downstream tasks and not others, the multi-view model performs robustly across a wide range of property prediction tasks encompassing ligand-protein binding, molecular solubility, metabolism and toxicity. It has been applied to screen compounds against a large (> 100 targets) set of G protein-coupled receptors (GPCRs) to identify strong binders for 33 targets related to Alzheimer’s disease, which are validated through structure-based modeling and identification of key binding motifs (Multi-view biomedical foundation models for molecule-target and property prediction)[https://arxiv.org/abs/2410.19704].
 Source code is made available in [this repository](https://github.com/BiomedSciAI/biomed-multi-view).
@@ -35,7 +35,9 @@ The embeddings from these single-view pre-trained encoders are combined using an
 ## Intended Use and Limitations
-The model is intended for (1) Molecular property prediction. The pre-trained model may be fine-tuned for both regression and classification tasks. Examples include but are not limited to binding affinity, solubility and toxicity. (2) Pre-trained model embeddings may be used as the basis for similarity measures to search a chemical library. (3) Small molecule embeddings provided by the model may be combined with protein embeddings to fine-tune on tasks that utilize both small molecule and protein representation. (4) Select task-specific fine-tune models are given as examples. Through the listed activities, the model may aid in aspects of molecular discovery such as lead finding or optimization.
 The model’s domain of applicability is small, drug-like molecules. It is intended for use with molecules of less than 1000 Da molecular weight. The MMELON approach itself may be extended to include proteins and other macromolecules but does not at present provide embeddings for such entities. The model is at present not intended for molecular generation. Molecules must be given as a valid SMILES string that represents a valid chemically bonded graph. Invalid inputs may degrade performance or cause errors.
 ## Usage
@@ -43,7 +45,7 @@ The model’s domain of applicability is small, drug-like molecules. It is intended
 Using `SmallMoleculeMultiView` requires [https://github.com/BiomedSciAI/biomed-multi-view](https://github.com/BiomedSciAI/biomed-multi-view)
 ## Installation
-Follow these steps to set up the `biomed.multi-view` codebase on your system.
 ### Prerequisites
 * Operating System: Linux or macOS
@@ -53,7 +55,7 @@ Follow these steps to set up the `biomed.multi-view` codebase on your system.
 ### Step 1: Set up the project directory
-Choose a root directory where you want to install biomed.multi-view. For example:
 ```bash
 export ROOT_DIR=~/biomed-multiview
@@ -79,7 +81,7 @@ cd $ROOT_DIR/code
 git clone https://github.com/BiomedSciAI/biomed-multi-view.git
 # Navigate into the cloned repository
-cd biomed.multi-view
 ```
 Note: If you prefer using SSH, ensure that your SSH keys are set up with GitHub and use the following command:
 ```bash
@@ -153,15 +155,15 @@ from bmfm_sm.api.dataset_registry import DatasetRegistry
 dataset_registry = DatasetRegistry()
 # Example SMILES string
-example_smiles = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
 # Get dataset information for dataset
-ds = dataset_registry.get_dataset_info("BACE")
 # Load the finetuned model for the dataset
 finetuned_model_ds = SmallMoleculeMultiViewModel.from_finetuned(
     ds,
-    model_path="ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-BACE-101",
     inference_mode=True,
     huggingface=True
 )
@@ -176,7 +178,7 @@ print("Prediction:", prediction)
 ##### Output:
 ```bash
-Prediction: {'prediction': [0.85], 'label': None}
 ```
 For more advanced usage, see our detailed examples at: https://github.com/BiomedSciAI/biomed-multi-view

 ## Model Description
+`biomed.sm.mv-te-84m` is a biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of molecules in a foundation model setting. While models based on a single-view representation typically perform well on some downstream tasks and not others, the multi-view model performs robustly across a wide range of property prediction tasks encompassing ligand-protein binding, molecular solubility, metabolism and toxicity. It has been applied to screen compounds against a large (> 100 targets) set of G protein-coupled receptors (GPCRs) to identify strong binders for 33 targets related to Alzheimer’s disease, which are validated through structure-based modeling and identification of key binding motifs [Multi-view biomedical foundation models for molecule-target and property prediction](https://arxiv.org/abs/2410.19704).
 Source code is made available in [this repository](https://github.com/BiomedSciAI/biomed-multi-view).
 ## Intended Use and Limitations
+The model is intended for (1) Molecular property prediction. The pre-trained model may be fine-tuned for both regression and classification tasks. Examples include but are not limited to binding affinity, solubility and toxicity. (2) Pre-trained model embeddings may be used as the basis for similarity measures to search a chemical library. (3) Small molecule embeddings provided by the model may be combined with protein embeddings to fine-tune on tasks that utilize both small molecule and protein representation. (4) Select task-specific fine-tuned models are given as examples. Through the listed activities, the model may aid in aspects of molecular discovery such as lead finding or optimization.
 The model’s domain of applicability is small, drug-like molecules. It is intended for use with molecules of less than 1000 Da molecular weight. The MMELON approach itself may be extended to include proteins and other macromolecules but does not at present provide embeddings for such entities. The model is at present not intended for molecular generation. Molecules must be given as a valid SMILES string that represents a valid chemically bonded graph. Invalid inputs may degrade performance or cause errors.
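
 Intended use (2) above, similarity search over a chemical library, can be sketched as a cosine-similarity ranking over embedding vectors. The snippet below is a minimal illustration, not the repository's API: the helper names `cosine_similarity` and `rank_library` are hypothetical, and the three-dimensional vectors are placeholders standing in for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder vectors (assumption): real embeddings would come from the pre-trained model.
library = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_library(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

 Cosine similarity ignores vector magnitude, which is one common choice for comparing learned embeddings; other measures (for example Euclidean distance) may suit a given library better.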
 ## Usage
 Using `SmallMoleculeMultiView` requires [https://github.com/BiomedSciAI/biomed-multi-view](https://github.com/BiomedSciAI/biomed-multi-view)
 ## Installation
+Follow these steps to set up the `biomed-multi-view` codebase on your system.
 ### Prerequisites
 * Operating System: Linux or macOS
 ### Step 1: Set up the project directory
+Choose a root directory where you want to install `biomed-multi-view`. For example:
 ```bash
 export ROOT_DIR=~/biomed-multiview
 git clone https://github.com/BiomedSciAI/biomed-multi-view.git
 # Navigate into the cloned repository
+cd biomed-multi-view
 ```
 Note: If you prefer using SSH, ensure that your SSH keys are set up with GitHub and use the following command:
 ```bash
 dataset_registry = DatasetRegistry()
 # Example SMILES string
+example_smiles = "CC(C)C1CCC(C)CC1O"
 # Get dataset information for dataset
+ds = dataset_registry.get_dataset_info("ESOL")
 # Load the finetuned model for the dataset
 finetuned_model_ds = SmallMoleculeMultiViewModel.from_finetuned(
     ds,
+    model_path="ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-ESOL-101",
     inference_mode=True,
     huggingface=True
 )
 ##### Output:
 ```bash
+Prediction: {'prediction': [-2.53], 'label': None}
 ```
 For more advanced usage, see our detailed examples at: https://github.com/BiomedSciAI/biomed-multi-view
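
 The output shown above is a plain dictionary with a `prediction` list. As a rough sketch of how such a value might be read, the snippet below converts it to a molar concentration. It assumes the fine-tuned ESOL model reports values in the dataset's native unit, log10 of aqueous solubility in mol/L; that is an assumption about the model's output scaling, not something stated in this README. The helper name `log_solubility_to_molar` is hypothetical.

```python
def log_solubility_to_molar(result):
    """Convert log10(mol/L) predictions to mol/L.

    Assumes (not confirmed by the README) that the values under the
    'prediction' key are unnormalized log10 solubilities in mol/L.
    """
    return [10.0 ** value for value in result["prediction"]]


# Dictionary copied from the example output above.
result = {"prediction": [-2.53], "label": None}

molar = log_solubility_to_molar(result)
print(f"Estimated solubility: {molar[0]:.2e} mol/L")
```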