thrumbel committed on
Commit
8b3902d
·
verified ·
1 Parent(s): 07c5b76

Push model using huggingface_hub.

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
21
 
22
  ## Model Description
23
 
24
- `biomed.sm.mv-te-84m` is a biomedical foundation model for small molecules created using MELLON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of molecules in a foundation model setting. While models based on single view representation typically performs well on some downstream tasks and not others, the multi-view model performs robustly across a wide range of property prediction tasks encompassing ligand-protein binding, molecular solubility, metabolism and toxicity. It has been applied to screen compounds against a large (> 100 targets) set of G Protein-Coupled receptors (GPCRs) to identify strong binders for 33 targets related to Alzheimer’s disease, which are validated through structure-based modeling and identification of key binding motifs (Multi-view biomedical foundation models for molecule-target and property prediction)[https://arxiv.org/abs/2410.19704].
25
 
26
  Source code is made available in [this repository](https://github.com/BiomedSciAI/biomed-multi-view).
27
 
@@ -35,7 +35,9 @@ The embeddings from these single-view pre-trained encoders are combined using an
35
 
36
  ## Intended Use and Limitations
37
 
38
- The model is intended for (1) Molecular property prediction. The pre-trained model may be fine-tuned for both regression and classification tasks. Examples include but are not limited to binding affinity, solubility and toxicity. (2) Pre-trained model embeddings may be used as the basis for similarity measures to search a chemical library. (3) Small molecule embeddings provided by the model may be combined with protein embeddings to fine-tune on tasks that utilize both small molecule and protein representation. (4) Select task-specific fine-tune models are given as examples. Through listed activities, model may aid in aspects of the molecular discovery such as lead finding or optimization.
 
 
39
  The model’s domain of applicability is small, drug-like molecules. It is intended for use with molecules of less than 1000 Da molecular weight. The MMELON approach itself may be extended to include proteins and other macromolecules but does not at present provide embeddings for such entities. The model is at present not intended for molecular generation. Molecules must be given as a valid SMILES string that represents a valid chemically bonded graph. Invalid inputs will degrade performance or cause errors.
40
 
41
  ## Usage
@@ -43,7 +45,7 @@ The model’s domain of applicability is small, drug-like molecules. It intended
43
  Using `SmallMoleculeMultiView` requires [https://github.com/BiomedSciAI/biomed-multi-view](https://github.com/BiomedSciAI/biomed-multi-view)
44
 
45
  ## Installation
46
- Follow these steps to set up the `biomed.multi-view` codebase on your system.
47
 
48
  ### Prerequisites
49
  * Operating System: Linux or macOS
@@ -53,7 +55,7 @@ Follow these steps to set up the `biomed.multi-view` codebase on your system.
53
 
54
 
55
  ### Step 1: Set up the project directory
56
- Choose a root directory where you want to install biomed.multi-view. For example:
57
 
58
  ```bash
59
  export ROOT_DIR=~/biomed-multiview
@@ -79,7 +81,7 @@ cd $ROOT_DIR/code
79
  git clone https://github.com/BiomedSciAI/biomed-multi-view.git
80
 
81
  # Navigate into the cloned repository
82
- cd biomed.multi-view
83
  ```
84
  Note: If you prefer using SSH, ensure that your SSH keys are set up with GitHub and use the following command:
85
  ```bash
@@ -153,15 +155,15 @@ from bmfm_sm.api.dataset_registry import DatasetRegistry
153
  dataset_registry = DatasetRegistry()
154
 
155
  # Example SMILES string
156
- example_smiles = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
157
 
158
  # Get dataset information for dataset
159
- ds = dataset_registry.get_dataset_info("BACE")
160
 
161
  # Load the finetuned model for the dataset
162
  finetuned_model_ds = SmallMoleculeMultiViewModel.from_finetuned(
163
  ds,
164
- model_path="ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-BACE-101",
165
  inference_mode=True,
166
  huggingface=True
167
  )
@@ -176,7 +178,7 @@ print("Prediction:", prediction)
176
 
177
  ##### Output:
178
  ```bash
179
- Prediction: {'prediction': [0.85], 'label': None}
180
  ```
181
 
182
  For more advanced usage, see our detailed examples at: https://github.com/BiomedSciAI/biomed-multi-view
 
21
 
22
  ## Model Description
23
 
24
+ `biomed.sm.mv-te-84m` is a biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of molecules in a foundation model setting. While models based on a single-view representation typically perform well on some downstream tasks and not others, the multi-view model performs robustly across a wide range of property prediction tasks encompassing ligand-protein binding, molecular solubility, metabolism and toxicity. It has been applied to screen compounds against a large set (> 100 targets) of G protein-coupled receptors (GPCRs) to identify strong binders for 33 targets related to Alzheimer’s disease, which are validated through structure-based modeling and identification of key binding motifs (see [Multi-view biomedical foundation models for molecule-target and property prediction](https://arxiv.org/abs/2410.19704)).
25
 
26
  Source code is made available in [this repository](https://github.com/BiomedSciAI/biomed-multi-view).
27
 
 
35
 
36
  ## Intended Use and Limitations
37
 
38
+ The model is intended for (1) Molecular property prediction. The pre-trained model may be fine-tuned for both regression and classification tasks. Examples include but are not limited to binding affinity, solubility and toxicity. (2) Pre-trained model embeddings may be used as the basis for similarity measures to search a chemical library. (3) Small molecule embeddings provided by the model may be combined with protein embeddings to fine-tune on tasks that utilize both small molecule and protein representations. (4) Select task-specific fine-tuned models are given as examples. Through the listed activities, the model may aid in aspects of molecular discovery such as lead finding or optimization.
39
+
40
+
41
  The model’s domain of applicability is small, drug-like molecules. It is intended for use with molecules of less than 1000 Da molecular weight. The MMELON approach itself may be extended to include proteins and other macromolecules but does not at present provide embeddings for such entities. The model is at present not intended for molecular generation. Molecules must be given as a valid SMILES string that represents a valid chemically bonded graph. Invalid inputs will degrade performance or cause errors.
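
  Intended use (2), similarity search over a chemical library, can be sketched as follows. This is a minimal illustration that assumes each molecule has already been embedded as a fixed-length vector. The three-dimensional vectors and the `rank_by_similarity` helper are hypothetical stand-ins and not part of `bmfm_sm`, and cosine similarity is one common choice of measure rather than one prescribed by this model card.

  ```python
  import math


  def cosine_similarity(a, b):
      """Cosine of the angle between two equal-length vectors."""
      dot = sum(x * y for x, y in zip(a, b))
      norm_a = math.sqrt(sum(x * x for x in a))
      norm_b = math.sqrt(sum(y * y for y in b))
      if norm_a == 0.0 or norm_b == 0.0:
          # A zero vector has no direction; treat it as dissimilar to everything.
          return 0.0
      return dot / (norm_a * norm_b)


  def rank_by_similarity(query, library):
      """Return (smiles, score) pairs sorted from most to least similar."""
      scored = [(smiles, cosine_similarity(query, emb)) for smiles, emb in library.items()]
      return sorted(scored, key=lambda pair: pair[1], reverse=True)


  # Hypothetical toy embeddings; real embeddings would come from the pre-trained model.
  library = {
      "CCO": [0.9, 0.1, 0.0],
      "CCN": [0.7, 0.3, 0.1],
      "c1ccccc1": [0.0, 0.2, 0.9],
  }
  query_embedding = [1.0, 0.0, 0.0]

  ranking = rank_by_similarity(query_embedding, library)
  for smiles, score in ranking:
      print(f"{smiles}\t{score:.3f}")
  ```

  For large libraries, the same ranking is usually computed with a vectorized or approximate nearest-neighbor index instead of a Python loop.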
42
 
43
  ## Usage
 
45
  Using `SmallMoleculeMultiView` requires [https://github.com/BiomedSciAI/biomed-multi-view](https://github.com/BiomedSciAI/biomed-multi-view)
46
 
47
  ## Installation
48
+ Follow these steps to set up the `biomed-multi-view` codebase on your system.
49
 
50
  ### Prerequisites
51
  * Operating System: Linux or macOS
 
55
 
56
 
57
  ### Step 1: Set up the project directory
58
+ Choose a root directory where you want to install `biomed-multi-view`. For example:
59
 
60
  ```bash
61
  export ROOT_DIR=~/biomed-multiview
 
81
  git clone https://github.com/BiomedSciAI/biomed-multi-view.git
82
 
83
  # Navigate into the cloned repository
84
+ cd biomed-multi-view
85
  ```
86
  Note: If you prefer using SSH, ensure that your SSH keys are set up with GitHub and use the following command:
87
  ```bash
 
155
  dataset_registry = DatasetRegistry()
156
 
157
  # Example SMILES string
158
+ example_smiles = "CC(C)C1CCC(C)CC1O"
159
 
160
  # Get dataset information for dataset
161
+ ds = dataset_registry.get_dataset_info("ESOL")
162
 
163
  # Load the finetuned model for the dataset
164
  finetuned_model_ds = SmallMoleculeMultiViewModel.from_finetuned(
165
  ds,
166
+ model_path="ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-ESOL-101",
167
  inference_mode=True,
168
  huggingface=True
169
  )
 
178
 
179
  ##### Output:
180
  ```bash
181
+ Prediction: {'prediction': [-2.53], 'label': None}
182
  ```
183
 
184
  For more advanced usage, see our detailed examples at: https://github.com/BiomedSciAI/biomed-multi-view
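
  The ESOL prediction in the output above can be converted into a concentration. The sketch below assumes the fine-tuned model reports log10 aqueous solubility in mol/L, which is how the ESOL benchmark labels are defined; whether the model returns values on that raw scale is an assumption to verify against the repository. The helper name `log_s_to_concentrations` is hypothetical, and the molecular weight (about 156.27 g/mol for C10H20O, the formula of the example SMILES) is supplied here for illustration.

  ```python
  def log_s_to_concentrations(log_s, mol_weight_g_per_mol):
      """Convert log10 solubility (mol/L) to molar and mass concentration."""
      molar = 10.0 ** log_s                      # mol/L
      grams_per_litre = molar * mol_weight_g_per_mol
      return molar, grams_per_litre


  # Prediction dictionary in the shape printed by the usage example.
  prediction = {"prediction": [-2.53], "label": None}

  # Assumed molecular weight of the example molecule, C10H20O.
  MOL_WEIGHT = 156.27

  molar, g_per_l = log_s_to_concentrations(prediction["prediction"][0], MOL_WEIGHT)
  print(f"{molar:.2e} mol/L, {g_per_l:.2f} g/L")
  ```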