Merge branch 'main' of https://huggingface.co/jsunn-y/ProCALM
README.md CHANGED
# ProCALM
[ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.

ProCALM models share `tokenizer.json`, and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 and 9 billion tokens, respectively.
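
For example, a single checkpoint can be pulled down with `huggingface_hub` (a minimal sketch; the subfolder name is illustrative, so substitute one of the model folders actually listed in this repo):

```
from huggingface_hub import snapshot_download

# Download the shared tokenizer plus one model subfolder.
# "ec-onehot-swissprot/1.5B" is an illustrative name, not necessarily a real folder.
local_dir = snapshot_download(
    repo_id="jsunn-y/ProCALM",
    allow_patterns=["tokenizer.json", "ec-onehot-swissprot/1.5B/*"],
)
print(local_dir)  # local path containing tokenizer.json and the chosen checkpoint
```
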
## General Usage
Usage details with examples can be found on [GitHub](https://github.com/jsunn-y/ProCALM/tree/main) under "Generation" and in our paper. An example framework for generation from pretrained models:
```
from tokenizers import Tokenizer
# ... (intervening lines omitted here: load the model and tokenizer, build
# condition_encodings, and sample tokens_batch under torch.no_grad();
# see the full script on GitHub) ...
as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
sequences = tokenizer.decode_batch(as_lists(tokens_batch))
```
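
The tokenizer side of this snippet can be tried on its own with the shared `tokenizer.json` (a minimal sketch; the file path and the dummy batch of token ids below are assumptions standing in for real model samples):

```
import torch
from tokenizers import Tokenizer

# Load the tokenizer shared by all ProCALM models in this repo (path is assumed).
tokenizer = Tokenizer.from_file("tokenizer.json")

# Dummy batch of token ids standing in for sequences sampled from the model.
encoding = tokenizer.encode("MKVLILACLVALALA")  # hypothetical protein prefix
tokens_batch = torch.tensor([encoding.ids, encoding.ids])

# Same decoding pattern as above.
as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
sequences = tokenizer.decode_batch(as_lists(tokens_batch))
print(sequences)
```
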
Note that `condition_encodings` is a representation of the conditioning, which can be calculated using the `.pt` dictionaries provided in our GitHub repository under `data`.
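
For example, if one of those dictionaries maps EC numbers to fixed-size tensors (a minimal sketch; the file name, key format, and tensor shape are assumptions rather than the repo's exact layout):

```
import torch

# Hypothetical file name; use the actual .pt dictionary from the GitHub repo's data/ folder.
ec_to_encoding = torch.load("data/ec_to_encoding.pt")

# Look up the conditioning representation for one EC number (assumed key format).
condition = ec_to_encoding["1.1.1.1"]

# Tile it to match the number of sequences being generated (assumes a 1-D tensor).
condition_encodings = condition.unsqueeze(0).repeat(4, 1)
```
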
## Summary of Available Models