jsunn-y committed
Commit 63e8805
Parents (2): 509f19e 7b7f7dd

Merge branch 'main' of https://huggingface.co/jsunn-y/ProCALM

Files changed (1): README.md (+3, −3)
README.md CHANGED
@@ -5,9 +5,9 @@ license: bsd-3-clause
 # ProCALM
 [ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.
 
-ProCALM models share `tokenizer.json` and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 and 9 billion tokens, respectively.
+ProCALM models share `tokenizer.json`, and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 and 9 billion tokens, respectively.
 
-## Quickstart
+## General Usage
 Usage details with examples can be found in [github](https://github.com/jsunn-y/ProCALM/tree/main) under "Generation" and in our paper. Example framework for generation from pretrained models:
 ```
 from tokenizers import Tokenizer
@@ -23,7 +23,7 @@ with torch.no_grad():
 as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
 sequences = tokenizer.decode_batch(as_lists(tokens_batch))
 ```
-Note that condition_encodings is a representation of the conditioning, which can be calculated using the dictionaries `.pt` provided in our github under `data`.
+Note that `condition_encodings` is a representation of the conditioning, which can be calculated using the `.pt` dictionaries provided in our github under `data`.
 
 ## Summary of Available Models
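Note that the diff elides the middle of the README's example (file lines 14-22, where the checkpoint is loaded and tokens are sampled under `torch.no_grad()`). As a minimal sketch of the surrounding flow, the snippet below exercises only the decoding step shown above, with a random stand-in for the sampled token batch; the local `tokenizer.json` path is assumed, and the commented-out `data/ec_to_encoding.pt` file name and EC-number key are hypothetical placeholders for the repo's condition dictionaries, not its actual names.

```
import torch
from tokenizers import Tokenizer

# Shared tokenizer from the root of this model repo (local path assumed).
tokenizer = Tokenizer.from_file("tokenizer.json")

# Hypothetical: look up a condition encoding from one of the .pt
# dictionaries under data/ in the GitHub repo (file name and key are
# placeholders, not the repo's actual names).
# condition_encodings = torch.load("data/ec_to_encoding.pt")["3.2.1.1"]

# Stand-in for the elided sampling step: random token ids in place of
# the batch the adapter-conditioned model would generate.
tokens_batch = torch.randint(0, tokenizer.get_vocab_size(), (4, 128))

# Decoding step from the README: convert each row of the (batch, length)
# tensor to a plain Python list, then batch-decode back to sequences.
as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
sequences = tokenizer.decode_batch(as_lists(tokens_batch))
print(sequences[0])
```

In actual use, `tokens_batch` would come from the conditioned model's sampling call described under "Generation" in the GitHub repo.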