Merge branch 'main' of https://huggingface.co/jsunn-y/ProCALM
README.md CHANGED
# ProCALM
[ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.

ProCALM models share `tokenizer.json`, and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 and 9 billion tokens, respectively.
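
For example, a single checkpoint can be pulled down with `huggingface_hub` (a minimal sketch; the subfolder name is illustrative, so substitute one of the model folders actually listed in this repo):

```
from huggingface_hub import snapshot_download

# Download the shared tokenizer plus one model subfolder.
# "ec-onehot-swissprot/1.5B" is an illustrative name, not necessarily a real folder.
local_dir = snapshot_download(
    repo_id="jsunn-y/ProCALM",
    allow_patterns=["tokenizer.json", "ec-onehot-swissprot/1.5B/*"],
)
print(local_dir)  # local path containing tokenizer.json and the chosen checkpoint
```
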
## General Usage
Usage details with examples can be found on [GitHub](https://github.com/jsunn-y/ProCALM/tree/main) under "Generation" and in our paper. An example framework for generation from pretrained models:
```
from tokenizers import Tokenizer
# ... (intervening lines omitted here: load the model and tokenizer, build
# condition_encodings, and sample tokens_batch under torch.no_grad();
# see the full script on GitHub) ...
as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
sequences = tokenizer.decode_batch(as_lists(tokens_batch))
```
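
The tokenizer side of this snippet can be tried on its own with the shared `tokenizer.json` (a minimal sketch; the file path and the dummy batch of token ids below are assumptions standing in for real model samples):

```
import torch
from tokenizers import Tokenizer

# Load the tokenizer shared by all ProCALM models in this repo (path is assumed).
tokenizer = Tokenizer.from_file("tokenizer.json")

# Dummy batch of token ids standing in for sequences sampled from the model.
encoding = tokenizer.encode("MKVLILACLVALALA")  # hypothetical protein prefix
tokens_batch = torch.tensor([encoding.ids, encoding.ids])

# Same decoding pattern as above.
as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
sequences = tokenizer.decode_batch(as_lists(tokens_batch))
print(sequences)
```
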
Note that `condition_encodings` is a representation of the conditioning, which can be calculated using the `.pt` dictionaries provided in our GitHub repository under `data`.
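
For example, if one of those dictionaries maps EC numbers to fixed-size tensors (a minimal sketch; the file name, key format, and tensor shape are assumptions rather than the repo's exact layout):

```
import torch

# Hypothetical file name; use the actual .pt dictionary from the GitHub repo's data/ folder.
ec_to_encoding = torch.load("data/ec_to_encoding.pt")

# Look up the conditioning representation for one EC number (assumed key format).
condition = ec_to_encoding["1.1.1.1"]

# Tile it to match the number of sequences being generated (assumes a 1-D tensor).
condition_encodings = condition.unsqueeze(0).repeat(4, 1)
```
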
## Summary of Available Models