---
library_name: transformers
tags:
- CodonTransformer
- Computational Biology
- Machine Learning
- Bioinformatics
- Synthetic Biology
license: apache-2.0
pipeline_tag: token-classification
---

![image/png](https://github.com/Adibvafa/CodonTransformer/raw/main/src/banner_final.png)

**CodonTransformer** is a tool for codon optimization that transforms protein sequences into DNA sequences optimized for your target organism. Whether you are a researcher or a practitioner in genetic engineering, CodonTransformer provides a comprehensive suite of features to facilitate your work. By combining the Transformer architecture with a user-friendly Jupyter notebook, it reduces the complexity of codon optimization, saving you time and effort.

**This is the pretrained model. For best results, please use the [finetuned model](https://huggingface.co/adibvafa/CodonTransformer).**

## Authors

Adibvafa Fallahpour<sup>1,2</sup>\*, Vincent Gureghian<sup>3</sup>\*, Guillaume J. Filion<sup>2</sup>‡, Ariel B. Lindner<sup>3</sup>‡, Amir Pandi<sup>3</sup>

<sup>1</sup> Vector Institute for Artificial Intelligence, Toronto ON, Canada

<sup>2</sup> University of Toronto Scarborough; Department of Biological Science; Scarborough ON, Canada

<sup>3</sup> Université Paris Cité, INSERM U1284, Center for Research and Interdisciplinarity, F-75006 Paris, France

\* These authors contributed equally to this work.

‡ To whom correspondence should be addressed:
guillaume.filion@utoronto.ca, ariel.lindner@inserm.fr, amir.pandi@cri-paris.org

## Use Case

**For a guide on finetuning CodonTransformer, check out our [GitHub.](https://github.com/Adibvafa/CodonTransformer/tree/main?tab=readme-ov-file#finetuning-codontransformer)**

**For an interactive demo, check out our [Google Colab Notebook.](https://adibvafa.github.io/CodonTransformer/GoogleColab)**

After installing CodonTransformer, you can use:

```python
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer-base").to(DEVICE)

# Set your input data
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"

# Predict with CodonTransformer
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=DEVICE,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",
)
print(format_model_output(output))
```

The output is:
```python
-----------------------------
|          Organism         |
-----------------------------
Escherichia coli general

-----------------------------
|       Input Protein       |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG

-----------------------------
|      Processed Input      |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK

-----------------------------
|       Predicted DNA       |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```

## Additional Resources

- **Project Website**
  https://adibvafa.github.io/CodonTransformer/
- **GitHub Repository**
  https://github.com/Adibvafa/CodonTransformer
- **Google Colab Demo**
  https://adibvafa.github.io/CodonTransformer/GoogleColab
- **PyPI Package**
  https://pypi.org/project/CodonTransformer/
- **Paper**
  https://www.biorxiv.org/content/10.1101/2024.09.13.612903
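
A quick way to sanity-check a prediction such as the one in the Use Case example is to translate the predicted DNA back and compare it with the input protein. The sketch below is not part of the CodonTransformer package: `translate` and `encodes_protein` are hypothetical helper names, and it assumes the standard genetic code (NCBI translation table 1), which may not apply to every target organism.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
# Assumption: the target organism uses this table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into a protein string ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes_protein(dna: str, protein: str) -> bool:
    """Hypothetical helper: True if `dna` translates to `protein`, ignoring one trailing stop."""
    translated = translate(dna)
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein


if __name__ == "__main__":
    # First four codons of the predicted DNA above, followed by its stop codon.
    print(encodes_protein("ATGGCTTTATGGTAA", "MALW"))  # True
```

Passing the full `protein` string and the predicted DNA from the example to `encodes_protein` works the same way. A `False` result, or a `KeyError` raised on a codon containing non-ACGT characters, would indicate that the sequence needs a closer look.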