jinyuan22 committed on
Commit
1630f44
1 Parent(s): b3e5f44

Update README.md

Files changed (1)
  1. README.md +40 -3
README.md CHANGED
@@ -1,3 +1,40 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ library_name: transformers
+ tags:
+ - biology
+ pipeline_tag: text-generation
+ widget:
+ - text: <|bos|> <|tag_start|> 00050 <|tag_end|> <|5|>
+ ---
+
+ # **RFamLlama**
+
+ The ability to efficiently generate specific RNA sequences on demand has significant implications for both scientific research and therapeutic applications. In this context, we introduce RFamLlama, a conditional language model optimized for generating RNA sequences across diverse families. The model was trained on RNA sequences representing over 4,000 distinct families, each augmented with control tags denoting the specific family. We have shown that including family-specific tags substantially enhances the model's zero-shot fitness prediction of RNA molecules. The model also supports conditional generation: RNA sequences can be generated using Rfam IDs as input prompts, eliminating the need for function-specific fine-tuning. Consequently, RFamLlama is poised to be an effective and widely applicable tool for zero-shot fitness prediction and generation of RNA sequences, potentially pushing the boundaries of what can be achieved beyond natural evolutionary processes.
+
+ ## Use RFamLlama
+
+ ```python
+ # Generate RNA sequences conditioned on an Rfam family tag
+ import torch
+ from transformers import LlamaForCausalLM, AutoTokenizer, pipeline
+
+ model_url = "jinyuan22/RFamLlama-large"
+ model = LlamaForCausalLM.from_pretrained(model_url, torch_dtype=torch.float16)
+ tokenizer = AutoTokenizer.from_pretrained(model_url)
+
+ device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
+
+ pipe = pipeline("text-generation", model=model, device=device, tokenizer=tokenizer)
+
+ tag = "RF00005"  # tRNA family; the prompt uses the numeric part of the Rfam ID
+ txt = f"<|bos|> <|tag_start|> {tag[2:]} <|tag_end|> <|5|> "
+ outputs = pipe(txt, num_return_sequences=10, max_new_tokens=300,
+                repetition_penalty=1.0, top_p=1.0, temperature=1.0, do_sample=True)
+
+ for i, output in enumerate(outputs):
+     seq = output["generated_text"]
+     # Keep only the sequence between the 5' and 3' markers
+     seq = seq.split("<|5|>")[1].split("<|3|>")[0]
+     print(f">{i}\n{seq}")
+ ```
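
The zero-shot fitness prediction described in the model card can be sketched by scoring each RNA variant with the model's log-likelihood under the same tagged prompt format used for generation. The prompt layout below mirrors the generation example; the scoring-by-mean-cross-entropy recipe itself is an illustrative assumption, not the authors' published protocol.

```python
# Sketch: rank RNA variants by model log-likelihood (zero-shot fitness proxy).
# Prompt layout copied from the generation example above; the scoring recipe
# is an assumption for illustration, not the authors' exact protocol.

def make_prompt(tag: str, seq: str) -> str:
    # Same control-tag layout as generation: family tag, then 5'->3' sequence.
    return f"<|bos|> <|tag_start|> {tag[2:]} <|tag_end|> <|5|> {seq} <|3|>"

def score(model, tokenizer, seq: str, tag: str = "RF00005") -> float:
    import torch
    ids = tokenizer(make_prompt(tag, seq), return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids yields the mean per-token cross-entropy.
        out = model(ids, labels=ids)
    return -out.loss.item()  # higher score = more likely under the model

# Usage (downloads model weights, so not executed here):
# from transformers import LlamaForCausalLM, AutoTokenizer
# model = LlamaForCausalLM.from_pretrained("jinyuan22/RFamLlama-large")
# tokenizer = AutoTokenizer.from_pretrained("jinyuan22/RFamLlama-large")
# variants = ["GGGGCUAUAGCUCAGCUGGG", "GGGGCUAUAGCUCAGCAGGG"]
# ranked = sorted(variants, key=lambda s: score(model, tokenizer, s), reverse=True)
```

A higher (less negative) score means the tagged sequence is assigned higher probability by the model, which the card reports as a useful zero-shot proxy for fitness.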