---
license: cc-by-nc-sa-4.0
tags:
- Helical
- RNA
- Transformers
- Sequence
- biology
- mrna
- rna
- genomics
library_name: transformers
---
# Helix-mRNA-v0

Helix-mRNA is a hybrid state-space and transformer model that combines the efficient sequence processing of Mamba2's state-space architecture with the contextual understanding of transformer attention mechanisms, capturing the strengths of both approaches. These traits make it particularly well suited to studying full-length transcripts, splice variants, and complex mRNA structural elements.
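
The exact layer composition of Helix-mRNA is not specified here, but the general shape of such a hybrid block can be sketched as below. This is a conceptual illustration only, using `nn.GRU` as a stand-in for a Mamba2 state-space layer; it is not the actual Helix-mRNA architecture.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Conceptual sketch, NOT the actual Helix-mRNA architecture:
    pairs a linear-time sequence-mixing layer with transformer attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.ssm = nn.GRU(dim, dim, batch_first=True)  # stand-in for a Mamba2 state-space layer
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recurrent/state-space path: efficient, linear-time sequence mixing.
        x = x + self.ssm(self.norm1(x))[0]
        # Attention path: global pairwise context across the sequence.
        h = self.norm2(x)
        attn_out, _ = self.attn(h, h, h)
        return x + attn_out

x = torch.randn(2, 80, 64)        # (batch, tokens, hidden dim)
print(HybridBlock(64)(x).shape)   # torch.Size([2, 80, 64])
```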
17
+
18
+ We tokenize mRNA sequences at single-nucleotide resolution by mapping each nucleotide (A, C, U, G) and ambiguous base (N) to a unique integer. A further special character E is incorporated into the sequence, denoting the start of each codon. This fine-grained approach maximizes the model's ability to extract patterns from the sequences. Unlike coarser tokenization methods that might group nucleotides together or use k-mer based approaches, our single-nucleotide resolution preserves the full sequential information of the mRNA molecule. This simple yet effective encoding scheme ensures that no information is lost during the preprocessing stage, allowing the downstream model to learn directly from the raw sequence composition.
19
+
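
As an illustration of this encoding, the sketch below maps each character to an integer id. The specific ids are hypothetical; the actual vocabulary used by Helical's tokenizer may assign different values.

```python
# Illustrative only: the exact integer ids used by Helical's tokenizer may differ.
TOKEN_TO_ID = {"E": 1, "A": 2, "C": 3, "U": 4, "G": 5, "N": 6}

def encode(sequence: str) -> list[int]:
    """Map each character of an mRNA sequence to its integer token id."""
    return [TOKEN_TO_ID[base] for base in sequence]

# "E" marks the start of each codon, so one codon occupies four tokens.
print(encode("EAUGECGU"))  # [1, 2, 4, 5, 1, 3, 5, 4]
```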

# Helical<a name="helical"></a>

#### Install the package

Run the following to install the [Helical](https://github.com/helicalAI/helical) package via pip:
```console
pip install --upgrade helical
```

#### Generate Embeddings
```python
from helical import HelixmRNA, HelixmRNAConfig
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

input_sequences = ["EACU"*20, "EAUG"*20, "EAUG"*20, "EACU"*20, "EAUU"*20]

helix_mrna_config = HelixmRNAConfig(batch_size=5, device=device, max_length=100)
helix_mrna = HelixmRNA(configurer=helix_mrna_config)

# prepare data for input to the model
processed_input_data = helix_mrna.process_data(input_sequences)

# generate the embeddings for the processed data
embeddings = helix_mrna.get_embeddings(processed_input_data)
```
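
The snippet below shows one way to use the returned embeddings, assuming `get_embeddings` yields a CPU array-like with one row per input sequence (check the Helical documentation for the exact return type):

```python
import numpy as np

# Assumption: `embeddings` is array-like of shape (num_sequences, embedding_dim).
emb = np.asarray(embeddings)
print(emb.shape)

# Example downstream use: cosine similarity between the first two sequences.
a, b = emb[0], emb[1]
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```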

#### Fine-Tuning
Classification fine-tuning example:
```python
from helical import HelixmRNAFineTuningModel, HelixmRNAConfig
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

input_sequences = ["EACU"*20, "EAUG"*20, "EAUG"*20, "EACU"*20, "EAUU"*20]
labels = [0, 2, 2, 0, 1]

helix_mrna_config = HelixmRNAConfig(batch_size=5, device=device, max_length=100)
helix_mrna_fine_tune = HelixmRNAFineTuningModel(helix_mrna_config=helix_mrna_config, fine_tuning_head="classification", output_size=3)

# prepare data for input to the model
train_dataset = helix_mrna_fine_tune.process_data(input_sequences)

# fine-tune the model with the relevant training labels
helix_mrna_fine_tune.train(train_dataset=train_dataset, train_labels=labels)

# get outputs from the fine-tuned model on a processed dataset
outputs = helix_mrna_fine_tune.get_outputs(train_dataset)
```
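
If `get_outputs` returns one row of logits per sequence with `output_size` columns (an assumption worth verifying against the Helical documentation), predicted class labels can be recovered with an argmax:

```python
import numpy as np

# Assumption: `outputs` is array-like of shape (num_sequences, output_size).
logits = np.asarray(outputs)
predicted_labels = logits.argmax(axis=1)
print(predicted_labels)  # approaches [0, 2, 2, 0, 1] once the model fits the training data
```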