Commit abdf6d7
Parent(s): 0c7387f
feat: Add model card
README.md CHANGED
@@ -14,7 +14,8 @@ datasets:
The Nucleotide Transformers are a collection of foundational language models that were pre-trained on DNA sequences from whole genomes. Compared to other approaches, our models not only integrate information from single reference genomes, but also leverage DNA sequences from over 3,200 diverse human genomes, as well as 850 genomes from a wide range of species, including model and non-model organisms. Through robust and extensive evaluation, we show that these large models provide extremely accurate molecular phenotype prediction compared to existing methods.

-Part of this collection is the **nucleotide-transformer-v2-500m-multi-species**, a 500M-parameter transformer pre-trained
+Part of this collection is the **nucleotide-transformer-v2-500m-multi-species**, a 500M-parameter transformer pre-trained on a collection of 850 genomes from a wide range of species, including model and non-model organisms.
+

**Developed by:** InstaDeep, NVIDIA and TUM

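As a point of reference for the checkpoint introduced above, the sketch below shows one way it could be loaded with the Hugging Face `transformers` library and queried as a masked language model on a DNA string. The repository id, the toy sequence, and the `trust_remote_code=True` flag are assumptions made for illustration, not details taken from this commit.

```python
# Minimal usage sketch (assumptions: repo id, toy sequence, trust_remote_code flag).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "InstaDeepAI/nucleotide-transformer-v2-500m-multi-species"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# A toy DNA sequence; the tokenizer maps it into the model's own token vocabulary.
sequence = "ATTCCGATTCCGATTCCGATTCCGATTCCG"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, tokens, vocab_size)

# Most likely token at each position, e.g. to inspect masked-token reconstructions.
predicted_ids = logits.argmax(dim=-1)
print(tokenizer.decode(predicted_ids[0]))
```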
@@ -92,6 +93,9 @@ The masking procedure used is the standard one for Bert-style training:
The model was trained on 8 A100 80GB GPUs for 900B tokens, with an effective batch size of 1M tokens and a sequence length of 1,000 tokens. The Adam optimizer [38] was used with a learning rate schedule and standard values for the exponential decay rates and epsilon constant: β1 = 0.9, β2 = 0.999 and ε = 1e-8. During an initial warm-up period, the learning rate was increased linearly from 5e-5 to 1e-4 over 16k steps, before decaying with a square-root schedule until the end of training.

+### Architecture
+
+The model belongs to the second generation of Nucleotide Transformers, with the changes in architecture consisting of the use of rotary positional embeddings instead of learned ones and the introduction of Gated Linear Units.

### BibTeX entry and citation info

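As a reading aid for the learning-rate schedule described in the training paragraph above, here is a small sketch of a linear warm-up from 5e-5 to 1e-4 over 16k steps followed by a square-root decay. The card does not spell out the exact decay formula, so anchoring the decay at the end of warm-up is an assumption.

```python
# Illustrative sketch of the described schedule: linear warm-up from 5e-5 to 1e-4
# over 16k steps, then square-root decay (the decay anchoring is an assumption).
def learning_rate(step: int,
                  lr_min: float = 5e-5,
                  lr_max: float = 1e-4,
                  warmup_steps: int = 16_000) -> float:
    if step < warmup_steps:
        # Linear ramp between lr_min and lr_max during warm-up.
        return lr_min + (lr_max - lr_min) * step / warmup_steps
    # Square-root decay after warm-up, equal to lr_max at the end of warm-up.
    return lr_max * (warmup_steps / step) ** 0.5

for step in (0, 8_000, 16_000, 64_000):
    print(step, learning_rate(step))
```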
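The added Architecture section is brief, so the sketch below illustrates the two components it names, rotary positional embeddings and a Gated Linear Unit feed-forward block, in generic PyTorch. The shapes, hidden sizes, and the sigmoid gating activation are assumptions for illustration and are not taken from the model's actual implementation.

```python
# Generic illustrations of the two components named in the Architecture section.
# Shapes, sizes, and the gating activation are assumptions, not the model's real config.
import torch
import torch.nn as nn

def apply_rotary(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Rotate channel pairs by a position-dependent angle (rotary positional embedding)."""
    _, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # per-pair frequencies
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class GLUFeedForward(nn.Module):
    """Feed-forward block using a Gated Linear Unit instead of a plain MLP."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden)
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gate applied to the "up" projection, then projected back down.
        return self.down(torch.sigmoid(self.gate(x)) * self.up(x))

x = torch.randn(1, 8, 64)  # (batch, positions, channels)
print(apply_rotary(x).shape, GLUFeedForward(64, 256)(x).shape)
```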