willieneis
committed on
Update README.md
README.md
CHANGED
@@ -17,6 +17,14 @@ tags:
 
 We carry out byte-pair encoding (BPE) tokenization on our dataset, tailored for metagenomic sequences, and then pretrain our model. We detail the pretraining data, tokenization strategy, and model architecture, highlighting the considerations and design choices that enable the effective modeling of metagenomic data, in our technical report.
 
+## **Usage**
+```python
+```
+
+### **Example Generation Pipeline**
+```python
+```
+
 ## **Benchmark Performance**
 We evaluate METAGENE-1 across three tasks: pathogen detection, zero-shot embedding benchmarks (**Gene-MTEB**), and genome understanding (**GUE**), achieving state-of-the-art performance on most benchmarks. For more details, check out our [paper](TODO).
 ### **Pathogen Detection**
@@ -95,13 +103,6 @@ Next, we evaluate **METAGENE-1** on the GUE multi-species classification benchma
 | **COVID** | 22.2 | 23.3 | 62.2 | **73.0** | 71.9 | 72.5 |
 | **Global Win %** | 0.0 | 0.0 | 7.1 | 21.4 | 25.0 | **46.4** |
 
-## **Usage**
-```python
-```
-
-### **Example Generation Pipeline**
-```python
-```
-
 
 ## **Model Details**
 - **Release Date**: Dec XX 2024
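The README text in this diff mentions byte-pair encoding (BPE) tokenization of metagenomic sequences without showing the mechanics. The block below is a minimal sketch of generic BPE on nucleotide strings, assuming a character-level starting vocabulary and a tiny in-memory corpus. It is not the METAGENE-1 tokenizer, whose actual configuration is described in the technical report; the function names `learn_bpe_merges`, `apply_merge`, and `tokenize` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every adjacent occurrence of `pair` in `tokens` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe_merges(sequences, num_merges):
    """Learn up to `num_merges` BPE merge rules from nucleotide strings (toy version)."""
    # Assumption: every sequence starts as a list of single-character tokens.
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        # Pick the most frequent pair; ties go to the pair seen first.
        best_pair, _count = pair_counts.most_common(1)[0]
        merges.append(best_pair)
        corpus = [apply_merge(tokens, best_pair) for tokens in corpus]
    return merges


def tokenize(sequence, merges):
    """Tokenize a sequence by replaying the learned merges in order."""
    tokens = list(sequence)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


if __name__ == "__main__":
    # Toy corpus of made-up reads, for illustration only.
    toy_merges = learn_bpe_merges(["ACGTACGT", "ACGACG"], num_merges=2)
    print(toy_merges)
    print(tokenize("ACGT", toy_merges))
```

Replaying merges in the order they were learned keeps tokenization deterministic, and concatenating the resulting tokens always recovers the input sequence.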