Update README.md
## **Usage**
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("metagene-ai/METAGENE-1")
model = AutoModelForCausalLM.from_pretrained("metagene-ai/METAGENE-1", torch_dtype=torch.bfloat16)

# Example input: Hexamita inflata 5.8S ribosomal RNA gene sequence
input_sequence = (
    "TCACCGTTCTACAATCCCAAGCTGGAGTCAAGCTCAACAGGGTCTTCTTGCCCCGCTGAGGGTTACACTCGCCCGTTCCCGAGTCTGTGGTTTCGCGAAGATATGACCAGGGACAGTAAGAACC"
)

# Tokenize the input sequence and truncate to the first 12 tokens
input_tokens = tokenizer.encode(input_sequence, return_tensors="pt", add_special_tokens=False)[..., :12]

# Generate output from the model with a max sequence length of 32 tokens
generated_tokens = model.generate(input_tokens, max_length=32)

# Decode the generated output and clean up the result
generated_sequence = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
generated_sequence = generated_sequence.replace(" ", "").replace("_", "")

# Print the original input and the model's output
print(f"📄 Input Sequence:\n{input_sequence}")
print(f"🔬 Generated Sequence:\n{generated_sequence}")
```
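If you want to see how the byte-pair tokenizer segments a read before generation, the short sketch below may help. It is not part of the original example, and the exact token strings depend on METAGENE-1's BPE vocabulary, so the printed output will vary.

```python
# Minimal sketch (not from the original README): inspect how the BPE tokenizer
# splits a nucleotide fragment. Token strings depend on the model's vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("metagene-ai/METAGENE-1")

fragment = "TCACCGTTCTACAATCCCAAGCTGG"  # first 25 bases of the example input above
token_ids = tokenizer.encode(fragment, add_special_tokens=False)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(f"Token IDs: {token_ids}")
print(f"Tokens:    {tokens}")
```

Checking the token count this way can be useful when choosing truncation lengths or `max_length` values for generation.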

## **Benchmark Performance**