This model is Rex-Omni, a 3B-parameter Multimodal Large Language Model (MLLM) presented in the paper "Detect Anything via Next Point Prediction". It is compatible with the Hugging Face transformers library and is licensed under the IDEA License 1.0.
Detect Anything via Next Point Prediction
Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that redefines object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.
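To make the idea of detection as next-token prediction more concrete, the sketch below shows one way box coordinates could be mapped to discrete values that a language model can emit as tokens, and then mapped back. The bin count of 1000, the function names `box_to_bins` and `bins_to_box`, and the clamping rule are assumptions for illustration only; this card does not specify them, and Rex-Omni's actual tokenization may differ.

```python
from typing import List, Sequence

NUM_BINS = 1000  # assumption: coordinates are quantized into bins 0..999


def box_to_bins(box: Sequence[float], width: int, height: int,
                num_bins: int = NUM_BINS) -> List[int]:
    """Quantize a pixel box (x0, y0, x1, y1) into four integer bin indices."""
    def quantize(value: float, size: int) -> int:
        index = int(value / size * num_bins)
        # Clamp so that a coordinate on the image border stays in range.
        return min(max(index, 0), num_bins - 1)

    x0, y0, x1, y1 = box
    return [quantize(x0, width), quantize(y0, height),
            quantize(x1, width), quantize(y1, height)]


def bins_to_box(bins: Sequence[int], width: int, height: int,
                num_bins: int = NUM_BINS) -> List[float]:
    """Map four bin indices back to approximate pixel coordinates."""
    x0, y0, x1, y1 = bins
    return [x0 / num_bins * width, y0 / num_bins * height,
            x1 / num_bins * width, y1 / num_bins * height]


print(box_to_bins((320, 240, 640, 480), width=640, height=480))
# → [500, 500, 999, 999]
```

Under this scheme a box is just a short sequence of integers, so predicting a box reduces to predicting the next few tokens.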

🚀 Quick Start
1. Installation
conda create -n rexomni -m python=3.10
conda activate rexomni
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
git clone https://github.com/IDEA-Research/Rex-Omni.git
cd Rex-Omni
pip install -v -e .
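After installation, a quick check can confirm that the expected packages are visible in the active environment. The following is a minimal sketch, not part of the Rex-Omni package; the helper name `check_packages` is hypothetical, and it only tests whether each module can be found, without importing it or verifying versions.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(
    names: Iterable[str] = ("torch", "torchvision", "rex_omni"),
) -> Dict[str, bool]:
    """Report whether each top-level module can be located in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        status = "found" if found else "MISSING"
        print(f"{name}: {status}")
```

If any entry is reported as missing, the most likely cause is that the commands were run outside the `rexomni` environment.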
2. Quick Start: Using Rex-Omni for Detection
from PIL import Image
from rex_omni import RexOmniWrapper, RexOmniVisualize
# Initialize model
model = RexOmniWrapper(
    model_path="IDEA-Research/Rex-Omni",
    backend="transformers",  # or "vllm"
)
# Load image
image = Image.open("your_image.jpg")
# Object Detection
results = model.inference(
    images=image,
    task="detection",
    categories=["person", "car", "dog"],
)
result = results[0]
# Visualize the predictions
vis = RexOmniVisualize(
    image=image,
    predictions=result["extracted_predictions"],
    font_size=20,
    draw_width=5,
    show_labels=True,
)
vis.save("visualize.jpg")
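Beyond visualization, the predictions can be processed directly. The sketch below counts detected boxes per category. It assumes that `result["extracted_predictions"]` is a dictionary mapping each category name to a list of annotation dictionaries with `"type"` and `"coords"` keys; this layout is an assumption and is not documented in this card, so inspect the real output before relying on it. The helper `count_boxes` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def count_boxes(predictions: Dict[str, List[Dict[str, Any]]]) -> Dict[str, int]:
    """Count box annotations per category (assumed prediction layout)."""
    return {
        category: sum(1 for ann in annotations if ann.get("type") == "box")
        for category, annotations in predictions.items()
    }


# Hand-written sample in the assumed layout, not real model output.
sample_predictions = {
    "person": [
        {"type": "box", "coords": [10, 20, 110, 220]},
        {"type": "box", "coords": [150, 30, 240, 230]},
    ],
    "car": [{"type": "box", "coords": [300, 120, 520, 260]}],
    "dog": [],
}

print(count_boxes(sample_predictions))
# → {'person': 2, 'car': 1, 'dog': 0}
```

With a real result, the call would be `count_boxes(result["extracted_predictions"])`, provided the assumed layout holds.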
3. Tutorials
We provide a series of tutorials to help you get started with Rex-Omni.
- Detection Example
- Pointing Example
- OCR Example
- Keypointing Example
- Visual Prompting Example
- Batch Inference Example
📄 License
Rex-Omni is licensed under the IDEA License 1.0, Copyright (c) IDEA. All Rights Reserved. This model is based on Qwen, which is licensed under the Qwen RESEARCH LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
📧 Contact
For questions and feedback, please contact us at:
- Email: jiangqing@idea.edu.cn
- GitHub Issues: IDEA-Research/Rex-Omni
7. Citation
Rex-Omni builds on a series of prior works. The BibTeX entry for the Rex-Omni paper is:
@misc{jiang2025detectpointprediction,
title={Detect Anything via Next Point Prediction},
author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
year={2025},
eprint={2510.12798},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.12798},
}
Base model: Qwen/Qwen2.5-VL-3B-Instruct