# YOLO-World + EfficientViT SAM

🤗 [HuggingFace Space](https://huggingface.co/spaces/curt-park/yolo-world-with-efficientvit-sam)

![example_0](https://github.com/Curt-Park/yolo-world-with-efficientvit-sam/assets/14961526/326bde19-d535-4be5-829e-782fce0c1d00)

## Prerequisites

This project is developed and tested on Python 3.10.

```bash
# Create and activate a Python 3.10 environment.
conda create -n yolo-world-with-efficientvit-sam python=3.10 -y
conda activate yolo-world-with-efficientvit-sam

# Set up packages.
make setup
```

## How to Run

```bash
python app.py
```

Open http://127.0.0.1:7860/ in your web browser.

![example_1](https://github.com/Curt-Park/yolo-world-with-efficientvit-sam/assets/14961526/9388e4ee-6f71-4428-b17c-d218fd059949)

## Core Components

### YOLO-World

[YOLO-World](https://github.com/AILab-CVC/YOLO-World) is a highly efficient open-vocabulary object detection model. On the challenging LVIS dataset, it achieves 35.4 AP at 52.0 FPS on a V100, outperforming many state-of-the-art methods in both accuracy and speed.

![image](https://github.com/Curt-Park/yolo-world-with-efficientvit-sam/assets/14961526/8a4a17bd-918d-478a-8451-f58e4a2dce79)

### EfficientViT SAM

[EfficientViT SAM](https://github.com/mit-han-lab/efficientvit) is a family of accelerated Segment Anything models. Thanks to its lightweight, hardware-efficient core building block, it delivers a 48.9× measured TensorRT speedup over SAM-ViT-H on an A100 GPU without sacrificing performance.

In this app, the two models are chained: YOLO-World turns text prompts into bounding boxes, and EfficientViT SAM turns those boxes into masks (see the pipeline sketch at the end of this README).

## Powered By

```
@misc{zhang2024efficientvitsam,
      title={EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss},
      author={Zhuoyang Zhang and Han Cai and Song Han},
      year={2024},
      eprint={2402.05008},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@article{cheng2024yolow,
  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
  journal={arXiv preprint arXiv:2401.17270},
  year={2024}
}

@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}
```
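## Pipeline Sketch

The following is a minimal, illustrative sketch of how an open-vocabulary detector and a box-prompted segmenter can be chained, in the spirit of `app.py`. It assumes the Roboflow `inference` package for YOLO-World (`YOLOWorld`, `set_classes`, `infer`), the `efficientvit` repository layout for EfficientViT SAM (`create_sam_model`, `EfficientViTSamPredictor`), and `supervision` for visualization; model ids, checkpoint names, and exact signatures are assumptions and may differ from the actual app.

```python
# Hypothetical end-to-end sketch: text prompt -> boxes (YOLO-World) -> masks (EfficientViT SAM).
# Package paths, model ids, and checkpoint names below are assumptions, not a verbatim excerpt of app.py.
import cv2
import numpy as np
import supervision as sv
from inference.models import YOLOWorld                               # Roboflow inference package (assumed import path)
from efficientvit.models.efficientvit.sam import EfficientViTSamPredictor  # assumed import path
from efficientvit.sam_model_zoo import create_sam_model              # assumed import path

# 1. Open-vocabulary detection: free-form class names instead of a fixed label set.
detector = YOLOWorld(model_id="yolo_world/l")                        # assumed model id
detector.set_classes(["person", "backpack"])                         # text prompt
image_bgr = cv2.imread("example.jpg")
detections = sv.Detections.from_inference(detector.infer(image_bgr, confidence=0.3))

# 2. Box-prompted segmentation: each detected box becomes a SAM prompt.
sam = create_sam_model(name="xl1", pretrained=True).eval()           # assumed checkpoint name; move to GPU if available
predictor = EfficientViTSamPredictor(sam)
predictor.set_image(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))      # predictor expects RGB
masks = []
for box in detections.xyxy:
    mask, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(mask.squeeze())
detections.mask = np.array(masks)

# 3. Visualization: overlay the predicted masks on the input image.
annotated = sv.MaskAnnotator().annotate(scene=image_bgr.copy(), detections=detections)
cv2.imwrite("annotated.jpg", annotated)
```

Prompting SAM with detector boxes (rather than a dense grid of points) keeps the segmentation step fast and ties each mask directly to an open-vocabulary label.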