adamlu1 committed on
Commit d2cd2f8
1 Parent(s): 0fc5095

update readme

Files changed (1)
  1. README.md +9 -56
README.md CHANGED
@@ -1,56 +1,9 @@
- # OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent
-
- <p align="center">
- <img src="imgs/logo.png" alt="Logo">
- </p>
-
- [![arXiv](https://img.shields.io/badge/Paper-green)](https://arxiv.org/abs/2408.00203)
- [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
- 📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models](https://huggingface.co/microsoft/OmniParser)]
-
- **OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
-
- ## News
- - [2024/10] Both Interactive Region Detection Model and Icon functional description model are released! [Hugginface models](https://huggingface.co/microsoft/OmniParser)
- - [2024/09] OmniParser achieves the best performance on [Windows Agent Arena](https://microsoft.github.io/WindowsAgentArena/)!
-
- ## Install
- Install environment:
- ```python
- conda create -n "omni" python==3.12
- conda activate omni
- pip install -r requirements.txt
- ```
-
- Then download the model ckpts files in: https://huggingface.co/microsoft/OmniParser, and put them under weights/, default folder structure is: weights/icon_detect, weights/icon_caption_florence, weights/icon_caption_blip2.
-
- Finally, convert the safetensor to .pt file.
- ```python
- python weights/convert_safetensor_to_pt.py
- ```
-
- ## Examples:
- We put together a few simple examples in the demo.ipynb.
-
- ## Gradio Demo
- To run gradio demo, simply run:
- ```python
- python gradio_demo.py
- ```
-
-
- ## 📚 Citation
- Our technical report can be found [here](https://arxiv.org/abs/2408.00203).
- If you find our work useful, please consider citing our work:
- ```
- @misc{lu2024omniparserpurevisionbased,
- title={OmniParser for Pure Vision Based GUI Agent},
- author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
- year={2024},
- eprint={2408.00203},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2408.00203},
- }
- ```
+ title: OmniParser: screen understanding tool for pure vision-based GUI agent
+ emoji: 🔥
+ colorFrom: yellow
+ colorTo: green
+ sdk: gradio
+ sdk_version: 3.14.0
+ app_file: app.py
+ pinned: false
+ license: agpl-3.0
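The nine added lines are Hugging Face Spaces configuration keys (`title`, `emoji`, `colorFrom`, `colorTo`, `sdk`, `sdk_version`, `app_file`, `pinned`, `license`). Spaces read these settings from a YAML block at the top of README.md that opens and closes with a `---` line; as shown in this diff, the added lines do not include those delimiters. The `title` value also contains `: `, which a plain (unquoted) YAML scalar may not contain, so that value needs quoting. The sketch below is a minimal illustration under those assumptions, using only standard-library Python. `needs_quotes`, `yaml_scalar`, and `space_front_matter` are hypothetical helpers, not part of OmniParser or `huggingface_hub`, and the quoting rule is deliberately conservative rather than a complete YAML emitter.

```python
# Minimal sketch (assumption): render Hugging Face Space settings as the
# `---`-delimited YAML block that opens README.md. The helper names are
# hypothetical; this is not a general-purpose YAML emitter.

# Characters that have special meaning at the start of a plain YAML scalar.
YAML_INDICATORS = "!&*[]{}|>'\"%@`#,?:-"


def needs_quotes(text: str) -> bool:
    """Return True when a value is unsafe as a plain (unquoted) YAML scalar."""
    return (
        text == ""
        or ": " in text            # would be read as a nested mapping
        or text.endswith(":")      # same problem at the end of the line
        or " #" in text            # would start a comment
        or text[0] in YAML_INDICATORS
        or text != text.strip()    # leading/trailing spaces would be lost
    )


def yaml_scalar(value) -> str:
    """Format one value: YAML booleans for bool, double-quoted strings when needed."""
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value)
    if needs_quotes(text):
        # Escape backslashes first, then double quotes, for a double-quoted scalar.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return text


def space_front_matter(config: dict) -> str:
    """Build the front-matter block, preserving the key order of `config`."""
    lines = ["---"]
    lines += [f"{key}: {yaml_scalar(value)}" for key, value in config.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


# Values copied from the lines added in this commit.
config = {
    "title": "OmniParser: screen understanding tool for pure vision-based GUI agent",
    "emoji": "🔥",
    "colorFrom": "yellow",
    "colorTo": "green",
    "sdk": "gradio",
    "sdk_version": "3.14.0",
    "app_file": "app.py",
    "pinned": False,
    "license": "agpl-3.0",
}

print(space_front_matter(config))
```

Only the `title` line comes out quoted; the other values are safe as plain scalars. For real use, PyYAML's `yaml.safe_dump` handles quoting automatically; the hand-rolled version above only makes explicit which value needs it.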