QiushiSun commited on
Commit
90745d1
·
verified ·
1 Parent(s): 98f97e2

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +14 -15
README.md CHANGED
@@ -9,28 +9,27 @@ pipeline_tag: image-text-to-text
9
 
10
  <div align="center">
11
 
12
- [\[🏠Homepage\]](https://osatlas.github.io) [\[💻Code\]](https://github.com/OS-Copilot/OS-Atlas) [\[🚀Quick Start\]](#quick-start) [\[📝Paper\]](https://arxiv.org/abs/2410.23218) [\[🤗Models\]](https://huggingface.co/collections/OS-Copilot/os-atlas-67246e44003a1dfcc5d0d045)[\[🤗Data\]](https://huggingface.co/datasets/OS-Copilot/OS-Atlas-data) [\[🤗ScreenSpot-v2\]](https://huggingface.co/datasets/OS-Copilot/ScreenSpot-v2)
13
 
14
  </div>
15
 
16
  ## Overview
17
- ![os-atlas](https://github.com/user-attachments/assets/cf2ee020-5e15-4087-9a7e-75cc43662494)
18
 
19
- OS-Atlas provides a series of models specifically designed for GUI agents.
20
 
21
- For GUI grounding tasks, you can use:
22
- - [OS-Atlas-Base-7B](https://huggingface.co/OS-Copilot/OS-Atlas-Base-7B)
23
- - [OS-Atlas-Base-4B](https://huggingface.co/OS-Copilot/OS-Atlas-Base-4B)
24
-
25
- For generating single-step actions in GUI agent tasks, you can use:
26
- - [OS-Atlas-Pro-7B](https://huggingface.co/OS-Copilot/OS-Atlas-Pro-7B)
27
- - [OS-Atlas-Pro-4B](https://huggingface.co/OS-Copilot/OS-Atlas-Pro-4B)
28
 
 
 
29
 
30
- ## Quick Start
31
- OS-Atlas-Base-7B is a GUI grounding model finetuned from [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct).
 
 
 
32
 
33
- **Notes:** Our models accept images of any size as input. The model outputs are normalized to relative coordinates within a 0-1000 range (either a center point or a bounding box defined by top-left and bottom-right coordinates). For visualization, please remember to convert these relative coordinates back to the original image dimensions.
34
 
35
  ### Inference Example
36
  First, ensure that the necessary dependencies are installed:
@@ -38,7 +37,7 @@ First, ensure that the necessary dependencies are installed:
38
  pip install transformers
39
  pip install qwen-vl-utils
40
  ```
41
- Then download the [example image](https://github.com/OS-Copilot/OS-Atlas/blob/main/examples/images/web_6f93090a-81f6-489e-bb35-1a2838b18c01.png) and save it to the current directory.
42
 
43
  Inference code example:
44
  ```python
@@ -59,7 +58,7 @@ messages = [
59
  "type": "image",
60
  "image": "./web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
61
  },
62
- {"type": "text", "text": "In this UI screenshot, what is the position of the element corresponding to the command \"switch language of current page\" (with bbox)?"},
63
  ],
64
  }
65
  ]
 
9
 
10
  <div align="center">
11
 
12
+ [\[🏠Homepage\]](https://qiushisun.github.io/OS-Genesis-Home/) [\[💻Code\]](https://github.com/OS-Copilot/OS-Genesis) [\[📝Paper\]](https://arxiv.org/abs/2412.19723) [\[🤗Models\]](https://huggingface.co/collections/OS-Copilot/os-genesis-6768d4b6fffc431dbf624c2d)[\[🤗Data\]](https://huggingface.co/collections/OS-Copilot/os-genesis-6768d4b6fffc431dbf624c2d)
13
 
14
  </div>
15
 
16
  ## Overview
17
+ ![os-genesis](https://cdn-uploads.huggingface.co/production/uploads/6064a0eeb1703ddba0d458b9/XvcAh92uvJQglmIu_L_nK.png)
18
 
19
+ We introduce OS-Genesis, an interaction-driven pipeline that synthesizes high-quality and diverse GUI agent trajectory data without human supervision. By leveraging reverse task synthesis, OS-Genesis enables effective training of GUI agents to achieve superior performance on dynamic benchmarks such as AndroidWorld and WebArena.
20
 
21
+ ## Quick Start
22
+ OS-Genesis-7B-AC is a mobile action model finetuned from [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct).
 
 
 
 
 
23
 
24
+ ### OS-Genesis AC Family Models
25
+ In the following table, we provide an overview of the OS-Genesis AC Family Models used for evaluating the AndroidControl Benchmark.
26
 
27
+ | Model Name | Base Model | Training Data | HF Link |
28
+ | :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :---------------------------------------------------------: |
29
+ | OS-Genesis-4B-AC | [InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B) | [OS-Genesis-mobile-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-4B-AC) |
30
+ | OS-Genesis-7B-AC | [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | [OS-Genesis-mobile-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-7B-AC) |
31
+ | OS-Genesis-8B-AC | [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) | [OS-Genesis-mobile-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-8B-AC) |
32
 
 
33
 
34
  ### Inference Example
35
  First, ensure that the necessary dependencies are installed:
 
37
  pip install transformers
38
  pip install qwen-vl-utils
39
  ```
40
+ For evaluating the AndroidControl Benchmark, please refer to the [**evaluation code**](https://github.com/OS-Copilot/OS-Genesis/tree/main/evaluation/android_control).
41
 
42
  Inference code example:
43
  ```python
 
58
  "type": "image",
59
  "image": "./web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
60
  },
61
+ {"type": "text", "text": "You are a GUI task expert, I will provide you with a high-level instruction, an action history, a screenshot with its corresponding accessibility tree.\n High-level instruction: {high_level_instruction}\n Action history: {action_history}\n Accessibility tree: {a11y_tree}\n Please generate the low-level thought and action for the next step."},
62
  ],
63
  }
64
  ]