Hugging Face
Eugene Siow
eugenesiow
17 followers · 8 following
https://eugenesiow.com
eugene_siow
eugenesiow
AI & ML interests
None yet
Recent Activity
liked a model 5 days ago
Datou1111/shou_xin
upvoted an article 12 days ago
LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs
Reacted to m-ric's post with 🔥 12 days ago
ShowUI: a small end-to-end agent that can navigate any UI and outperforms much bigger systems!
A team from NUS and Microsoft just released an agent that can act on any UI (Desktop, Android, Web) without needing additional text information. It works extremely well: they applied their method to a tiny Qwen2-VL-2B and managed to beat methods that use much more powerful vision models (like GPT-4V), without using any additional information (e.g. the DOM of a webpage) as previous methods did!
They started from the idea that most existing methods rely heavily on text, which makes them less generalizable, while setting aside the rich UI structure that users actually rely on when navigating these interfaces.
They put several good ideas to work:
Simplify screenshots to the max: They heavily prune the visual content of UI screenshots by removing duplicate image patches (any vast patch of the same color is reduced to a small patch, while positional embeddings are maintained), then group patches from the same GUI element together to simplify even further.
Build a truly generalist dataset: To train a general UI agent, you need trajectories from each possible UI, expressed in a common language. The authors merge datasets such as OmniAct for desktop, Mind2Web for websites, and AMEX for Android trajectories to create a high-quality and diverse dataset.
Nice results ensued: They fine-tune a tiny Qwen2-VL-2B with their method, and it reaches SOTA on several tasks (element identification, web navigation), even beating methods that either use additional information from the DOM or use much bigger VLMs like GPT-4V!
And performance could certainly jump with a slightly bigger vision model. Let's hope the community builds this soon!
Paper added to my "Agents" collection: https://huggingface.co/collections/m-ric/agents-65ba776fbd9e29f771c07d4e
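The patch-pruning idea in the post above can be illustrated with a small sketch. This is not the paper's implementation: the function names (`patch_color`, `prune_uniform_patches`), the exact-color equality test, and the choice to keep the first patch of each region are all assumptions made for illustration. The sketch splits a screenshot into square patches, merges adjacent patches that share one uniform color into a single region, keeps one representative per region, and returns the original grid positions of the kept patches so that positional information is preserved.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Image = List[List[Pixel]]             # image[row][col]


def patch_color(image: Image, top: int, left: int, size: int) -> Optional[Pixel]:
    """Return the patch's color if every pixel is identical, else None."""
    first = image[top][left]
    for r in range(top, top + size):
        for c in range(left, left + size):
            if image[r][c] != first:
                return None
    return first


def prune_uniform_patches(image: Image, size: int) -> List[Tuple[int, int]]:
    """Return grid positions (patch_row, patch_col) of the patches to keep.

    Hypothetical rule: adjacent uniform patches of the same color form one
    region, and only the first patch found in each region is kept.
    Non-uniform patches are always kept.
    """
    rows, cols = len(image) // size, len(image[0]) // size
    color = {
        (r, c): patch_color(image, r * size, c * size, size)
        for r in range(rows)
        for c in range(cols)
    }
    seen = set()
    kept: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            seen.add((r, c))
            kept.append((r, c))  # representative keeps its original position
            if color[(r, c)] is None:
                continue  # textured patch: never merged with neighbors
            # Flood fill over 4-connected neighbors with the same uniform color.
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (
                        0 <= nr < rows
                        and 0 <= nc < cols
                        and (nr, nc) not in seen
                        and color[(nr, nc)] == color[(r, c)]
                    ):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return kept
```

For a mostly blank screenshot, the returned list is much shorter than the full patch grid, which is the effect the post describes: fewer visual tokens, with each kept patch still tagged by where it sat in the original image.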
Organizations
eugenesiow's activity
liked a model 5 days ago
Datou1111/shou_xin • Text-to-Image • Updated 9 days ago • 20.7k • 523
liked a model 14 days ago
Qwen/Qwen2.5-72B-Instruct • Text Generation • Updated Sep 25 • 282k • 606
liked a model 22 days ago
OuteAI/OuteTTS-0.2-500M • Text-to-Speech • Updated 15 days ago • 18.5k • 269
liked a model 23 days ago
dblasko/blip-dalle3-img2prompt • Image-to-Text • Updated Nov 20, 2023 • 96 • 35
liked a model 24 days ago
voyageai/voyage-3-lite • Updated Sep 17 • 1
liked 2 Spaces 29 days ago
FLUX.1 [dev] • Running on Zero • 5.81k
Qwen2.5 Turbo 1M Demo • Running • 290
liked a dataset 30 days ago
blnewman/arxivDIGESTables • Updated Oct 31 • 83 • 2
liked 3 models about 1 month ago
jinaai/jina-embeddings-v2-base-en • Feature Extraction • Updated Aug 6 • 80.9k • 703
dunzhang/stella_en_1.5B_v5 • Sentence Similarity • Updated 7 days ago • 296k • 178
Alibaba-NLP/gte-Qwen2-1.5B-instruct • Sentence Similarity • Updated Nov 15 • 61.3k • 143
liked 4 datasets about 1 month ago
OpenGVLab/GUI-Odyssey • Viewer • Updated 28 days ago • 7.74k • 12.5k • 9
APauli/Persuasive-Pairs • Viewer • Updated Nov 8 • 2.7k • 70 • 4
huggingface/community-science-paper-v2 • Viewer • Updated about 7 hours ago • 4.96k • 367 • 6
amazon/AmazonQAC • Viewer • Updated 28 days ago • 396M • 312 • 11
liked a Space about 1 month ago
MEGA-Bench • A leaderboard for multimodal models • Running • 22
liked 3 models about 1 month ago
chuanli11/Llama-3.2-3B-Instruct-uncensored • Text Generation • Updated Oct 18 • 18.6k • 105
Qwen/Qwen2-VL-7B-Instruct • Image-Text-to-Text • Updated 12 days ago • 2.65M • 949
google/gemma-2-2b-it • Text Generation • Updated Aug 27 • 405k • 785
liked a Space about 2 months ago
Open LLM Leaderboard Model Comparator • Compare Open LLM Leaderboard results • Running • 71