
Eugene Siow

eugenesiow

https://eugenesiow.com

AI & ML interests

None yet

Recent Activity

liked a model 5 days ago

Datou1111/shou_xin

upvoted an article 12 days ago

🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

Reacted to m-ric's post with 🔥 12 days ago

ShowUI: a small end-to-end agent that can navigate any UI and outperforms much bigger systems! 📲

A team from NUS and Microsoft just released an agent that can act on any UI (Desktop, Android, Web) without needing additional text information. It works extremely well: they applied their method to a tiny Qwen2-VL-2B and managed to beat methods that use much more powerful vision models (like GPT-4V), without using any additional info (e.g. leveraging the DOM of a webpage) as previous methods did! 👏👏

They started from the idea that most existing methods rely heavily on text, which makes them less generalizable, while leaving aside the rich UI structure that users actually rely on when navigating these interfaces.

⚙️ They put several good ideas to work:

💡 Simplify screenshots to the max: They heavily prune the visual content of UI screenshots by removing cloned image patches (any vast patch of the same color is reduced to a small patch, while maintaining positional embeddings), then group patches from the same GUI elements together to simplify even further.

💡 Build a truly generalist dataset: To train a general UI agent, you need trajectories from each possible UI, expressed in a common language. The authors merge datasets like OmniAct for Desktop, Mind2Web for websites, and AMEX for Android trajectories to create a high-quality and diverse dataset.

➡️ Nice results ensued: They fine-tune a tiny Qwen2-VL-2B with their method, and it reaches SOTA on several tasks (element identification, web navigation), even beating methods that either use additional info from the DOM or use much bigger VLMs like GPT-4V! 🏆

And performance could certainly jump with a slightly bigger vision model. Let's hope the community builds this soon! 🚀

Paper added to my "Agents" collection 👉 https://huggingface.co/collections/m-ric/agents-65ba776fbd9e29f771c07d4e
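The "cloned patch" pruning idea described in the post can be illustrated with a rough sketch. This is not ShowUI's actual implementation: it assumes each patch has already been summarized as a single RGB tuple, treats only exactly equal neighbouring patches as redundant, and keeps one representative per connected region together with its original grid position (standing in for the retained positional embedding). The function name `prune_redundant_patches` is hypothetical.

```python
def find(parent, i):
    # Follow parent links to the component root, compressing the path as we go.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def prune_redundant_patches(grid):
    """Keep one patch per 4-connected region of identical patch values.

    `grid` is a 2D list of hashable patch summaries (assumed here to be RGB tuples).
    Returns a list of (row, col, value) with the original positions retained.
    """
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))
    for r in range(rows):
        for c in range(cols):
            # Only look right and down; that covers every adjacent pair once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and grid[r][c] == grid[nr][nc]:
                    a = find(parent, r * cols + c)
                    b = find(parent, nr * cols + nc)
                    if a != b:
                        # The smaller index becomes the root, so each root is
                        # the first patch of its region in raster order.
                        parent[max(a, b)] = min(a, b)
    kept = []
    for i in range(rows * cols):
        if find(parent, i) == i:
            r, c = divmod(i, cols)
            kept.append((r, c, grid[r][c]))
    return kept


# Illustrative 3x3 "screenshot": a white background with one blue patch.
W, B = (255, 255, 255), (0, 0, 255)
example = [
    [W, W, W],
    [W, B, W],
    [W, W, W],
]
print(prune_redundant_patches(example))
```

In this toy example the nine patches collapse to two, one for the white background and one for the blue element, each still tagged with where it sat in the grid.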


Organizations

eugenesiow's activity

New activity in eugenesiow/remove-bg over 1 year ago

Update requirements.txt

#2 opened over 1 year ago by

itacaiunas

New activity in HuggingFaceH4/starchat-alpha over 1 year ago

Model doesn't end/terminate generation: Need to modify EOS token

#2 opened over 1 year ago by

eugenesiow

New activity in OpenAssistant/oasst-sft-6-llama-30b-xor over 1 year ago

Chat/Conversation Templates

#9 opened over 1 year ago by

eugenesiow

New activity in eugenesiow/bart-paraphrase over 1 year ago

How to add the number of sequences in model.generate for Bart Paraphrase

#1 opened over 2 years ago by

pratikkotian04

Adding `safetensors` variant of this model

#2 opened almost 2 years ago by

SFconvertbot

New activity in eugenesiow/Div2k about 2 years ago

Why lr and hr is string not image?

#3 opened about 2 years ago by

Freed-Wu

New activity in eugenesiow/BSD100 about 2 years ago

Fix task tags

#4 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/PIRM about 2 years ago

Fix task_ids

#2 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/Div2k about 2 years ago

Fix task_ids

#2 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/BSD100 about 2 years ago

Fix language and license tag names

#1 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/Set14 about 2 years ago

Fix task_ids

#2 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/Set5 about 2 years ago

Fix task_ids

#1 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/Urban100 about 2 years ago

Fix task_ids

#2 opened about 2 years ago by

albertvillanova

New activity in eugenesiow/Urban100 over 2 years ago

Fix `license` metadata

#1 opened over 2 years ago by

julien-c

New activity in eugenesiow/PIRM over 2 years ago

Fix `license` metadata

#1 opened over 2 years ago by

julien-c
