The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 31
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 59
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Paper • 2412.01268 • Published Dec 2, 2024 • 1
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions and program synthesis • 239 items • Updated about 10 hours ago • 41
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 27 days ago • 204
Models Used in HackerNoon Publishing System Collection HackerNoon.com’s content management system empowers a small team to manage tens of thousands of writers, advertisers, & millions of readers 🙏 🤖 🙏🤖 • 15 items • Updated Dec 6, 2024 • 20
Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • Jun 29, 2024 • 33
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published May 20, 2024 • 27
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 122
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 126
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5, 2024 • 94
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5, 2024 • 18
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 54
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation Paper • 2401.17053 • Published Jan 30, 2024 • 31