arxiv:2410.09006

From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts

Published on Oct 11

Authors:

Amanda Swearngin

Abstract

With advances in generative AI, there is increasing work towards creating autonomous agents that can manage daily tasks by operating user interfaces (UIs). While prior research has studied the mechanics of how AI agents might navigate UIs and understand UI structure, the effects of agents and their autonomous actions-particularly those that may be risky or irreversible-remain under-explored. In this work, we investigate the real-world impacts and consequences of UI actions by AI agents. We began by developing a taxonomy of the impacts of UI actions through a series of workshops with domain experts. Following this, we conducted a data synthesis study to gather realistic UI screen traces and action data that users perceive as impactful. We then used our impact categories to annotate our collected data and data repurposed from existing UI navigation datasets. Our quantitative evaluations of different large language models (LLMs) and variants demonstrate how well different LLMs can understand the impacts of UI actions that might be taken by an agent. We show that our taxonomy enhances the reasoning capabilities of these LLMs for understanding the impacts of UI actions, but our findings also reveal significant gaps in their ability to reliably classify more nuanced or complex categories of impact.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2410.09006 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2410.09006 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2410.09006 in a Space README.md to link it from this page.