
Model Card for Llama8b-NNetNav-WA

Llama8b-NNetNav-WA is a Llama-3.1-8B model instruct-tuned on NNetNav-WA data, which was collected via unsupervised exploration of WebArena websites using a larger Llama-3.1-70B model.

Further details about this model can be found in our paper: NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild.


Model Details

This model is intended to be used as a web agent: given an instruction such as "Upvote the post by user smurty123 on subreddit r/LocalLLaMA" and a web URL such as reddit.com, the model performs the task by executing a sequence of actions.

The action space of the model is as follows (see the parsing sketch after this list):

Page Operation Actions:
`click [id]`: This action clicks on an element with a specific id on the webpage.
`type [id] [content] [press_enter_after=0|1]`: Use this to type the content into the field with id. By default, the "Enter" key is pressed after typing unless press_enter_after is set to 0.
`hover [id]`: Hover over an element with id.
`press [key_comb]`: Simulates pressing a key combination on the keyboard (e.g., Ctrl+v).
`scroll [down|up]`: Scroll the page up or down.

Tab Management Actions:
`new_tab`: Open a new, empty browser tab.
`tab_focus [tab_index]`: Switch the browser's focus to a specific tab using its index.
`close_tab`: Close the currently active tab.

URL Navigation Actions:
`goto [url]`: Navigate to a specific URL.
`go_back`: Navigate to the previously viewed page.
`go_forward`: Navigate to the next page (if a previous 'go_back' action was performed).

Completion Action:
`stop [answer]`: Issue this action when you believe the task is complete. If the objective is to find a text-based answer, provide the answer in the bracket. If you believe the task is impossible to complete, provide the answer as "N/A" in the bracket.
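As an illustration of how these action strings might be consumed downstream, here is a minimal parsing sketch in Python. The `ACTION_PATTERNS` table and the `parse_action` helper are hypothetical (they are not part of the model or its released code) and only mirror the action grammar listed above.

```python
import re

# Hypothetical helper (not part of the released NNetNav code): map a raw
# action string emitted by the model to a (name, arguments) pair.
# Element ids are assumed to be the numeric ids of the page observation.
ACTION_PATTERNS = [
    (r"^click \[(?P<id>\d+)\]$", "click"),
    (r"^type \[(?P<id>\d+)\] \[(?P<content>.*?)\]( \[press_enter_after=(?P<enter>[01])\])?$", "type"),
    (r"^hover \[(?P<id>\d+)\]$", "hover"),
    (r"^press \[(?P<key_comb>[^\]]+)\]$", "press"),
    (r"^scroll \[(?P<direction>down|up)\]$", "scroll"),
    (r"^new_tab$", "new_tab"),
    (r"^tab_focus \[(?P<tab_index>\d+)\]$", "tab_focus"),
    (r"^close_tab$", "close_tab"),
    (r"^goto \[(?P<url>[^\]]+)\]$", "goto"),
    (r"^go_back$", "go_back"),
    (r"^go_forward$", "go_forward"),
    (r"^stop \[(?P<answer>.*)\]$", "stop"),
]

def parse_action(action_str: str):
    """Return (action_name, arguments_dict) for a single action string."""
    action_str = action_str.strip()
    for pattern, name in ACTION_PATTERNS:
        match = re.match(pattern, action_str)
        if match:
            return name, {k: v for k, v in match.groupdict().items() if v is not None}
    raise ValueError(f"Unrecognized action: {action_str!r}")

# Example:
print(parse_action("type [164] [LocalLLaMA] [press_enter_after=1]"))
# -> ('type', {'id': '164', 'content': 'LocalLLaMA', 'enter': '1'})
```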

Results on Benchmarks

This model achieves the following success rates (SR) on the WebArena and WebVoyager benchmarks:

| Model | WebArena (SR) | WebVoyager (SR) |
|---|---|---|
| GPT-4 | 14.1 | 33.5 |
| llama8b-nnetnav-wa | 16.3 | 28.1 |

Bias, Risks, and Limitations

TODO

How to Get Started with the Model

TODO
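Pending the official instructions, below is a minimal sketch of loading the checkpoint with the Hugging Face transformers library. It assumes the standard Llama causal-LM interface; the prompt shown is illustrative only, since the exact observation and prompt format expected by the agent is described in the paper and accompanying code rather than here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stanfordnlp/llama8b-nnetnav-wa"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt only: the real agent prompt includes the task objective,
# the current URL, and an accessibility-tree observation of the page.
prompt = (
    "Objective: Upvote the post by user smurty123 on subreddit r/LocalLLaMA.\n"
    "URL: http://reddit.com\n"
    "Observation: <accessibility tree of the current page>\n"
    "Next action:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```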

Training Details

Training Data

This model was trained with supervised fine-tuning (SFT) on the NNetNav-WA dataset, which consists of synthetic demonstrations collected entirely from self-hosted websites.

Training Procedure

This model was trained for 2 epochs (roughly 4k gradient steps) with a batch size of 128 and a maximum sequence length of 20,000 tokens.
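For a rough sense of scale, these numbers imply the approximate number of training examples by simple arithmetic (illustrative only; the true count is determined by the NNetNav-WA dataset itself):

```python
# Back-of-envelope: gradient_steps ≈ epochs * num_examples / batch_size,
# so num_examples ≈ gradient_steps * batch_size / epochs.
gradient_steps = 4_000   # "roughly 4k gradient steps"
batch_size = 128
epochs = 2

approx_num_examples = gradient_steps * batch_size / epochs
print(f"~{approx_num_examples:,.0f} training examples")  # ~256,000
```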

Environmental Impact

  • Hardware Type: 4 H100 GPUs (80G)
  • Hours used: Roughly 48 hours (2 days).
  • Cloud Provider: Stanford compute.
  • Compute Region: Stanford energy grid.

Technical Specifications

Hardware

This model was trained on 4 H100s.

Software

This model was fine-tuned with Open-Instruct.

Model Card Authors

Shikhar Murty

Model Card Contact

smurty@cs.stanford.edu
