# Model Card for Llama8b-NNetNav-WA
Llama8b-NNetNav-WA is a Llama-3.1-8B model instruct-tuned on NNetNav-WA data, collected through unsupervised exploration of WebArena websites with a larger Llama-3.1-70B model.
Full details about this model can be found in our paper: NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild.
## Table of Contents
- Model Card for Llama8b-NNetNav-WA
- Table of Contents
- Model Details
- Results on Web-Agent Benchmarks
- Bias, Risks, and Limitations
- Training Details
- Environmental Impact
- Technical Specifications
- Model Card Authors
- Model Card Contact
- How to Get Started with the Model
## Model Details
This model is intended to be used as a web agent: given an instruction such as *Upvote the post by user smurty123 on subreddit r/LocalLLaMA* and a web URL such as reddit.com, the model performs the task by executing a sequence of actions.
The action space of the model is as follows:
**Page Operation Actions:**
- `click [id]`: Clicks on the element with the given id on the webpage.
- `type [id] [content] [press_enter_after=0|1]`: Types the content into the field with the given id. By default, the "Enter" key is pressed after typing unless press_enter_after is set to 0.
- `hover [id]`: Hovers over the element with the given id.
- `press [key_comb]`: Simulates pressing a key combination on the keyboard (e.g., Ctrl+v).
- `scroll [down|up]`: Scrolls the page up or down.

**Tab Management Actions:**
- `new_tab`: Opens a new, empty browser tab.
- `tab_focus [tab_index]`: Switches the browser's focus to the tab at the given index.
- `close_tab`: Closes the currently active tab.

**URL Navigation Actions:**
- `goto [url]`: Navigates to the given URL.
- `go_back`: Navigates to the previously viewed page.
- `go_forward`: Navigates to the next page (only valid after a previous `go_back`).

**Completion Action:**
- `stop [answer]`: Issue this action when you believe the task is complete. If the objective is to find a text-based answer, provide the answer in the brackets. If you believe the task is impossible to complete, provide the answer as "N/A" in the brackets.
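Actions in this space are plain strings with bracketed arguments, so a consumer of the model's output needs to parse them before dispatching to a browser. The helper below is an illustrative sketch (not part of the released code) of how such an action string could be split into a name and argument list:

```python
import re

# Hypothetical helper (not from the NNetNav release): parse an agent action
# string such as "click [1234]" or "type [55] [hello] [press_enter_after=1]"
# into (action_name, [arguments]).
ACTION_RE = re.compile(r"^(\w+)((?:\s+\[[^\]]*\])*)\s*$")

def parse_action(action: str):
    """Return (action_name, [args]) for an action string, or raise ValueError."""
    m = ACTION_RE.match(action.strip())
    if m is None:
        raise ValueError(f"Malformed action: {action!r}")
    name = m.group(1)
    # Each argument sits inside its own pair of square brackets.
    args = re.findall(r"\[([^\]]*)\]", m.group(2))
    return name, args

print(parse_action("click [1234]"))   # ('click', ['1234'])
print(parse_action("new_tab"))        # ('new_tab', [])
print(parse_action("stop [N/A]"))     # ('stop', ['N/A'])
```

A dispatcher would then map the parsed name to the corresponding browser operation (e.g., a Playwright click on the element with that id).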
## Results on Web-Agent Benchmarks
This model obtains the following success rates (SR, %) on WebArena and WebVoyager:

| Model | WebArena (SR) | WebVoyager (SR) |
|---|---|---|
| GPT-4 | 14.1 | 33.5 |
| Llama8b-NNetNav-WA | 16.3 | 28.1 |
## Bias, Risks, and Limitations
TODO
## How to Get Started with the Model
TODO
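Until this section is filled in, here is a minimal sketch of using the model, assuming it is published on the Hugging Face Hub as `stanfordnlp/llama8b-nnetnav-wa` and follows the Llama-3.1 chat format; the prompt layout (`OBJECTIVE`/`URL`/`OBSERVATION` fields) is an illustrative assumption, not the exact format used in the paper:

```python
def build_prompt(objective: str, url: str, observation: str) -> str:
    """Assemble a single user turn from the task objective, current URL,
    and a text observation of the page (e.g., an accessibility tree).
    The field names here are illustrative, not the paper's exact format."""
    return (
        f"OBJECTIVE: {objective}\n"
        f"URL: {url}\n"
        f"OBSERVATION:\n{observation}\n"
        "Respond with a single action, e.g. click [id] or stop [answer]."
    )

# To query the model (requires a GPU and `pip install transformers torch`):
#   from transformers import pipeline
#   pipe = pipeline("text-generation", model="stanfordnlp/llama8b-nnetnav-wa")
#   messages = [{"role": "user", "content": prompt}]
#   print(pipe(messages, max_new_tokens=32)[0]["generated_text"])
prompt = build_prompt(
    "Upvote the post by user smurty123 on subreddit r/LocalLLaMA",
    "reddit.com",
    "[1234] button 'Upvote'",
)
print(prompt)
```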
## Training Details

### Training Data
This model was trained with supervised fine-tuning (SFT) on the NNetNav-WA dataset, which consists entirely of synthetic demonstrations collected from self-hosted websites.
### Training Procedure
This model was trained for 2 epochs (roughly 4k gradient steps) with a batch size of 128 and a maximum sequence length of 20,000 tokens.
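The hyperparameters above might be expressed in an Open-Instruct-style config as follows; the key names and batch-size decomposition are illustrative assumptions, not the actual training configuration:

```yaml
# Illustrative SFT hyperparameters (key names and per-device split are
# assumptions; only the totals come from this model card)
model_name_or_path: meta-llama/Llama-3.1-8B
num_train_epochs: 2               # roughly 4k gradient steps total
per_device_train_batch_size: 8
gradient_accumulation_steps: 4    # 8 x 4 x 4 GPUs = effective batch size 128
max_seq_length: 20000
```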
## Environmental Impact

- Hardware Type: 4 NVIDIA H100 GPUs (80 GB)
- Hours used: roughly 2 days
- Cloud Provider: Stanford compute cluster
- Compute Region: Stanford energy grid
## Technical Specifications

### Hardware

This model was trained on 4 NVIDIA H100 GPUs.

### Software

This model was fine-tuned with Open-Instruct.
## Model Card Authors
Shikhar Murty
## Model Card Contact