AFRAgent — Epoch 7 checkpoint
This model is a checkpoint of AFRAgent (WACV 2026 paper), an Adaptive Feature Renormalization-based GUI agent for smartphone automation, built on InstructBLIP.
- Architecture: AnyResAdaIn (any-resolution adaptive feature renormalization)
- Base model: Salesforce/instructblip-flan-t5-xl
- Training: fine-tuned on Android-in-the-Wild (AITW). This checkpoint comes from the run all_data_any_res_adain_finetuning (batch size 128, input length 512, output length 256, 12-epoch run) and was saved at step 56266, which is approximately epoch 7.
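The step and epoch figures in the training line can be cross-checked with a little arithmetic. The sketch below is only an estimate: it assumes that "step" counts optimizer steps, that 128 is the effective batch size per step, and that step 56266 falls exactly at the end of epoch 7. None of these is stated in the model card.

```python
# Assumptions (not stated in the model card): "step" counts optimizer steps,
# 128 is the effective batch size per step, and step 56266 ends epoch 7.
CHECKPOINT_STEP = 56266
CHECKPOINT_EPOCH = 7
TOTAL_EPOCHS = 12
BATCH_SIZE = 128

# Steps per epoch implied by the checkpoint step and epoch.
steps_per_epoch, remainder = divmod(CHECKPOINT_STEP, CHECKPOINT_EPOCH)

# Approximate number of training samples seen in one epoch.
samples_per_epoch = steps_per_epoch * BATCH_SIZE

# Step count the full 12-epoch run would reach at the same rate.
total_steps = steps_per_epoch * TOTAL_EPOCHS

print(f"steps per epoch:   {steps_per_epoch} (remainder {remainder})")
print(f"samples per epoch: {samples_per_epoch}")
print(f"steps in 12 epochs: {total_steps}")
```

Under these assumptions the step count divides evenly into 8038 steps per epoch, which implies roughly 1.03 million samples per epoch and 96456 steps for the full 12-epoch run.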
How to load
Requires the AFRAgent codebase for the custom AnyResAdaIn class.
# Clone AFRAgent and add to path, then:
from models.any_res_adain_queries_fusion import AnyResAdaIn
from transformers import InstructBlipProcessor, AutoTokenizer
model = AnyResAdaIn.from_pretrained("neeraj321/AFRAgent_pure_multimodel")
processor = InstructBlipProcessor.from_pretrained("neeraj321/AFRAgent_pure_multimodel")
tokenizer = AutoTokenizer.from_pretrained("neeraj321/AFRAgent_pure_multimodel")
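The "add to path" step mentioned in the comment above can be done from Python before the import. This is a minimal sketch, not part of the AFRAgent codebase: the helper name `add_repo_to_path` and the checkout location in the usage comment are hypothetical.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put a local AFRAgent checkout on sys.path so `models.*` can be imported.

    `add_repo_to_path` is a hypothetical helper, not part of AFRAgent.
    """
    repo = Path(repo_dir).expanduser().resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"AFRAgent checkout not found: {repo}")
    repo_str = str(repo)
    # Avoid adding the same directory twice on repeated calls.
    if repo_str not in sys.path:
        sys.path.insert(0, repo_str)
    return repo


# Usage (hypothetical checkout location; adjust to where the repo was cloned):
# add_repo_to_path("~/code/AFRAgent")
# from models.any_res_adain_queries_fusion import AnyResAdaIn
```

Setting the PYTHONPATH environment variable to the checkout directory before starting Python has the same effect.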
For evaluation with the AFRAgent script:
python instructblip_main.py \
--evaluate_dir neeraj321/AFRAgent_pure_multimodel \
--train_any_res_adain True \
--use_high_res True \
--data_root dataset/aitw/general/general \
--input_len 512 --output_len 256 --eval_bs 64
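To script the evaluation command above from Python, it can be assembled as an argument list. The sketch below uses only the flags and values shown in the command; the function name `build_eval_command` is hypothetical, and it assumes the script is launched from the AFRAgent repository root.

```python
import shlex


def build_eval_command(checkpoint, data_root, input_len=512, output_len=256, eval_bs=64):
    """Assemble the evaluation command as an argument list (hypothetical helper)."""
    return [
        "python", "instructblip_main.py",
        "--evaluate_dir", checkpoint,
        "--train_any_res_adain", "True",
        "--use_high_res", "True",
        "--data_root", data_root,
        "--input_len", str(input_len),
        "--output_len", str(output_len),
        "--eval_bs", str(eval_bs),
    ]


cmd = build_eval_command(
    "neeraj321/AFRAgent_pure_multimodel",
    "dataset/aitw/general/general",
)
# Print the shell-quoted command for inspection.
print(shlex.join(cmd))
# To launch it (assumed to run from the AFRAgent repository root):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems if a path contains spaces.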
License
MIT
Citation
@inproceedings{anand2025afragent,
title={AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent},
author={Anand, Neeraj and others},
booktitle={WACV},
year={2026}
}