Trick or ResNet Treat

Community Article · Published October 31, 2024

A small 🎃 treat: I just uploaded a few small ResNets trained like they've never been trained before. Following a user request, I threw a recent hparam set (MobileNet-v4 Conv Small x ResNet Strikes Back / timm, `ra4` in the tables) at the 'Basic Block' ResNet-18 & 34, including the V2 (pre-activation) variants.

The results were good! ResNet-18s at 73-74% and ResNet-34s at 77-78%, oh my!
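If you want to play with them, the weights are on the Hub under the names in the tables below. A minimal sketch of loading one with timm:

```python
import timm

# Grab one of the new checkpoints from the Hub (model names match the tables below)
model = timm.create_model('resnet18d.ra4_e3600_r224_in1k', pretrained=True).eval()

# Sanity check the size: ~11.7M params, matching the param_count column
print(sum(p.numel() for p in model.parameters()) / 1e6)
```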

See the tables below for some context. I included some past 'best' ResNet results:

  • ResNet Strikes Back (a1, a1h)
  • torchvision 'Batteries Included' ResNets (tv2) that followed RSB
  • O.G. torchvision (tv) ResNets in the 18-50 range

I did actually train a D variant ResNet-50 w/ similar ra4 hparams, but it didn't improve on past results by quite as much; it likely needs further hparam tweaks and more augreg.

| model | img_size | top1 | top5 | param_count (M) |
|:--|--:|--:|--:|--:|
| resnet50d.ra4_e3600_r224_in1k | 224 | 80.958 | 95.372 | 25.58 |
| resnet50.tv2_in1k | 224 | 80.856 | 95.43 | 25.56 |
| resnet50d.a1_in1k | 224 | 80.686 | 94.712 | 25.58 |
| resnet50.a1h_in1k | 224 | 80.662 | 95.306 | 25.56 |
| resnet50.a1_in1k | 224 | 80.368 | 94.59 | 25.56 |
| resnetv2_34d.ra4_e3600_r224_in1k | 224 | 78.268 | 93.956 | 21.82 |
| resnetv2_34.ra4_e3600_r224_in1k | 224 | 77.636 | 93.528 | 21.8 |
| resnet34.ra4_e3600_r224_in1k | 224 | 77.448 | 93.502 | 21.8 |
| resnet34.a1_in1k | 224 | 76.428 | 92.88 | 21.8 |
| resnet50.tv_in1k | 224 | 76.128 | 92.858 | 25.56 |
| resnetv2_18d.ra4_e3600_r224_in1k | 224 | 74.412 | 91.928 | 11.71 |
| resnet18d.ra4_e3600_r224_in1k | 224 | 74.324 | 91.832 | 11.71 |
| resnetv2_18.ra4_e3600_r224_in1k | 224 | 73.578 | 91.352 | 11.69 |
| resnet34.tv_in1k | 224 | 73.316 | 91.422 | 21.8 |
| resnet18.a1_in1k | 224 | 71.49 | 90.076 | 11.69 |
| resnet18.tv_in1k | 224 | 69.758 | 89.074 | 11.69 |
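To reproduce numbers like these yourself, you'd run the ImageNet-1k val set through the model with its pretrained eval transform. A rough sketch, assuming a recent timm (0.9+) where `resolve_model_data_config` is available:

```python
import torch
import timm
from timm.data import resolve_model_data_config, create_transform

model = timm.create_model('resnet34.ra4_e3600_r224_in1k', pretrained=True).eval()

# Rebuild the eval preprocessing these weights expect (resize, 224x224 crop, norm)
cfg = resolve_model_data_config(model)
transform = create_transform(**cfg, is_training=False)  # apply to PIL val images

# Dummy forward pass just to confirm shapes; real top-1 needs the full val set
with torch.inference_mode():
    out = model(torch.randn(1, *cfg['input_size']))
print(out.shape)  # torch.Size([1, 1000])
```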

The new weights all scale quite nicely to higher resolutions at inference time. Some points of interest here: the tv2 and a1h ResNet-50s were trained at 176x176 resolution, so by evaluating at 224x224 they were aiming to hit the 'peak' in the train-test resolution discrepancy (https://arxiv.org/abs/1906.06423). When I was working on the RSB recipe I did not want to sacrifice higher-res scaling by trying to bag that peak for 224x224 eval; only the low-cost a3 recipe trained at a lower res. You can see that in the 288x288 table below: the a1 RSB weights have more to give, the tv2 is already on the downslope, and the a1h is just past its peak. These new ra4 weights are res-scaling champs and have a bit more to give.
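Trying that higher-res scaling yourself is just a matter of changing the eval transform; the convolutional backbone accepts the larger input as-is. A sketch for 288x288 (the `crop_pct=1.0` is my assumption for a full-image crop at the bigger size, not necessarily what produced the table below):

```python
import torch
import timm
from timm.data import resolve_model_data_config, create_transform

model = timm.create_model('resnet50d.ra4_e3600_r224_in1k', pretrained=True).eval()

# Start from the model's default eval config, then bump the resolution
cfg = resolve_model_data_config(model)
cfg['input_size'] = (3, 288, 288)
cfg['crop_pct'] = 1.0  # assumption: full-image crop at the higher eval res
transform = create_transform(**cfg, is_training=False)

with torch.inference_mode():
    out = model(torch.randn(1, 3, 288, 288))  # ResNets handle the larger input directly
```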

| model | img_size | top1 | top5 | param_count (M) |
|:--|--:|--:|--:|--:|
| resnet50d.ra4_e3600_r224_in1k | 288 | 81.812 | 95.91 | 25.58 |
| resnet50d.a1_in1k | 288 | 81.45 | 95.216 | 25.58 |
| resnet50.a1_in1k | 288 | 81.232 | 95.108 | 25.56 |
| resnet50.a1h_in1k | 288 | 80.914 | 95.516 | 25.56 |
| resnet50.tv2_in1k | 288 | 80.87 | 95.646 | 25.56 |
| resnetv2_34d.ra4_e3600_r224_in1k | 288 | 79.59 | 94.77 | 21.82 |
| resnetv2_34.ra4_e3600_r224_in1k | 288 | 79.072 | 94.566 | 21.8 |
| resnet34.ra4_e3600_r224_in1k | 288 | 78.952 | 94.45 | 21.8 |
| resnet34.a1_in1k | 288 | 77.91 | 93.768 | 21.8 |
| resnet50.tv_in1k | 288 | 77.252 | 93.606 | 25.56 |
| resnetv2_18d.ra4_e3600_r224_in1k | 288 | 76.044 | 93.02 | 11.71 |
| resnet18d.ra4_e3600_r224_in1k | 288 | 76.024 | 92.78 | 11.71 |
| resnetv2_18.ra4_e3600_r224_in1k | 288 | 75.34 | 92.678 | 11.69 |
| resnet34.tv_in1k | 288 | 74.8 | 92.356 | 21.8 |
| resnet18.a1_in1k | 288 | 73.152 | 91.036 | 11.69 |
| resnet18.tv_in1k | 288 | 71.274 | 90.244 | 11.69 |