# Model Architectures
The model architectures included come from a wide variety of sources. Sources, including papers, original impl ("reference code") that I rewrote / adapted, and PyTorch impl that I leveraged directly ("code"), are listed below.

Most included models have pretrained weights. The weights are either:

1. from their original sources
2. ported by myself from their original impl in a different framework (e.g. Tensorflow models)
3. trained from scratch using the included training script

The validation results for the pretrained weights can be found [here](results.md).
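All of the models below are created through the same `timm` factory function, with pretrained weights loaded where available. A minimal sketch (assumes `timm` and `torch` are installed):

```python
import torch
import timm

# Create a model by name; pretrained weights are downloaded on first use.
model = timm.create_model('resnet50', pretrained=True)
model.eval()

# Classify a dummy 3x224x224 image batch with the ImageNet head.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```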
## Cross-Stage Partial Networks [[cspnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/cspnet.py)]

* Paper: `CSPNet: A New Backbone that can Enhance Learning Capability of CNN` - https://arxiv.org/abs/1911.11929
* Reference impl: https://github.com/WongKinYiu/CrossStagePartialNetworks

## DenseNet [[densenet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/densenet.py)]

* Paper: `Densely Connected Convolutional Networks` - https://arxiv.org/abs/1608.06993
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models

## DLA [[dla.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dla.py)]

* Paper: `Deep Layer Aggregation` - https://arxiv.org/abs/1707.06484
* Code: https://github.com/ucbdrive/dla

## Dual-Path Networks [[dpn.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dpn.py)]

* Paper: `Dual Path Networks` - https://arxiv.org/abs/1707.01629
* My PyTorch code: https://github.com/rwightman/pytorch-dpn-pretrained
* Reference code: https://github.com/cypw/DPNs

## HRNet [[hrnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hrnet.py)]

* Paper: `Deep High-Resolution Representation Learning for Visual Recognition` - https://arxiv.org/abs/1908.07919
* Code: https://github.com/HRNet/HRNet-Image-Classification

## Inception-V3 [[inception_v3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v3.py)]

* Paper: `Rethinking the Inception Architecture for Computer Vision` - https://arxiv.org/abs/1512.00567
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models

## Inception-V4 [[inception_v4.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v4.py)]

* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets

## Inception-ResNet-V2 [[inception_resnet_v2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_resnet_v2.py)]

* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets

## NASNet-A [[nasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/nasnet.py)]

* Paper: `Learning Transferable Architectures for Scalable Image Recognition` - https://arxiv.org/abs/1707.07012
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet

## PNasNet-5 [[pnasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/pnasnet.py)]

* Paper: `Progressive Neural Architecture Search` - https://arxiv.org/abs/1712.00559
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
## EfficientNet [[efficientnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet.py)]

* Papers
    * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
    * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
    * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
    * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
    * MixNet - https://arxiv.org/abs/1907.09595
    * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
    * MobileNet-V2 - https://arxiv.org/abs/1801.04381
    * FBNet-C - https://arxiv.org/abs/1812.03443
    * Single-Path NAS - https://arxiv.org/abs/1904.02877
* My PyTorch code: https://github.com/rwightman/gen-efficientnet-pytorch
* Reference code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
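All of the families above are built from shared generator code in `efficientnet.py` and are selected by model name. A minimal sketch of discovering the registered names (`timm.list_models` accepts wildcard filters):

```python
import timm

# Wildcard search over registered model names.
print(timm.list_models('mixnet*'))          # MixNet variants
print(timm.list_models('efficientnet_b*'))  # EfficientNet B0-B7 variants

# Any listed name can be passed to the factory.
model = timm.create_model('efficientnet_b0')
```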
## MobileNet-V3 [[mobilenetv3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py)]

* Paper: `Searching for MobileNetV3` - https://arxiv.org/abs/1905.02244
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet

## RegNet [[regnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/regnet.py)]

* Paper: `Designing Network Design Spaces` - https://arxiv.org/abs/2003.13678
* Reference code: https://github.com/facebookresearch/pycls/blob/master/pycls/models/regnet.py

## ResNet, ResNeXt [[resnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnet.py)]

* ResNet (V1B)
    * Paper: `Deep Residual Learning for Image Recognition` - https://arxiv.org/abs/1512.03385
    * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
* ResNeXt
    * Paper: `Aggregated Residual Transformations for Deep Neural Networks` - https://arxiv.org/abs/1611.05431
    * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
* 'Bag of Tricks' / Gluon C, D, E, S ResNet variants
    * Paper: `Bag of Tricks for Image Classification with CNNs` - https://arxiv.org/abs/1812.01187
    * Code: https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/resnetv1b.py
* Instagram pretrained / ImageNet tuned ResNeXt101
    * Paper: `Exploring the Limits of Weakly Supervised Pretraining` - https://arxiv.org/abs/1805.00932
    * Weights: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
* Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet and ResNeXts
    * Paper: `Billion-scale semi-supervised learning for image classification` - https://arxiv.org/abs/1905.00546
    * Weights: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
* Squeeze-and-Excitation Networks
    * Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
    * Code: added to my ResNet base; this is the version going forward, and the old `senet.py` is being deprecated (a simplified SE block sketch follows this list)
* ECAResNet (ECA-Net)
    * Paper: `ECA-Net: Efficient Channel Attention for Deep CNN` - https://arxiv.org/abs/1910.03151v4
    * Code: added to the ResNet base; ECA module contributed by @VRandme, reference https://github.com/BangguWu/ECANet
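The last two entries above are lightweight channel-attention modules inserted into the ResNet blocks. As an illustration of the idea (a simplified sketch, not the exact `timm` implementation), a Squeeze-and-Excitation module looks like:

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-Excitation channel attention (simplified sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # 1x1 convs act as per-channel fully-connected layers.
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squeeze: global average pool to one descriptor per channel.
        scale = x.mean((2, 3), keepdim=True)
        # Excitation: bottleneck MLP + sigmoid gating, then rescale the input.
        scale = self.fc2(self.act(self.fc1(scale)))
        return x * scale.sigmoid()
```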
## Res2Net [[res2net.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/res2net.py)]

* Paper: `Res2Net: A New Multi-scale Backbone Architecture` - https://arxiv.org/abs/1904.01169
* Code: https://github.com/gasvn/Res2Net

## ResNeSt [[resnest.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnest.py)]

* Paper: `ResNeSt: Split-Attention Networks` - https://arxiv.org/abs/2004.08955
* Code: https://github.com/zhanghang1989/ResNeSt

## ReXNet [[rexnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/rexnet.py)]

* Paper: `ReXNet: Diminishing Representational Bottleneck on CNN` - https://arxiv.org/abs/2007.00992
* Code: https://github.com/clovaai/rexnet

## Selective-Kernel Networks [[sknet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/sknet.py)]

* Paper: `Selective Kernel Networks` - https://arxiv.org/abs/1903.06586
* Code: https://github.com/implus/SKNet, https://github.com/clovaai/assembled-cnn

## SelecSLS [[selecsls.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/selecsls.py)]

* Paper: `XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera` - https://arxiv.org/abs/1907.00837
* Code: https://github.com/mehtadushy/SelecSLS-Pytorch

## Squeeze-and-Excitation Networks [[senet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/senet.py)]

NOTE: I am deprecating this version of the networks; the new ones are part of `resnet.py`.

* Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
* Code: https://github.com/Cadene/pretrained-models.pytorch

## TResNet [[tresnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tresnet.py)]

* Paper: `TResNet: High Performance GPU-Dedicated Architecture` - https://arxiv.org/abs/2003.13630
* Code: https://github.com/mrT23/TResNet

## VovNet V2 and V1 [[vovnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vovnet.py)]

* Paper: `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667
* Reference code: https://github.com/youngwanLEE/vovnet-detectron2

## Xception [[xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/xception.py)]

* Paper: `Xception: Deep Learning with Depthwise Separable Convolutions` - https://arxiv.org/abs/1610.02357
* Code: https://github.com/Cadene/pretrained-models.pytorch

## Xception (Modified Aligned, Gluon) [[gluon_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/gluon_xception.py)]

* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
* Reference code: https://github.com/dmlc/gluon-cv/tree/master/gluoncv/model_zoo, https://github.com/jfzhang95/pytorch-deeplab-xception/

## Xception (Modified Aligned, TF) [[aligned_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/aligned_xception.py)]

* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
* Reference code: https://github.com/tensorflow/models/tree/master/research/deeplab
# Training Examples

## EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

`./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016`

## MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

`./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce`

## SE-ResNeXt-26-D and SE-ResNeXt-26-T

These hparams (or similar) work well for a wide range of ResNet architectures. It's generally a good idea to increase the epoch count as the model size increases, e.g. approx 180-200 for ResNe(X)t50, and 220+ for larger models. Increase batch size and LR proportionally for better GPUs or with AMP enabled; for example, doubling the total batch size from 224 (2 x 112 here) to 448 would scale the LR from 0.1 to 0.2. These params were for 2 x 1080Ti cards:

`./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112`

## EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5

The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training the process crashed. The results weren't looking amazing, so I resumed the training several times with tweaks to a few params (increased RE prob, decreased rand-aug, increased ema-decay). Nothing looked great, so I ended up averaging the best checkpoints from all restarts. The result is mediocre at default res/crop but oddly performs much better with a full image test crop of 1.0.
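Checkpoint averaging of this sort can be done with the `avg_checkpoints.py` script included in the repository, or by hand. A minimal sketch of the idea (hypothetical file names, assuming each file holds a bare `state_dict`):

```python
import torch

# Hypothetical checkpoint paths from the restarted training runs.
paths = ['restart1-best.pth', 'restart2-best.pth', 'restart3-best.pth']
state_dicts = [torch.load(p, map_location='cpu') for p in paths]

# Average every parameter/buffer tensor element-wise across checkpoints.
avg = {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
       for k in state_dicts[0]}

torch.save(avg, 'averaged.pth')
```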
## EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5

[Michael Klachko](https://github.com/michaelklachko) achieved these results with the command line for B2 adapted for a larger batch size, with the recommended B0 dropout rate of 0.2.

`./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048`

## ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5

Trained on two older 1080Ti cards, this took a while. Only a slightly better (not statistically significant) ImageNet validation result than my first good AugMix training of 78.99. However, these weights are more robust on tests with ImageNetV2, ImageNet-Sketch, etc. Unlike my first AugMix runs, I've enabled SplitBatchNorm, disabled random erasing on the clean split, and cranked up random erasing prob on the 2 augmented paths.

`./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce`

## EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5

Trained by [Andrew Lavin](https://github.com/andravin) with 8 V100 cards. Model EMA was not used; the final checkpoint is the average of the 8 best checkpoints during training.

`./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064`

## MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5

`./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9`

## ResNeXt-50 32x4d w/ RandAugment - 79.762 top-1, 94.60 top-5

These params will also work well for SE-ResNeXt-50 and SK-ResNeXt-50, and likely for the 101 variants. I used them for the SK-ResNeXt-50 32x4d that I trained with 2 GPUs, using a slightly higher LR per effective batch size (lr=0.18, b=192 per GPU). The cmd line below is tuned for 8 GPU training.