Continuing README and documentation updates

pull/175/head
Ross Wightman 4 years ago
parent e3c11a36dc
commit e3f58fc90c

@@ -4,11 +4,11 @@
### Aug 1, 2020
Universal feature extraction, new models, new weights, new test sets.
* All models support the `features_only=True` argument for the `create_model` call to return a network that extracts feature maps from the deepest layer at each stride (see the example after this list).
* New models
  * CSPResNet, CSPResNeXt, CSPDarkNet, DarkNet
  * ReXNet
  * (Modified Aligned) Xception41/65/71 (a proper port of TF models)
* New trained weights
  * SEResNet50 - 80.3
  * CSPDarkNet53 - 80.1 top-1
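
A minimal sketch of the `features_only` usage mentioned in the list above (it assumes the `resnet50` pretrained weights can be downloaded; any supported model name should work the same way):

```python
import torch
import timm

# Build a feature-extraction variant of a classification model.
m = timm.create_model('resnet50', features_only=True, pretrained=True)

x = torch.randn(1, 3, 224, 224)
features = m(x)  # a list of feature maps, deepest layer at each output stride

for f in features:
    print(f.shape)  # e.g. strides 2/4/8/16/32 -> 112x112 down to 7x7 for a 224x224 input
```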
@@ -56,76 +56,58 @@ Bunch of changes:
## Introduction

Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.

The work of many others is present here. I've tried to make sure all source material is acknowledged via links to github, arxiv papers, etc in the README, documentation, and code docstrings. Please let me know if I missed anything.

## Models

Most included models have pretrained weights. The weights are either from their original sources, ported by myself from their original framework (e.g. Tensorflow models), or trained from scratch using the included training script. A full version of the list below with source links and references can be found in the [documentation](https://rwightman.github.io/pytorch-image-models/models/).
* CspNet (Cross-Stage Partial Networks) - https://arxiv.org/abs/1911.11929
* DenseNet - https://arxiv.org/abs/1608.06993
* DLA - https://arxiv.org/abs/1707.06484
* DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
* EfficientNet (MBConvNet Family)
  * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
  * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
  * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
  * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
  * FBNet-C - https://arxiv.org/abs/1812.03443
  * MixNet - https://arxiv.org/abs/1907.09595
  * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
  * MobileNet-V2 - https://arxiv.org/abs/1801.04381
  * Single-Path NAS - https://arxiv.org/abs/1904.02877
* HRNet - https://arxiv.org/abs/1908.07919
* Inception-V3 - https://arxiv.org/abs/1512.00567
* Inception-ResNet-V2 and Inception-V4 - https://arxiv.org/abs/1602.07261
* MobileNet-V3 (MBConvNet w/ Efficient Head) - https://arxiv.org/abs/1905.02244
* NASNet-A - https://arxiv.org/abs/1707.07012
* PNasNet - https://arxiv.org/abs/1712.00559
* RegNet - https://arxiv.org/abs/2003.13678
* ResNet/ResNeXt
  * ResNet (v1b/v1.5) - https://arxiv.org/abs/1512.03385
  * ResNeXt - https://arxiv.org/abs/1611.05431
  * 'Bag of Tricks' / Gluon C, D, E, S variations - https://arxiv.org/abs/1812.01187
  * Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 - https://arxiv.org/abs/1805.00932
  * Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts - https://arxiv.org/abs/1905.00546
  * ECA-Net (ECAResNet) - https://arxiv.org/abs/1910.03151v4
  * Squeeze-and-Excitation Networks (SEResNet) - https://arxiv.org/abs/1709.01507
* Res2Net - https://arxiv.org/abs/1904.01169
* ResNeSt - https://arxiv.org/abs/2004.08955
* ReXNet - https://arxiv.org/abs/2007.00992
* SelecSLS - https://arxiv.org/abs/1907.00837
* Selective Kernel Networks - https://arxiv.org/abs/1903.06586
* TResNet - https://arxiv.org/abs/2003.13630
* VovNet V2 (with V1 support) - https://arxiv.org/abs/1911.06667
* Xception - https://arxiv.org/abs/1610.02357
* Xception (Modified Aligned, Gluon) - https://arxiv.org/abs/1802.02611
* Xception (Modified Aligned, TF) - https://arxiv.org/abs/1802.02611

Use the `--model` arg to specify the model for the train, validation, and inference scripts. Match the all lowercase creation fn for the model you'd like.
## Features

Several (less common) features that I often utilize in my projects are included. Many of their additions are the reason why I maintain my own set of models, instead of using others' via PIP:
* All models have a common default configuration interface and API for
  * accessing/changing the classifier - `get_classifier` and `reset_classifier`
  * doing a forward pass on just the features - `forward_features`
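
A quick sketch of that common interface, assuming a `resnet50` with pretrained weights (the same calls apply to the other models):

```python
import torch
import timm

m = timm.create_model('resnet50', pretrained=True)
x = torch.randn(1, 3, 224, 224)

# forward pass on just the features (unpooled output of the final conv stage)
feats = m.forward_features(x)

# inspect the current classifier head, then re-initialize it for a new task
head = m.get_classifier()
m.reset_classifier(num_classes=10)
out = m(x)  # now (1, 10) logits
```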
@@ -165,6 +147,7 @@ Several (less common) features that I often utilize in my projects are included.
* DropBlock (https://arxiv.org/abs/1810.12890)
* Efficient Channel Attention - ECA (https://arxiv.org/abs/1910.03151)
* Blur Pooling (https://arxiv.org/abs/1904.11486)
* Space-to-Depth by [mrT23](https://github.com/mrT23/TResNet/blob/master/src/models/tresnet/layers/space_to_depth.py) (https://arxiv.org/abs/1801.04590) -- original paper?
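
The space-to-depth entry above is light on detail, so here is a minimal generic sketch of the rearrangement it refers to, in plain PyTorch; this illustrates the idea, not the exact TResNet/timm implementation:

```python
import torch

def space_to_depth(x: torch.Tensor, block_size: int = 2) -> torch.Tensor:
    """Move spatial blocks into channels: (N, C, H, W) -> (N, C*bs*bs, H/bs, W/bs)."""
    n, c, h, w = x.shape
    bs = block_size
    x = x.view(n, c, h // bs, bs, w // bs, bs)
    x = x.permute(0, 3, 5, 1, 2, 4).contiguous()
    return x.view(n, c * bs * bs, h // bs, w // bs)

# e.g. a 224x224 RGB image becomes a 12-channel 112x112 tensor
print(space_to_depth(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 12, 112, 112])
```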
## Results
@@ -172,4 +155,4 @@ Model validation results can be found in the [documentation](https://rwightman.g
## Getting Started

See the [documentation](https://rwightman.github.io/pytorch-image-models/)

@@ -38,8 +38,8 @@ m.eval()
```python
import timm
from pprint import pprint
model_names = timm.list_models(pretrained=True)
pprint(model_names)
>>> ['adv_inception_v3',
 'cspdarknet53',
 'cspresnext50',
@@ -58,7 +58,8 @@ pprint(timm.list_models(pretrained=True))
```python
import timm
from pprint import pprint
model_names = timm.list_models('*resne*t*')
pprint(model_names)
>>> ['cspresnet50',
 'cspresnet50d',
 'cspresnet50w',

@@ -1,112 +1,145 @@
# Model Architectures

The model architectures included come from a wide variety of sources. Sources, including papers, original impl ("reference code") that I rewrote / adapted, and PyTorch impl that I leveraged directly ("code"), are listed below.

Most included models have pretrained weights. The weights are either:
1. from their original sources
2. ported by myself from their original impl in a different framework (e.g. Tensorflow models)
3. trained from scratch using the included training script

The validation results for the pretrained weights can be found [here](results.md)

## Cross-Stage Partial Networks [[cspnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/cspnet.py)]
* Paper: `CSPNet: A New Backbone that can Enhance Learning Capability of CNN` - https://arxiv.org/abs/1911.11929
* Reference impl: https://github.com/WongKinYiu/CrossStagePartialNetworks

## DenseNet [[densenet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/densenet.py)]
* Paper: `Densely Connected Convolutional Networks` - https://arxiv.org/abs/1608.06993
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models

## DLA [[dla.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dla.py)]
* Paper: https://arxiv.org/abs/1707.06484
* Code: https://github.com/ucbdrive/dla

## Dual-Path Networks [[dpn.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dpn.py)]
* Paper: `Dual Path Networks` - https://arxiv.org/abs/1707.01629
* My PyTorch code: https://github.com/rwightman/pytorch-dpn-pretrained
* Reference code: https://github.com/cypw/DPNs

## HRNet [[hrnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hrnet.py)]
* Paper: `Deep High-Resolution Representation Learning for Visual Recognition` - https://arxiv.org/abs/1908.07919
* Code: https://github.com/HRNet/HRNet-Image-Classification

## Inception-V3 [[inception_v3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v3.py)]
* Paper: `Rethinking the Inception Architecture for Computer Vision` - https://arxiv.org/abs/1512.00567
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models

## Inception-V4 [[inception_v4.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v4.py)]
* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets

## Inception-ResNet-V2 [[inception_resnet_v2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_resnet_v2.py)]
* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets

## NASNet-A [[nasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/nasnet.py)]
* Papers: `Learning Transferable Architectures for Scalable Image Recognition` - https://arxiv.org/abs/1707.07012
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet

## PNasNet-5 [[pnasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/pnasnet.py)]
* Papers: `Progressive Neural Architecture Search` - https://arxiv.org/abs/1712.00559
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet

## EfficientNet [[efficientnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet.py)]
* Papers
  * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
  * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
  * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
  * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
  * MixNet - https://arxiv.org/abs/1907.09595
  * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
  * MobileNet-V2 - https://arxiv.org/abs/1801.04381
  * FBNet-C - https://arxiv.org/abs/1812.03443
  * Single-Path NAS - https://arxiv.org/abs/1904.02877
* My PyTorch code: https://github.com/rwightman/gen-efficientnet-pytorch
* Reference code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

## MobileNet-V3 [[mobilenetv3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py)]
* Paper: `Searching for MobileNetV3` - https://arxiv.org/abs/1905.02244
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet

## RegNet [[regnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/regnet.py)]
* Paper: `Designing Network Design Spaces` - https://arxiv.org/abs/2003.13678
* Reference code: https://github.com/facebookresearch/pycls/blob/master/pycls/models/regnet.py

## ResNet, ResNeXt [[resnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnet.py)]
* ResNet (V1B)
  * Paper: `Deep Residual Learning for Image Recognition` - https://arxiv.org/abs/1512.03385
  * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
* ResNeXt
  * Paper: `Aggregated Residual Transformations for Deep Neural Networks` - https://arxiv.org/abs/1611.05431
  * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
* 'Bag of Tricks' / Gluon C, D, E, S ResNet variants
  * Paper: `Bag of Tricks for Image Classification with CNNs` - https://arxiv.org/abs/1812.01187
  * Code: https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/resnetv1b.py
* Instagram pretrained / ImageNet tuned ResNeXt101
  * Paper: `Exploring the Limits of Weakly Supervised Pretraining` - https://arxiv.org/abs/1805.00932
  * Weights: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
* Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet and ResNeXts
  * Paper: `Billion-scale semi-supervised learning for image classification` - https://arxiv.org/abs/1905.00546
  * Weights: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
* Squeeze-and-Excitation Networks
  * Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
  * Code: Added to my ResNet base; this is the current version going forward, the old senet.py is being deprecated
* ECAResNet (ECA-Net)
  * Paper: `ECA-Net: Efficient Channel Attention for Deep CNN` - https://arxiv.org/abs/1910.03151v4
  * Code: Added to ResNet base, ECA module contributed by @VRandme, reference https://github.com/BangguWu/ECANet

## Res2Net [[res2net.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/res2net.py)]
* Paper: `Res2Net: A New Multi-scale Backbone Architecture` - https://arxiv.org/abs/1904.01169
* Code: https://github.com/gasvn/Res2Net

## ResNeSt [[resnest.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnest.py)]
* Paper: `ResNeSt: Split-Attention Networks` - https://arxiv.org/abs/2004.08955
* Code: https://github.com/zhanghang1989/ResNeSt

## ReXNet [[rexnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/rexnet.py)]
* Paper: `ReXNet: Diminishing Representational Bottleneck on CNN` - https://arxiv.org/abs/2007.00992
* Code: https://github.com/clovaai/rexnet
## Selective-Kernel Networks [[sknet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/sknet.py)]
* Paper: `Selective-Kernel Networks` - https://arxiv.org/abs/1903.06586
* Code: https://github.com/implus/SKNet, https://github.com/clovaai/assembled-cnn
## SelecSLS [[selecsls.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/selecsls.py)]
* Paper: `XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera` - https://arxiv.org/abs/1907.00837
* Code: https://github.com/mehtadushy/SelecSLS-Pytorch
## Squeeze-and-Excitation Networks [[senet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/senet.py)]
NOTE: I am deprecating this version of the networks; the new ones are part of `resnet.py`.
* Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
* Code: https://github.com/Cadene/pretrained-models.pytorch
## TResNet [[tresnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tresnet.py)]
* Paper: `TResNet: High Performance GPU-Dedicated Architecture` - https://arxiv.org/abs/2003.13630
* Code: https://github.com/mrT23/TResNet
## VovNet V2 and V1 [[vovnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vovnet.py)]
* Paper: `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667
* Reference code: https://github.com/youngwanLEE/vovnet-detectron2
## Xception [[xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/xception.py)]
* Paper: `Xception: Deep Learning with Depthwise Separable Convolutions` - https://arxiv.org/abs/1610.02357
* Code: https://github.com/Cadene/pretrained-models.pytorch
## Xception (Modified Aligned, Gluon) [[gluon_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/gluon_xception.py)]
* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
* Reference code: https://github.com/dmlc/gluon-cv/tree/master/gluoncv/model_zoo, https://github.com/jfzhang95/pytorch-deeplab-xception/
## Xception (Modified Aligned, TF) [[aligned_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/aligned_xception.py)]
* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
* Reference code: https://github.com/tensorflow/models/tree/master/research/deeplab
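
Each architecture above is exposed through an all lowercase model name that can be passed to `create_model`. A brief sketch (the `xception` name and its pretrained weights are assumed available in this version):

```python
import timm
from pprint import pprint

# find the model variants registered for a given architecture family
pprint(timm.list_models('*xception*'))

# instantiate one of them by name, optionally with a custom head
m = timm.create_model('xception', pretrained=True, num_classes=10)
```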

@@ -1,45 +1,45 @@
# Training Examples

## EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5
These params are for dual Titan RTX cards with NVIDIA Apex installed:

`./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016`

## MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5
These params are for dual Titan RTX cards with NVIDIA Apex installed:

`./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce`

## SE-ResNeXt-26-D and SE-ResNeXt-26-T
These hparams (or similar) work well for a wide range of ResNet architectures. It's generally a good idea to increase the epoch count as the model size increases: approx 180-200 for ResNe(X)t50, and 220+ for larger models. Increase batch size and LR proportionally for better GPUs or with AMP enabled (see the scaling sketch below). These params were for 2 1080Ti cards:

`./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112`
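
A rough sketch of the proportional scaling mentioned above (a rule-of-thumb helper written for this doc, not a timm utility; 224 is the 2 x 112 global batch from the command above):

```python
# Hypothetical helper illustrating linear LR scaling with global batch size.
def scale_lr(base_lr: float, base_global_batch: int, new_global_batch: int) -> float:
    return base_lr * new_global_batch / base_global_batch

# e.g. the 2x1080Ti recipe above uses lr=0.1 at a global batch of 2 * 112 = 224;
# doubling the per-GPU batch (or GPU count) suggests lr ~= 0.2
print(scale_lr(0.1, 224, 448))  # 0.2
```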
## EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5
The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training the process crashed. The results weren't looking amazing so I resumed the training several times with tweaks to a few params (increase RE prob, decrease rand-aug, increase ema-decay). Nothing looked great. I ended up averaging the best checkpoints from all restarts. The result is mediocre at default res/crop but oddly performs much better with a full image test crop of 1.0.

## EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5
[Michael Klachko](https://github.com/michaelklachko) achieved these results with the command line for B2 adapted for larger batch size, with the recommended B0 dropout rate of 0.2.

`./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048`

## ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5
Trained on two older 1080Ti cards, this took a while. Only a slightly better (and not statistically significant) ImageNet validation result than my first good AugMix training of 78.99. However, these weights are more robust on tests with ImageNetV2, ImageNet-Sketch, etc. Unlike my first AugMix runs, I've enabled SplitBatchNorm, disabled random erasing on the clean split, and cranked up random erasing prob on the 2 augmented paths.

`./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce`

## EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5
Trained by [Andrew Lavin](https://github.com/andravin) with 8 V100 cards. Model EMA was not used; the final checkpoint is the average of the 8 best checkpoints during training.

`./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064`

## MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5

`./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9`

## ResNeXt-50 32x4d w/ RandAugment - 79.762 top-1, 94.60 top-5
These params will also work well for SE-ResNeXt-50 and SK-ResNeXt-50, and likely ResNe(X)t-101. I used them for the SK-ResNeXt-50 32x4d that I trained with 2 GPUs using a slightly higher LR per effective batch size (lr=0.18, b=192 per GPU). The cmd line below is tuned for 8 GPU training.

@@ -7,6 +7,7 @@ nav:
- models.md
- results.md
- scripts.md
- training_hparam_examples.md
- changes.md
- archived_changes.md
theme:
