# Model Architectures

The model architectures included come from a wide variety of sources. Sources, including papers, original implementations ("reference code") that I rewrote / adapted, and PyTorch implementations that I leveraged directly ("code"), are listed below.

Most included models have pretrained weights. The weights are either:

1. from their original sources
2. ported by myself from their original impl in a different framework (e.g. Tensorflow models)
3. trained from scratch using the included training script

The validation results for the pretrained weights can be found [here](results.md).
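
As a quick illustration of how the pretrained weights are typically used, here is a minimal sketch (it assumes `timm` and PyTorch are installed; the model names are just examples from the list below):

```python
import timm
import torch

# List model names that have pretrained weights available,
# optionally filtered by a wildcard pattern.
print(timm.list_models('*resnext*', pretrained=True))

# Create one of the included models with its pretrained weights
# and run a dummy forward pass.
model = timm.create_model('resnet50', pretrained=True)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) for ImageNet-1k weights
```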

## Cross-Stage Partial Networks [[cspnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/cspnet.py)]

* Paper: `CSPNet: A New Backbone that can Enhance Learning Capability of CNN` - https://arxiv.org/abs/1911.11929
* Reference impl: https://github.com/WongKinYiu/CrossStagePartialNetworks

## DenseNet [[densenet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/densenet.py)]

* Paper: `Densely Connected Convolutional Networks` - https://arxiv.org/abs/1608.06993
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models

## DLA [[dla.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dla.py)]

* Paper: `Deep Layer Aggregation` - https://arxiv.org/abs/1707.06484
* Code: https://github.com/ucbdrive/dla

## Dual-Path Networks [[dpn.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dpn.py)]

* Paper: `Dual Path Networks` - https://arxiv.org/abs/1707.01629
* My PyTorch code: https://github.com/rwightman/pytorch-dpn-pretrained
* Reference code: https://github.com/cypw/DPNs

## HRNet [[hrnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hrnet.py)]

* Paper: `Deep High-Resolution Representation Learning for Visual Recognition` - https://arxiv.org/abs/1908.07919
* Code: https://github.com/HRNet/HRNet-Image-Classification

## Inception-V3 [[inception_v3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v3.py)]

* Paper: `Rethinking the Inception Architecture for Computer Vision` - https://arxiv.org/abs/1512.00567
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models

## Inception-V4 [[inception_v4.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v4.py)]

* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets

## Inception-ResNet-V2 [[inception_resnet_v2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_resnet_v2.py)]

* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets

## NASNet-A [[nasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/nasnet.py)]

* Paper: `Learning Transferable Architectures for Scalable Image Recognition` - https://arxiv.org/abs/1707.07012
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet

## PNasNet-5 [[pnasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/pnasnet.py)]

* Paper: `Progressive Neural Architecture Search` - https://arxiv.org/abs/1712.00559
* Code: https://github.com/Cadene/pretrained-models.pytorch
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet

## EfficientNet [[efficientnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet.py)]

* Papers:
  * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
  * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
  * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
  * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
  * MixNet - https://arxiv.org/abs/1907.09595
  * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
  * MobileNet-V2 - https://arxiv.org/abs/1801.04381
  * FBNet-C - https://arxiv.org/abs/1812.03443
  * Single-Path NAS - https://arxiv.org/abs/1904.02877
* My PyTorch code: https://github.com/rwightman/gen-efficientnet-pytorch
* Reference code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

## MobileNet-V3 [[mobilenetv3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py)]

* Paper: `Searching for MobileNetV3` - https://arxiv.org/abs/1905.02244
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet

## RegNet [[regnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/regnet.py)]

* Paper: `Designing Network Design Spaces` - https://arxiv.org/abs/2003.13678
* Reference code: https://github.com/facebookresearch/pycls/blob/master/pycls/models/regnet.py

## ResNet, ResNeXt [[resnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnet.py)]

* ResNet (V1B)
  * Paper: `Deep Residual Learning for Image Recognition` - https://arxiv.org/abs/1512.03385
  * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
* ResNeXt
  * Paper: `Aggregated Residual Transformations for Deep Neural Networks` - https://arxiv.org/abs/1611.05431
  * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
* 'Bag of Tricks' / Gluon C, D, E, S ResNet variants
  * Paper: `Bag of Tricks for Image Classification with CNNs` - https://arxiv.org/abs/1812.01187
  * Code: https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/resnetv1b.py
* Instagram pretrained / ImageNet tuned ResNeXt101
  * Paper: `Exploring the Limits of Weakly Supervised Pretraining` - https://arxiv.org/abs/1805.00932
  * Weights: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
* Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet and ResNeXts
  * Paper: `Billion-scale semi-supervised learning for image classification` - https://arxiv.org/abs/1905.00546
  * Weights: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
* Squeeze-and-Excitation Networks
  * Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
  * Code: Added to my ResNet base; this is the version going forward, the old `senet.py` is being deprecated (a sketch of the SE module follows this list)
* ECAResNet (ECA-Net)
  * Paper: `ECA-Net: Efficient Channel Attention for Deep CNN` - https://arxiv.org/abs/1910.03151v4
  * Code: Added to the ResNet base; ECA module contributed by @VRandme, reference https://github.com/BangguWu/ECANet
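
The squeeze-and-excitation attention mentioned above reduces to a small bottleneck MLP over globally pooled channel statistics that gates each channel. A minimal sketch of the idea (not the exact timm module; the names and reduction default here are illustrative):

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-excitation: global pool -> bottleneck MLP -> per-channel gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        s = x.mean((2, 3), keepdim=True)     # squeeze: global average pool
        s = self.fc2(self.act(self.fc1(s)))  # excitation: bottleneck MLP
        return x * s.sigmoid()               # scale each channel of the input
```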

## Res2Net [[res2net.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/res2net.py)]

* Paper: `Res2Net: A New Multi-scale Backbone Architecture` - https://arxiv.org/abs/1904.01169
* Code: https://github.com/gasvn/Res2Net

## ResNeSt [[resnest.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnest.py)]

* Paper: `ResNeSt: Split-Attention Networks` - https://arxiv.org/abs/2004.08955
* Code: https://github.com/zhanghang1989/ResNeSt

## ReXNet [[rexnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/rexnet.py)]

* Paper: `ReXNet: Diminishing Representational Bottleneck on CNN` - https://arxiv.org/abs/2007.00992
* Code: https://github.com/clovaai/rexnet

## Selective-Kernel Networks [[sknet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/sknet.py)]

* Paper: `Selective Kernel Networks` - https://arxiv.org/abs/1903.06586
* Code: https://github.com/implus/SKNet, https://github.com/clovaai/assembled-cnn

## SelecSLS [[selecsls.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/selecsls.py)]

* Paper: `XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera` - https://arxiv.org/abs/1907.00837
* Code: https://github.com/mehtadushy/SelecSLS-Pytorch

## Squeeze-and-Excitation Networks [[senet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/senet.py)]

NOTE: I am deprecating this version of the networks; the new ones are part of `resnet.py`.

* Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
* Code: https://github.com/Cadene/pretrained-models.pytorch

## TResNet [[tresnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tresnet.py)]

* Paper: `TResNet: High Performance GPU-Dedicated Architecture` - https://arxiv.org/abs/2003.13630
* Code: https://github.com/mrT23/TResNet

## VovNet V2 and V1 [[vovnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vovnet.py)]

* Paper: `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667
* Reference code: https://github.com/youngwanLEE/vovnet-detectron2

## Xception [[xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/xception.py)]

* Paper: `Xception: Deep Learning with Depthwise Separable Convolutions` - https://arxiv.org/abs/1610.02357
* Code: https://github.com/Cadene/pretrained-models.pytorch

## Xception (Modified Aligned, Gluon) [[gluon_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/gluon_xception.py)]

* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
* Reference code: https://github.com/dmlc/gluon-cv/tree/master/gluoncv/model_zoo, https://github.com/jfzhang95/pytorch-deeplab-xception/

## Xception (Modified Aligned, TF) [[aligned_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/aligned_xception.py)]

* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
* Reference code: https://github.com/tensorflow/models/tree/master/research/deeplab

# Training Examples

## EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

`./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016`

## MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

`./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce`

## SE-ResNeXt-26-D and SE-ResNeXt-26-T

These hparams (or similar) work well for a wide range of ResNet architectures. It's generally a good idea to increase the epoch count as the model size increases: approx 180-200 for ResNe(X)t50, and 220+ for larger models. Increase batch size and LR proportionally for better GPUs or with AMP enabled (see the scaling sketch below). These params were for 2 1080Ti cards:

`./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112`
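
To make the proportional scaling above concrete, the usual assumption is the linear scaling rule: multiply the reference LR by the ratio of your effective global batch size to the reference one. A tiny illustrative helper (hypothetical, not part of the training script):

```python
def scaled_lr(base_lr: float, base_global_batch: int, new_global_batch: int) -> float:
    """Linear LR scaling rule: LR grows in proportion to the effective global batch."""
    return base_lr * new_global_batch / base_global_batch

# The hparams above use lr=0.1 at 2 GPUs x 112 per GPU = 224 global batch.
# Moving to, e.g., 4 GPUs at 224 per GPU:
print(scaled_lr(0.1, 2 * 112, 4 * 224))  # 0.4
```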

## EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5

The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training the process crashed. The results weren't looking amazing, so I resumed the training several times with tweaks to a few params (increased RE prob, decreased rand-aug, increased ema-decay). Nothing looked great. I ended up averaging the best checkpoints from all restarts (see the sketch below). The result is mediocre at the default res/crop but oddly performs much better with a full image test crop of 1.0.
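
Checkpoint averaging of this sort is just a parameter-wise mean over the saved state dicts. A rough sketch of the idea (assumes checkpoints with matching keys and shapes; not the exact script used):

```python
import torch

def average_checkpoints(paths):
    """Return the parameter-wise mean of several checkpoint state dicts."""
    avg = None
    for p in paths:
        sd = torch.load(p, map_location='cpu')
        sd = sd.get('state_dict', sd)  # unwrap if saved with training metadata (assumption)
        if avg is None:
            avg = {k: v.clone().float() for k, v in sd.items()}
        else:
            for k in avg:
                avg[k] += sd[k].float()
    return {k: v / len(paths) for k, v in avg.items()}
```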

## EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5

[Michael Klachko](https://github.com/michaelklachko) achieved these results with the command line for B2 adapted for larger batch size, with the recommended B0 dropout rate of 0.2.

`./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048`

## ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5

Trained on two older 1080Ti cards, this took a while. The ImageNet validation result is only slightly (and not statistically significantly) better than my first good AugMix training run at 78.99. However, these weights are more robust on tests with ImageNetV2, ImageNet-Sketch, etc. Unlike my first AugMix runs, I've enabled SplitBatchNorm, disabled random erasing on the clean split, and cranked up random erasing prob on the 2 augmented paths. A sketch of the JSD consistency term follows the command line below.

`./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce`
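
The JSD loss here is a consistency term across the clean and augmented splits: average the predicted distributions, then penalize each split's KL divergence from that average. A minimal sketch of that term (AugMix-style; not the exact timm implementation, which also combines it with cross-entropy on the clean split):

```python
import torch
import torch.nn.functional as F

def jsd_consistency(logits_clean, logits_aug1, logits_aug2):
    """Jensen-Shannon divergence between predictions on the three aug splits."""
    probs = [F.softmax(l, dim=1) for l in (logits_clean, logits_aug1, logits_aug2)]
    log_mix = torch.clamp(sum(probs) / 3.0, 1e-7, 1.0).log()
    # KL(p_i || mixture), averaged over the splits
    return sum(F.kl_div(log_mix, p, reduction='batchmean') for p in probs) / 3.0
```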

## EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5

Trained by [Andrew Lavin](https://github.com/andravin) with 8 V100 cards. Model EMA was not used; the final checkpoint is the average of the 8 best checkpoints during training (per the averaging sketch above).

`./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064`

## MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5

`./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9`

## ResNeXt-50 32x4d w/ RandAugment - 79.762 top-1, 94.60 top-5

These params will also work well for SE-ResNeXt-50 and SK-ResNeXt-50, and likely the 101 variants. I used them for the SK-ResNeXt-50 32x4d that I trained with 2 GPUs using a slightly higher LR per effective batch size (lr=0.18, b=192 per GPU). The command line below is tuned for 8 GPU training.