Merge pull request #419 from rwightman/byob_vgg_models

More models, GPU-Efficient Nets, RepVGG, classic VGG, and flexible Byob backbone.
Ross Wightman committed d8e69206be (via GitHub)

@@ -2,6 +2,15 @@
## What's New
### Feb 10, 2021
* More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks'); a quick usage sketch follows this list
  * GPU-Efficient-Networks (https://github.com/idstcv/GPU-Efficient-Networks), impl in `byobnet.py`
  * RepVGG (https://github.com/DingXiaoH/RepVGG), impl in `byobnet.py`
  * classic VGG (from torchvision, impl in `vgg.py`)
* Refinements to normalizer layer arg handling and normalizer+act layer handling in some models
* Default AMP mode changed to native PyTorch AMP instead of APEX. Issues with APEX are not being fixed. Native AMP works with `--channels-last` and `--torchscript` model training; APEX does not.
* Fix a few bugs introduced since the last pypi release
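A minimal sketch of trying the new archs via the model registry (model names are from this release; pretrained weight download assumed reachable):

```python
import timm
import torch

# new archs registered by this release: gernet_{s,m,l}, repvgg_*, and classic vgg*
model = timm.create_model('repvgg_b2', pretrained=True)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```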
### Feb 8, 2021
* Add several ResNet weights with ECA attention. 26t & 50t trained @ 256, test @ 320. 269d train @ 256, fine-tune @ 320, test @ 352.
  * `ecaresnet26t` - 79.88 top-1 @ 320x320, 79.08 @ 256x256
@@ -118,30 +127,6 @@ Bunch of changes:
* Some import cleanup and classifier reset changes, all models will have classifier reset to nn.Identity on reset_classifier(0) call
* Prep for 0.1.28 pip release
### May 12, 2020
* Add ResNeSt models (code adapted from https://github.com/zhanghang1989/ResNeSt, paper https://arxiv.org/abs/2004.08955)
### May 3, 2020
* Pruned EfficientNet B1, B2, and B3 (https://arxiv.org/abs/2002.08258) contributed by [Yonathan Aflalo](https://github.com/yoniaflalo)
### May 1, 2020
* Merged a number of excellent contributions in the ResNet model family over the past month
  * BlurPool2D and resnetblur models initiated by [Chris Ha](https://github.com/VRandme), I trained resnetblur50 to 79.3.
  * TResNet models and SpaceToDepth, AntiAliasDownsampleLayer layers by [mrT23](https://github.com/mrT23)
  * ecaresnet (50d, 101d, light) models and two pruned variants using pruning as per (https://arxiv.org/abs/2002.08258) by [Yonathan Aflalo](https://github.com/yoniaflalo)
* 200 pretrained models in total now with updated results csv in results folder
### April 5, 2020
* Add some newly trained MobileNet-V2 models trained with latest h-params, rand augment. They compare quite favourably to EfficientNet-Lite
  * 3.5M param MobileNet-V2 100 @ 73%
  * 4.5M param MobileNet-V2 110d @ 75%
  * 6.1M param MobileNet-V2 140 @ 76.5%
  * 5.8M param MobileNet-V2 120d @ 77.3%
### March 18, 2020
* Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
* Add RandAugment trained ResNeXt-50 32x4d weights with 79.8 top-1. Trained by [Andrew Lavin](https://github.com/andravin) (see Training section for hparams)
## Introduction

Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with the ability to reproduce ImageNet training results.
@@ -150,7 +135,7 @@ The work of many others is present here. I've tried to make sure all source mate
## Models

All model architecture families include variants with pretrained weights. There are specific model variants without any weights; it is NOT a bug. Help training new or better weights is always appreciated. Here are some example [training hparams](https://rwightman.github.io/pytorch-image-models/training_hparam_examples) to get you started.

A full version of the list below with source links can be found in the [documentation](https://rwightman.github.io/pytorch-image-models/models/).
@@ -170,6 +155,7 @@ A full version of the list below with source links can be found in the [document
* MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
* MobileNet-V2 - https://arxiv.org/abs/1801.04381
* Single-Path NAS - https://arxiv.org/abs/1904.02877
* GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
* HRNet - https://arxiv.org/abs/1908.07919
* Inception-V3 - https://arxiv.org/abs/1512.00567
* Inception-ResNet-V2 and Inception-V4 - https://arxiv.org/abs/1602.07261
@@ -178,6 +164,7 @@ A full version of the list below with source links can be found in the [document
* NF-RegNet / NF-ResNet - https://arxiv.org/abs/2101.08692
* PNasNet - https://arxiv.org/abs/1712.00559
* RegNet - https://arxiv.org/abs/2003.13678
* RepVGG - https://arxiv.org/abs/2101.03697
* ResNet/ResNeXt
  * ResNet (v1b/v1.5) - https://arxiv.org/abs/1512.03385
  * ResNeXt - https://arxiv.org/abs/1611.05431
@@ -261,9 +248,10 @@ The root folder of the repository contains reference train, validation, and infe
One of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.

### Object Detection, Instance and Semantic Segmentation
* Detectron2 - https://github.com/facebookresearch/detectron2
* Segmentation Models (Semantic) - https://github.com/qubvel/segmentation_models.pytorch
* EfficientDet (Obj Det, Semantic soon) - https://github.com/rwightman/efficientdet-pytorch

### Computer Vision / Image Augmentation
* Albumentations - https://github.com/albumentations-team/albumentations

@@ -276,10 +264,8 @@ One of the greatest assets of PyTorch is the community and their contributions.
### Metric Learning
* PyTorch Metric Learning - https://github.com/KevinMusgrave/pytorch-metric-learning

### Training / Frameworks
* fastai - https://github.com/fastai/fastai
## Licenses

@@ -1,5 +1,29 @@
# Archived Changes
### May 12, 2020
* Add ResNeSt models (code adapted from https://github.com/zhanghang1989/ResNeSt, paper https://arxiv.org/abs/2004.08955)
### May 3, 2020
* Pruned EfficientNet B1, B2, and B3 (https://arxiv.org/abs/2002.08258) contributed by [Yonathan Aflalo](https://github.com/yoniaflalo)
### May 1, 2020
* Merged a number of excellent contributions in the ResNet model family over the past month
  * BlurPool2D and resnetblur models initiated by [Chris Ha](https://github.com/VRandme), I trained resnetblur50 to 79.3.
  * TResNet models and SpaceToDepth, AntiAliasDownsampleLayer layers by [mrT23](https://github.com/mrT23)
  * ecaresnet (50d, 101d, light) models and two pruned variants using pruning as per (https://arxiv.org/abs/2002.08258) by [Yonathan Aflalo](https://github.com/yoniaflalo)
* 200 pretrained models in total now with updated results csv in results folder
### April 5, 2020
* Add some newly trained MobileNet-V2 models trained with latest h-params, rand augment. They compare quite favourably to EfficientNet-Lite
  * 3.5M param MobileNet-V2 100 @ 73%
  * 4.5M param MobileNet-V2 110d @ 75%
  * 6.1M param MobileNet-V2 140 @ 76.5%
  * 5.8M param MobileNet-V2 120d @ 77.3%
### March 18, 2020
* Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
* Add RandAugment trained ResNeXt-50 32x4d weights with 79.8 top-1. Trained by [Andrew Lavin](https://github.com/andravin) (see Training section for hparams)
### April 5, 2020
* Add some newly trained MobileNet-V2 models trained with latest h-params, rand augment. They compare quite favourably to EfficientNet-Lite
  * 3.5M param MobileNet-V2 100 @ 73%

@@ -1,5 +1,55 @@
# Recent Changes
### Feb 10, 2021
* More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks')
  * GPU-Efficient-Networks (https://github.com/idstcv/GPU-Efficient-Networks), impl in `byobnet.py`
  * RepVGG (https://github.com/DingXiaoH/RepVGG), impl in `byobnet.py`
  * classic VGG (from torchvision, impl in `vgg.py`)
* Refinements to normalizer layer arg handling and normalizer+act layer handling in some models
* Default AMP mode changed to native PyTorch AMP instead of APEX. Issues with APEX are not being fixed. Native AMP works with `--channels-last` and `--torchscript` model training; APEX does not.
* Fix a few bugs introduced since the last pypi release
### Feb 8, 2021
* Add several ResNet weights with ECA attention. 26t & 50t trained @ 256, test @ 320. 269d train @ 256, fine-tune @ 320, test @ 352.
  * `ecaresnet26t` - 79.88 top-1 @ 320x320, 79.08 @ 256x256
  * `ecaresnet50t` - 82.35 top-1 @ 320x320, 81.52 @ 256x256
  * `ecaresnet269d` - 84.93 top-1 @ 352x352, 84.87 @ 320x320
* Remove separate tiered (`t`) vs tiered_narrow (`tn`) ResNet model defs, all `tn` changed to `t` and `t` models removed (`seresnext26t_32x4d` was the only removed model with weights).
* Support model default_cfgs with separate train vs test resolution `test_input_size` and remove extra `_320` suffix ResNet model defs that were just for test; see the sketch after this list.
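A small sketch of reading the new train-vs-test resolution split (assuming the chosen model's cfg defines the new `test_input_size` key):

```python
import timm

model = timm.create_model('ecaresnet50t', pretrained=False)
print(model.default_cfg['input_size'])           # train resolution, e.g. (3, 256, 256)
print(model.default_cfg.get('test_input_size'))  # separate test resolution, if defined
```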
### Jan 30, 2021
* Add initial "Normalization Free" NF-RegNet-B* and NF-ResNet model definitions based on [paper](https://arxiv.org/abs/2101.08692)
### Jan 25, 2021
* Add ResNetV2 Big Transfer (BiT) models w/ ImageNet-1k and 21k weights from https://github.com/google-research/big_transfer
* Add official R50+ViT-B/16 hybrid models + weights from https://github.com/google-research/vision_transformer
* ImageNet-21k ViT weights are added w/ model defs and representation layer (pre logits) support
  * NOTE: ImageNet-21k classifier heads were zero'd in original weights, they are only useful for transfer learning
* Add model defs and weights for DeiT Vision Transformer models from https://github.com/facebookresearch/deit
* Refactor dataset classes into ImageDataset/IterableImageDataset + dataset specific parser classes
* Add Tensorflow-Datasets (TFDS) wrapper to allow use of TFDS image classification sets with train script
  * Ex: `train.py /data/tfds --dataset tfds/oxford_iiit_pet --val-split test --model resnet50 -b 256 --amp --num-classes 37 --opt adamw --lr 3e-4 --weight-decay .001 --pretrained -j 2`
* Add improved .tar dataset parser that reads images from .tar, folder of .tar files, or .tar within .tar
  * Run validation on full ImageNet-21k directly from tar w/ BiT model: `validate.py /data/fall11_whole.tar --model resnetv2_50x1_bitm_in21k --amp`
* Models in this update should be stable, w/ the possible exception of ViT/BiT; there is some possibility of regressions in the train/val scripts and dataset handling
### Jan 3, 2021
* Add SE-ResNet-152D weights
  * 256x256 val, 0.94 crop top-1 - 83.75
  * 320x320 val, 1.0 crop - 84.36
* Update results files
### Dec 18, 2020
* Add ResNet-101D, ResNet-152D, and ResNet-200D weights trained @ 256x256
  * 256x256 val, 0.94 crop (top-1) - 101D (82.33), 152D (83.08), 200D (83.25)
  * 288x288 val, 1.0 crop - 101D (82.64), 152D (83.48), 200D (83.76)
  * 320x320 val, 1.0 crop - 101D (83.00), 152D (83.66), 200D (84.01)
### Dec 7, 2020
* Simplify EMA module (ModelEmaV2), compatible with fully torchscripted models
* Misc fixes for SiLU ONNX export, default_cfg missing from Feature extraction models, Linear layer w/ AMP + torchscript
* PyPi release @ 0.3.2 (needed by EfficientDet)
### Oct 30, 2020
* Test with PyTorch 1.7 and fix a small top-n metric view vs reshape issue.
* Convert newly added 224x224 Vision Transformer weights from official JAX repo. 81.8 top-1 for B/16, 83.1 L/16.

@@ -31,6 +31,10 @@ The validation results for the pretrained weights can be found [here](results.md
* My PyTorch code: https://github.com/rwightman/pytorch-dpn-pretrained
* Reference code: https://github.com/cypw/DPNs
## GPU-Efficient Networks [[byobnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byobnet.py)]
* Paper: `Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
* Reference code: https://github.com/idstcv/GPU-Efficient-Networks
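A minimal usage sketch (model name from this commit; pretrained weight download assumed reachable):

```python
import timm
import torch

model = timm.create_model('gernet_m', pretrained=True).eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```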
## HRNet [[hrnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hrnet.py)]
* Paper: `Deep High-Resolution Representation Learning for Visual Recognition` - https://arxiv.org/abs/1908.07919
* Code: https://github.com/HRNet/HRNet-Image-Classification
@@ -82,6 +86,10 @@ The validation results for the pretrained weights can be found [here](results.md
* Paper: `Designing Network Design Spaces` - https://arxiv.org/abs/2003.13678
* Reference code: https://github.com/facebookresearch/pycls/blob/master/pycls/models/regnet.py
## RepVGG [[byobnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byobnet.py)]
* Paper: `Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
* Reference code: https://github.com/DingXiaoH/RepVGG
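The ByobNet-based models also expose multi-scale features; a sketch using the feature extraction wiring set up in this commit's `byobnet.py`:

```python
import timm
import torch

feat_model = timm.create_model('repvgg_b2', features_only=True, pretrained=False)
feats = feat_model(torch.randn(1, 3, 224, 224))
print([f.shape[1] for f in feats])  # channel count at each feature stage
```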
## ResNet, ResNeXt [[resnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnet.py)]
* ResNet (V1B)

@@ -136,6 +144,10 @@ NOTE: I am deprecating this version of the networks, the new ones are part of `r
* Paper: `TResNet: High Performance GPU-Dedicated Architecture` - https://arxiv.org/abs/2003.13630
* Code: https://github.com/mrT23/TResNet
## VGG [[vgg.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vgg.py)]
* Paper: `Very Deep Convolutional Networks For Large-Scale Image Recognition` - https://arxiv.org/pdf/1409.1556.pdf
* Reference code: https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
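Classic VGG variants follow the torchvision naming; a sketch (assuming the usual `vgg16`/`vgg16_bn` style names registered in `vgg.py`):

```python
import timm

model = timm.create_model('vgg16_bn', pretrained=False, num_classes=10)
print(sum(p.numel() for p in model.parameters()) / 1e6, 'M params')
```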
## Vision Transformer [[vision_transformer.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py)]
* Paper: `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale` - https://arxiv.org/abs/2010.11929
* Reference code and pretrained weights: https://github.com/google-research/vision_transformer

@@ -10,9 +10,9 @@ The variety of training args is large and not all combinations of options (or ev
To train an SE-ResNet34 on ImageNet, locally distributed, 4 GPUs, one process per GPU w/ cosine schedule, random-erasing prob of 50% and per-pixel random value:

`./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4`

NOTE: It is recommended to use PyTorch 1.7+ w/ PyTorch native AMP and DDP instead of APEX AMP. `--amp` defaults to native AMP as of timm ver 0.4.3. `--apex-amp` will force use of APEX components if they are installed.
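For example, to force APEX AMP when it is installed (an illustrative variation of the command above, using the flags described in the note):

`./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --apex-amp -j 4`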
## Validation / Inference Scripts
@@ -24,4 +24,4 @@ To validate with the model's pretrained weights (if they exist):
To run inference from a checkpoint:

`python inference.py /imagenet/validation/ --model mobilenetv3_large_100 --checkpoint ./output/train/model_best.pth.tar`

@@ -83,7 +83,6 @@ def test_model_default_cfgs(model_name, batch_size):
    cfg = model.default_cfg
    classifier = cfg['classifier']
    pool_size = cfg['pool_size']
    input_size = model.default_cfg['input_size']
@@ -111,9 +110,16 @@ def test_model_default_cfgs(model_name, batch_size):
        # FIXME mobilenetv3 forward_features vs removed pooling differ
        assert outputs.shape[-1] == pool_size[-1] and outputs.shape[-2] == pool_size[-2]

    # check classifier name matches default_cfg
    assert classifier + ".weight" in state_dict.keys(), f'{classifier} not in model params'

    # check first conv(s) names match default_cfg
    first_conv = cfg['first_conv']
    if isinstance(first_conv, str):
        first_conv = (first_conv,)
    assert isinstance(first_conv, (tuple, list))
    for fc in first_conv:
        assert fc + ".weight" in state_dict.keys(), f'{fc} not in model params'

if 'GITHUB_ACTIONS' not in os.environ:

@@ -1,3 +1,4 @@
from .byobnet import *
from .cspnet import *
from .densenet import *
from .dla import *
@@ -23,6 +24,7 @@ from .selecsls import *
from .senet import *
from .sknet import *
from .tresnet import *
from .vgg import *
from .vision_transformer import *
from .vovnet import *
from .xception import *

@@ -0,0 +1,739 @@
""" Bring-Your-Own-Blocks Network
A flexible network w/ dataclass based config for stacking those NN blocks.
This model is currently used to implement the following networks:
GPU Efficient (ResNets) - gernet_l/m/s (original versions called genet, but this was already used (by SENet author)).
Paper: `Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
Code and weights: https://github.com/idstcv/GPU-Efficient-Networks, licensed Apache 2.0
RepVGG - repvgg_*
Paper: `Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
Code and weights: https://github.com/DingXiaoH/RepVGG, licensed MIT
In all cases the models have been modified to fit within the design of ByobNet. I've remapped
the original weights and verified accuracies.
For GPU Efficient nets, I used the original names for the blocks since they were for the most part
the same as the original residual blocks in ResNe(X)t, DarkNet, and other existing models. Note that some
changes introduced in RegNet are also present in the stem and bottleneck blocks for this model.
A significant number of different network archs can be implemented here, including variants of the
above nets that include attention.
Hacked together by / copyright Ross Wightman, 2021.
"""
import math
from dataclasses import dataclass, field
from collections import OrderedDict
from typing import Tuple, Dict, Optional, Union, Any, Callable
from functools import partial
import torch
import torch.nn as nn
from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
from .helpers import build_model_with_cfg
from .layers import ClassifierHead, ConvBnAct, DropPath, AvgPool2dSame, \
create_conv2d, get_act_layer, get_attn, convert_norm_act, make_divisible
from .registry import register_model
__all__ = ['ByobNet', 'ByobCfg', 'BlocksCfg']
def _cfg(url='', **kwargs):
return {
'url': url, 'num_classes': 1000, 'input_size': (3, 224, 224), 'pool_size': (7, 7),
'crop_pct': 0.875, 'interpolation': 'bilinear',
'mean': IMAGENET_DEFAULT_MEAN, 'std': IMAGENET_DEFAULT_STD,
'first_conv': 'stem.conv', 'classifier': 'head.fc',
**kwargs
}
default_cfgs = {
# GPU-Efficient (ResNet) weights
'gernet_s': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-ger-weights/gernet_s-756b4751.pth'),
'gernet_m': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-ger-weights/gernet_m-0873c53a.pth'),
'gernet_l': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-ger-weights/gernet_l-f31e2e8d.pth',
input_size=(3, 256, 256), pool_size=(8, 8)),
# RepVGG weights
'repvgg_a2': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_a2-c1ee6d2b.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b0': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b0-80ac3f1b.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b1': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b1-77ca2989.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b1g4': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b1g4-abde5d92.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b2': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b2-25b7494e.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b2g4': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b2g4-165a85f2.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b3': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b3-199bc50d.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
'repvgg_b3g4': _cfg(
url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-repvgg-weights/repvgg_b3g4-73c370bf.pth',
first_conv=('stem.conv_kxk.conv', 'stem.conv_1x1.conv')),
}
@dataclass
class BlocksCfg:
type: Union[str, nn.Module]
d: int # block depth (number of block repeats in stage)
c: int # number of output channels for each block in stage
s: int = 2 # stride of stage (first block)
gs: Optional[Union[int, Callable]] = None # group-size of blocks in stage, conv is depthwise if gs == 1
br: float = 1. # bottleneck-ratio of blocks in stage
@dataclass
class ByobCfg:
blocks: Tuple[BlocksCfg, ...]
downsample: str = 'conv1x1'
stem_type: str = '3x3'
stem_chs: int = 32
width_factor: float = 1.0
num_features: int = 0 # num out_channels for final conv, no final 1x1 conv if 0
zero_init_last_bn: bool = True
act_layer: str = 'relu'
norm_layer: nn.Module = nn.BatchNorm2d
attn_layer: Optional[str] = None
attn_kwargs: dict = field(default_factory=lambda: dict())
def _rep_vgg_bcfg(d=(4, 6, 16, 1), wf=(1., 1., 1., 1.), groups=0):
c = (64, 128, 256, 512)
group_size = 0
if groups > 0:
group_size = lambda chs, idx: chs // groups if (idx + 1) % 2 == 0 else 0
bcfg = tuple([BlocksCfg(type='rep', d=d, c=c * wf, gs=group_size) for d, c, wf in zip(d, c, wf)])
return bcfg
model_cfgs = dict(
gernet_l=ByobCfg(
blocks=(
BlocksCfg(type='basic', d=1, c=128, s=2, gs=0, br=1.),
BlocksCfg(type='basic', d=2, c=192, s=2, gs=0, br=1.),
BlocksCfg(type='bottle', d=6, c=640, s=2, gs=0, br=1 / 4),
BlocksCfg(type='bottle', d=5, c=640, s=2, gs=1, br=3.),
BlocksCfg(type='bottle', d=4, c=640, s=1, gs=1, br=3.),
),
stem_chs=32,
num_features=2560,
),
gernet_m=ByobCfg(
blocks=(
BlocksCfg(type='basic', d=1, c=128, s=2, gs=0, br=1.),
BlocksCfg(type='basic', d=2, c=192, s=2, gs=0, br=1.),
BlocksCfg(type='bottle', d=6, c=640, s=2, gs=0, br=1 / 4),
BlocksCfg(type='bottle', d=4, c=640, s=2, gs=1, br=3.),
BlocksCfg(type='bottle', d=1, c=640, s=1, gs=1, br=3.),
),
stem_chs=32,
num_features=2560,
),
gernet_s=ByobCfg(
blocks=(
BlocksCfg(type='basic', d=1, c=48, s=2, gs=0, br=1.),
BlocksCfg(type='basic', d=3, c=48, s=2, gs=0, br=1.),
BlocksCfg(type='bottle', d=7, c=384, s=2, gs=0, br=1 / 4),
BlocksCfg(type='bottle', d=2, c=560, s=2, gs=1, br=3.),
BlocksCfg(type='bottle', d=1, c=256, s=1, gs=1, br=3.),
),
stem_chs=13,
num_features=1920,
),
repvgg_a2=ByobCfg(
blocks=_rep_vgg_bcfg(d=(2, 4, 14, 1), wf=(1.5, 1.5, 1.5, 2.75)),
stem_type='rep',
stem_chs=64,
),
repvgg_b0=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(1., 1., 1., 2.5)),
stem_type='rep',
stem_chs=64,
),
repvgg_b1=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(2., 2., 2., 4.)),
stem_type='rep',
stem_chs=64,
),
repvgg_b1g4=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(2., 2., 2., 4.), groups=4),
stem_type='rep',
stem_chs=64,
),
repvgg_b2=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(2.5, 2.5, 2.5, 5.)),
stem_type='rep',
stem_chs=64,
),
repvgg_b2g4=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(2.5, 2.5, 2.5, 5.), groups=4),
stem_type='rep',
stem_chs=64,
),
repvgg_b3=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(3., 3., 3., 5.)),
stem_type='rep',
stem_chs=64,
),
repvgg_b3g4=ByobCfg(
blocks=_rep_vgg_bcfg(wf=(3., 3., 3., 5.), groups=4),
stem_type='rep',
stem_chs=64,
),
)
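# Illustrative sketch (not part of the original commit): a hypothetical custom
# variant composed from the same dataclasses, e.g. basic + bottleneck blocks
# with SE attention resolved via get_attn('se'):
#
#   my_cfg = ByobCfg(
#       blocks=(
#           BlocksCfg(type='basic', d=2, c=64, s=1, gs=0, br=1.),
#           BlocksCfg(type='bottle', d=6, c=512, s=2, gs=0, br=0.25),
#       ),
#       stem_type='3x3', stem_chs=32, attn_layer='se',
#   )
#   model = ByobNet(my_cfg)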
def _na_args(cfg: dict):
return dict(
norm_layer=cfg.get('norm_layer', nn.BatchNorm2d),
act_layer=cfg.get('act_layer', nn.ReLU))
def _ex_tuple(cfg: dict, *names):
return tuple([cfg.get(n, None) for n in names])
def num_groups(group_size, channels):
if not group_size: # 0 or None
return 1 # normal conv with 1 group
else:
# NOTE group_size == 1 -> depthwise conv
assert channels % group_size == 0
return channels // group_size
class DownsampleAvg(nn.Module):
def __init__(self, in_chs, out_chs, stride=1, dilation=1, apply_act=False, norm_layer=None, act_layer=None):
""" AvgPool Downsampling as in 'D' ResNet variants."""
super(DownsampleAvg, self).__init__()
avg_stride = stride if dilation == 1 else 1
if stride > 1 or dilation > 1:
avg_pool_fn = AvgPool2dSame if avg_stride == 1 and dilation > 1 else nn.AvgPool2d
self.pool = avg_pool_fn(2, avg_stride, ceil_mode=True, count_include_pad=False)
else:
self.pool = nn.Identity()
self.conv = ConvBnAct(in_chs, out_chs, 1, apply_act=apply_act, norm_layer=norm_layer, act_layer=act_layer)
def forward(self, x):
return self.conv(self.pool(x))
def create_downsample(type, **kwargs):
if type == 'avg':
return DownsampleAvg(**kwargs)
else:
return ConvBnAct(kwargs.pop('in_chs'), kwargs.pop('out_chs'), kernel_size=1, **kwargs)
class BasicBlock(nn.Module):
""" ResNet Basic Block - kxk + kxk
"""
def __init__(
self, in_chs, out_chs, kernel_size=3, stride=1, dilation=(1, 1), group_size=None, bottle_ratio=1.0,
downsample='avg', linear_out=False, layer_cfg=None, drop_block=None, drop_path_rate=0.):
super(BasicBlock, self).__init__()
layer_cfg = layer_cfg or {}
act_layer, attn_layer = _ex_tuple(layer_cfg, 'act_layer', 'attn_layer')
layer_args = _na_args(layer_cfg)
mid_chs = make_divisible(out_chs * bottle_ratio)
groups = num_groups(group_size, mid_chs)
if in_chs != out_chs or stride != 1 or dilation[0] != dilation[1]:
self.shortcut = create_downsample(
downsample, in_chs=in_chs, out_chs=out_chs, stride=stride, dilation=dilation[0],
apply_act=False, **layer_args)
else:
self.shortcut = nn.Identity()
self.conv1_kxk = ConvBnAct(in_chs, mid_chs, kernel_size, stride=stride, dilation=dilation[0], **layer_args)
self.conv2_kxk = ConvBnAct(
mid_chs, out_chs, kernel_size, dilation=dilation[1], groups=groups,
drop_block=drop_block, apply_act=False, **layer_args)
self.attn = nn.Identity() if attn_layer is None else attn_layer(out_chs)
self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
self.act = nn.Identity() if linear_out else act_layer(inplace=True)
def init_weights(self, zero_init_last_bn=False):
if zero_init_last_bn:
nn.init.zeros_(self.conv2_kxk.bn.weight)
def forward(self, x):
shortcut = self.shortcut(x)
# residual path
x = self.conv1_kxk(x)
x = self.conv2_kxk(x)
x = self.attn(x)
x = self.drop_path(x)
x = self.act(x + shortcut)
return x
class BottleneckBlock(nn.Module):
""" ResNet-like Bottleneck Block - 1x1 - kxk - 1x1
"""
def __init__(self, in_chs, out_chs, kernel_size=3, stride=1, dilation=(1, 1), bottle_ratio=1., group_size=None,
downsample='avg', linear_out=False, layer_cfg=None, drop_block=None, drop_path_rate=0.):
super(BottleneckBlock, self).__init__()
layer_cfg = layer_cfg or {}
act_layer, attn_layer = _ex_tuple(layer_cfg, 'act_layer', 'attn_layer')
layer_args = _na_args(layer_cfg)
mid_chs = make_divisible(out_chs * bottle_ratio)
groups = num_groups(group_size, mid_chs)
if in_chs != out_chs or stride != 1 or dilation[0] != dilation[1]:
self.shortcut = create_downsample(
downsample, in_chs=in_chs, out_chs=out_chs, stride=stride, dilation=dilation[0],
apply_act=False, **layer_args)
else:
self.shortcut = nn.Identity()
self.conv1_1x1 = ConvBnAct(in_chs, mid_chs, 1, **layer_args)
self.conv2_kxk = ConvBnAct(
mid_chs, mid_chs, kernel_size, stride=stride, dilation=dilation[0],
groups=groups, drop_block=drop_block, **layer_args)
self.attn = nn.Identity() if attn_layer is None else attn_layer(mid_chs)
self.conv3_1x1 = ConvBnAct(mid_chs, out_chs, 1, apply_act=False, **layer_args)
self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
self.act = nn.Identity() if linear_out else act_layer(inplace=True)
def init_weights(self, zero_init_last_bn=False):
if zero_init_last_bn:
nn.init.zeros_(self.conv3_1x1.bn.weight)
def forward(self, x):
shortcut = self.shortcut(x)
x = self.conv1_1x1(x)
x = self.conv2_kxk(x)
x = self.attn(x)
x = self.conv3_1x1(x)
x = self.drop_path(x)
x = self.act(x + shortcut)
return x
class DarkBlock(nn.Module):
""" DarkNet-like (1x1 + 3x3 w/ stride) block
The GE-Net impl included a 1x1 + 3x3 block in their search space. It was not used in the feature models.
This block is pretty much a DarkNet block (also DenseNet) hence the name. Neither DarkNet nor DenseNet
uses strides within the block (external 3x3 or maxpool downsampling is done in front of the block repeats).
If one does want to use a lot of these blocks w/ stride, I'd recommend using the EdgeBlock (3x3 w/ stride + 1x1)
for more optimal compute.
"""
def __init__(self, in_chs, out_chs, kernel_size=3, stride=1, dilation=(1, 1), bottle_ratio=1.0, group_size=None,
downsample='avg', linear_out=False, layer_cfg=None, drop_block=None, drop_path_rate=0.):
super(DarkBlock, self).__init__()
layer_cfg = layer_cfg or {}
act_layer, attn_layer = _ex_tuple(layer_cfg, 'act_layer', 'attn_layer')
layer_args = _na_args(layer_cfg)
mid_chs = make_divisible(out_chs * bottle_ratio)
groups = num_groups(group_size, mid_chs)
if in_chs != out_chs or stride != 1 or dilation[0] != dilation[1]:
self.shortcut = create_downsample(
downsample, in_chs=in_chs, out_chs=out_chs, stride=stride, dilation=dilation[0],
apply_act=False, **layer_args)
else:
self.shortcut = nn.Identity()
self.conv1_1x1 = ConvBnAct(in_chs, mid_chs, 1, **layer_args)
self.conv2_kxk = ConvBnAct(
mid_chs, out_chs, kernel_size, stride=stride, dilation=dilation[0],
groups=groups, drop_block=drop_block, apply_act=False, **layer_args)
self.attn = nn.Identity() if attn_layer is None else attn_layer(out_chs)
self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
self.act = nn.Identity() if linear_out else act_layer(inplace=True)
def init_weights(self, zero_init_last_bn=False):
if zero_init_last_bn:
nn.init.zeros_(self.conv2_kxk.bn.weight)
def forward(self, x):
shortcut = self.shortcut(x)
x = self.conv1_1x1(x)
x = self.conv2_kxk(x)
x = self.attn(x)
x = self.drop_path(x)
x = self.act(x + shortcut)
return x
class EdgeBlock(nn.Module):
""" EdgeResidual-like (3x3 + 1x1) block
A two layer block like DarkBlock, but with the order of the 3x3 and 1x1 convs reversed.
Very similar to the EfficientNet Edge-Residual block, but this block ends with an activation, is
intended to be used with either expansion or bottleneck contraction, and can use DW/group/non-grouped convs.
FIXME is there a more common 3x3 + 1x1 conv block to name this after?
"""
def __init__(self, in_chs, out_chs, kernel_size=3, stride=1, dilation=(1, 1), bottle_ratio=1.0, group_size=None,
downsample='avg', linear_out=False, layer_cfg=None, drop_block=None, drop_path_rate=0.):
super(EdgeBlock, self).__init__()
layer_cfg = layer_cfg or {}
act_layer, attn_layer = _ex_tuple(layer_cfg, 'act_layer', 'attn_layer')
layer_args = _na_args(layer_cfg)
mid_chs = make_divisible(out_chs * bottle_ratio)
groups = num_groups(group_size, mid_chs)
if in_chs != out_chs or stride != 1 or dilation[0] != dilation[1]:
self.shortcut = create_downsample(
downsample, in_chs=in_chs, out_chs=out_chs, stride=stride, dilation=dilation[0],
apply_act=False, **layer_args)
else:
self.shortcut = nn.Identity()
self.conv1_kxk = ConvBnAct(
in_chs, mid_chs, kernel_size, stride=stride, dilation=dilation[0],
groups=groups, drop_block=drop_block, **layer_args)
self.attn = nn.Identity() if attn_layer is None else attn_layer(out_chs)
self.conv2_1x1 = ConvBnAct(mid_chs, out_chs, 1, apply_act=False, **layer_args)
self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
self.act = nn.Identity() if linear_out else act_layer(inplace=True)
def init_weights(self, zero_init_last_bn=False):
if zero_init_last_bn:
nn.init.zeros_(self.conv2_1x1.bn.weight)
def forward(self, x):
shortcut = self.shortcut(x)
x = self.conv1_kxk(x)
x = self.attn(x)
x = self.conv2_1x1(x)
x = self.drop_path(x)
x = self.act(x + shortcut)
return x
class RepVggBlock(nn.Module):
""" RepVGG Block.
Adapted from impl at https://github.com/DingXiaoH/RepVGG
This version does not currently support the deploy optimization. It is currently fixed in 'train' mode.
"""
def __init__(self, in_chs, out_chs, kernel_size=3, stride=1, dilation=(1, 1), bottle_ratio=1.0, group_size=None,
downsample='', layer_cfg=None, drop_block=None, drop_path_rate=0.):
super(RepVggBlock, self).__init__()
layer_cfg = layer_cfg or {}
act_layer, norm_layer, attn_layer = _ex_tuple(layer_cfg, 'act_layer', 'norm_layer', 'attn_layer')
norm_layer = convert_norm_act(norm_layer=norm_layer, act_layer=act_layer)
layer_args = _na_args(layer_cfg)
groups = num_groups(group_size, in_chs)
use_ident = in_chs == out_chs and stride == 1 and dilation[0] == dilation[1]
self.identity = norm_layer(out_chs, apply_act=False) if use_ident else None
self.conv_kxk = ConvBnAct(
in_chs, out_chs, kernel_size, stride=stride, dilation=dilation[0],
groups=groups, drop_block=drop_block, apply_act=False, **layer_args)
self.conv_1x1 = ConvBnAct(in_chs, out_chs, 1, stride=stride, groups=groups, apply_act=False, **layer_args)
self.attn = nn.Identity() if attn_layer is None else attn_layer(out_chs)
self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. and use_ident else nn.Identity()
self.act = act_layer(inplace=True)
def init_weights(self, zero_init_last_bn=False):
# NOTE this init overrides the base model init with specific changes for the block type
for m in self.modules():
if isinstance(m, nn.BatchNorm2d):
nn.init.normal_(m.weight, .1, .1)
nn.init.normal_(m.bias, 0, .1)
def forward(self, x):
if self.identity is None:
x = self.conv_1x1(x) + self.conv_kxk(x)
else:
identity = self.identity(x)
x = self.conv_1x1(x) + self.conv_kxk(x)
x = self.drop_path(x) # not in the paper / official impl, experimental
x = x + identity
x = self.attn(x) # no attn in the paper / official impl, experimental
x = self.act(x)
return x
_block_registry = dict(
basic=BasicBlock,
bottle=BottleneckBlock,
dark=DarkBlock,
edge=EdgeBlock,
rep=RepVggBlock,
)
def register_block(block_type: str, block_fn: nn.Module):
_block_registry[block_type] = block_fn
def create_block(block: Union[str, nn.Module], **kwargs):
if isinstance(block, (nn.Module, partial)):
return block(**kwargs)
assert block in _block_registry, f'Unknown block type ({block})'
return _block_registry[block](**kwargs)
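# Illustrative sketch (not part of the original commit): plugging a custom block
# into the registry so it can be referenced by name from a BlocksCfg:
#
#   class MyBlock(nn.Module):
#       def __init__(self, in_chs, out_chs, **kwargs): ...
#       def forward(self, x): ...
#
#   register_block('my_block', MyBlock)
#   cfg = BlocksCfg(type='my_block', d=2, c=128)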
def create_stem(in_chs, out_chs, stem_type='', layer_cfg=None):
layer_cfg = layer_cfg or {}
layer_args = _na_args(layer_cfg)
assert stem_type in ('', 'deep', 'deep_tiered', '3x3', '7x7', 'rep')
if 'deep' in stem_type:
# 3 deep 3x3 conv stack
stem = OrderedDict()
stem_chs = (out_chs // 2, out_chs // 2)
if 'tiered' in stem_type:
stem_chs = (3 * stem_chs[0] // 4, stem_chs[1])
norm_layer, act_layer = _ex_tuple(layer_args, 'norm_layer', 'act_layer')
stem['conv1'] = create_conv2d(in_chs, stem_chs[0], kernel_size=3, stride=2)
stem['conv2'] = create_conv2d(stem_chs[0], stem_chs[1], kernel_size=3, stride=1)
stem['conv3'] = create_conv2d(stem_chs[1], out_chs, kernel_size=3, stride=1)
norm_act_layer = convert_norm_act(norm_layer=norm_layer, act_layer=act_layer)
stem['na'] = norm_act_layer(out_chs)
stem = nn.Sequential(stem)
elif '7x7' in stem_type:
# 7x7 stem conv as in ResNet
stem = ConvBnAct(in_chs, out_chs, 7, stride=2, **layer_args)
elif 'rep' in stem_type:
stem = RepVggBlock(in_chs, out_chs, stride=2, layer_cfg=layer_cfg)
else:
# 3x3 stem conv as in RegNet
stem = ConvBnAct(in_chs, out_chs, 3, stride=2, **layer_args)
return stem
class ByobNet(nn.Module):
""" 'Bring-your-own-blocks' Net
A flexible network backbone that allows building model stem + blocks via
dataclass cfg definition w/ factory functions for module instantiation.
Current assumption is that both stem and blocks are in conv-bn-act order (w/ block ending in act).
"""
def __init__(self, cfg: ByobCfg, num_classes=1000, in_chans=3, global_pool='avg', output_stride=32,
zero_init_last_bn=True, drop_rate=0., drop_path_rate=0.):
super().__init__()
self.num_classes = num_classes
self.drop_rate = drop_rate
norm_layer = cfg.norm_layer
act_layer = get_act_layer(cfg.act_layer)
attn_layer = partial(get_attn(cfg.attn_layer), **cfg.attn_kwargs) if cfg.attn_layer else None
layer_cfg = dict(norm_layer=norm_layer, act_layer=act_layer, attn_layer=attn_layer)
stem_chs = int(round((cfg.stem_chs or cfg.blocks[0].c) * cfg.width_factor))
self.stem = create_stem(in_chans, stem_chs, cfg.stem_type, layer_cfg=layer_cfg)
self.feature_info = []
depths = [bc.d for bc in cfg.blocks]
dpr = [x.tolist() for x in torch.linspace(0, drop_path_rate, sum(depths)).split(depths)]
prev_name = 'stem'
prev_chs = stem_chs
net_stride = 2
dilation = 1
stages = []
for stage_idx, block_cfg in enumerate(cfg.blocks):
stride = block_cfg.s
if stride != 1:
self.feature_info.append(dict(num_chs=prev_chs, reduction=net_stride, module=prev_name))
if net_stride >= output_stride and stride > 1:
dilation *= stride
stride = 1
net_stride *= stride
first_dilation = 1 if dilation in (1, 2) else 2
blocks = []
for block_idx in range(block_cfg.d):
out_chs = make_divisible(block_cfg.c * cfg.width_factor)
group_size = block_cfg.gs
if isinstance(group_size, Callable):
group_size = group_size(out_chs, block_idx)
block_kwargs = dict( # Blocks used in this model must accept these arguments
in_chs=prev_chs,
out_chs=out_chs,
stride=stride if block_idx == 0 else 1,
dilation=(first_dilation, dilation),
group_size=group_size,
bottle_ratio=block_cfg.br,
downsample=cfg.downsample,
drop_path_rate=dpr[stage_idx][block_idx],
layer_cfg=layer_cfg,
)
blocks += [create_block(block_cfg.type, **block_kwargs)]
first_dilation = dilation
prev_chs = out_chs
stages += [nn.Sequential(*blocks)]
prev_name = f'stages.{stage_idx}'
self.stages = nn.Sequential(*stages)
if cfg.num_features:
self.num_features = int(round(cfg.width_factor * cfg.num_features))
self.final_conv = ConvBnAct(prev_chs, self.num_features, 1, **_na_args(layer_cfg))
else:
self.num_features = prev_chs
self.final_conv = nn.Identity()
self.feature_info += [dict(num_chs=self.num_features, reduction=net_stride, module='final_conv')]
self.head = ClassifierHead(self.num_features, num_classes, pool_type=global_pool, drop_rate=self.drop_rate)
for n, m in self.named_modules():
if isinstance(m, nn.Conv2d):
fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
fan_out //= m.groups
m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, mean=0.0, std=0.01)
nn.init.zeros_(m.bias)
elif isinstance(m, nn.BatchNorm2d):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
for m in self.modules():
# call each block's weight init for block-specific overrides to init above
if hasattr(m, 'init_weights'):
m.init_weights(zero_init_last_bn=zero_init_last_bn)
def get_classifier(self):
return self.head.fc
def reset_classifier(self, num_classes, global_pool='avg'):
self.head = ClassifierHead(self.num_features, num_classes, pool_type=global_pool, drop_rate=self.drop_rate)
def forward_features(self, x):
x = self.stem(x)
x = self.stages(x)
x = self.final_conv(x)
return x
def forward(self, x):
x = self.forward_features(x)
x = self.head(x)
return x
def _create_byobnet(variant, pretrained=False, **kwargs):
return build_model_with_cfg(
ByobNet, variant, pretrained,
default_cfg=default_cfgs[variant],
model_cfg=model_cfgs[variant],
feature_cfg=dict(flatten_sequential=True),
**kwargs)
@register_model
def gernet_l(pretrained=False, **kwargs):
""" GEResNet-Large (GENet-Large from official impl)
`Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
"""
return _create_byobnet('gernet_l', pretrained=pretrained, **kwargs)
@register_model
def gernet_m(pretrained=False, **kwargs):
""" GEResNet-Medium (GENet-Normal from official impl)
`Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
"""
return _create_byobnet('gernet_m', pretrained=pretrained, **kwargs)
@register_model
def gernet_s(pretrained=False, **kwargs):
""" EResNet-Small (GENet-Small from official impl)
`Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
"""
return _create_byobnet('gernet_s', pretrained=pretrained, **kwargs)
@register_model
def repvgg_a2(pretrained=False, **kwargs):
""" RepVGG-A2
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_a2', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b0(pretrained=False, **kwargs):
""" RepVGG-B0
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b0', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b1(pretrained=False, **kwargs):
""" RepVGG-B1
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b1', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b1g4(pretrained=False, **kwargs):
""" RepVGG-B1g4
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b1g4', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b2(pretrained=False, **kwargs):
""" RepVGG-B2
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b2', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b2g4(pretrained=False, **kwargs):
""" RepVGG-B2g4
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b2g4', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b3(pretrained=False, **kwargs):
""" RepVGG-B3
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b3', pretrained=pretrained, **kwargs)
@register_model
def repvgg_b3g4(pretrained=False, **kwargs):
""" RepVGG-B3g4
`Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
"""
return _create_byobnet('repvgg_b3g4', pretrained=pretrained, **kwargs)

@@ -7,6 +7,7 @@ This implementation is compatible with the pretrained weights from cypw's MXNet
Hacked together by / Copyright 2020 Ross Wightman
"""
from collections import OrderedDict
from functools import partial
from typing import Tuple

import torch
@@ -173,12 +174,14 @@ class DPN(nn.Module):
        self.drop_rate = drop_rate
        self.b = b
        assert output_stride == 32  # FIXME look into dilation support
        norm_layer = partial(BatchNormAct2d, eps=.001)
        fc_norm_layer = partial(BatchNormAct2d, eps=.001, act_layer=fc_act, inplace=False)
        bw_factor = 1 if small else 4
        blocks = OrderedDict()

        # conv1
        blocks['conv1_1'] = ConvBnAct(
            in_chans, num_init_features, kernel_size=3 if small else 7, stride=2, norm_layer=norm_layer)
        blocks['conv1_pool'] = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.feature_info = [dict(num_chs=num_init_features, reduction=2, module='features.conv1_1')]
@@ -226,8 +229,7 @@ class DPN(nn.Module):
            in_chs += inc
        self.feature_info += [dict(num_chs=in_chs, reduction=32, module=f'features.conv5_{k_sec[3]}')]

        blocks['conv5_bn_ac'] = CatBnAct(in_chs, norm_layer=fc_norm_layer)
        self.num_features = in_chs
        self.features = nn.Sequential(blocks)
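The hunk above swaps ad-hoc `norm_kwargs` plumbing for norm layers pre-bound with `functools.partial`; the same refactor is applied to the Xception code below. A minimal sketch of the pattern (generic names, not timm API):

```python
from functools import partial
import torch.nn as nn

# bind layer kwargs once; downstream code just calls norm_layer(num_features)
norm_layer = partial(nn.BatchNorm2d, eps=1e-3)
bn = norm_layer(64)  # equivalent to nn.BatchNorm2d(64, eps=1e-3)
print(bn.eps)        # 0.001
```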

@@ -42,10 +42,8 @@ for Tensorflow 'SAME' padding. PyTorch symmetric padding behaves the way we'd w
class SeparableConv2d(nn.Module):
    def __init__(self, inplanes, planes, kernel_size=3, stride=1, dilation=1, bias=False, norm_layer=None):
        super(SeparableConv2d, self).__init__()
        self.kernel_size = kernel_size
        self.dilation = dilation
@@ -54,7 +52,7 @@ class SeparableConv2d(nn.Module):
        self.conv_dw = nn.Conv2d(
            inplanes, inplanes, kernel_size, stride=stride,
            padding=padding, dilation=dilation, groups=inplanes, bias=bias)
        self.bn = norm_layer(num_features=inplanes)
        # pointwise convolution
        self.conv_pw = nn.Conv2d(inplanes, planes, kernel_size=1, bias=bias)
@@ -66,10 +64,8 @@ class SeparableConv2d(nn.Module):
class Block(nn.Module):
    def __init__(self, inplanes, planes, stride=1, dilation=1, start_with_relu=True, norm_layer=None):
        super(Block, self).__init__()
        if isinstance(planes, (list, tuple)):
            assert len(planes) == 3
        else:
@@ -80,7 +76,7 @@ class Block(nn.Module):
            self.skip = nn.Sequential()
            self.skip.add_module('conv1', nn.Conv2d(
                inplanes, outplanes, 1, stride=stride, bias=False)),
            self.skip.add_module('bn1', norm_layer(num_features=outplanes))
        else:
            self.skip = None
@@ -88,9 +84,8 @@ class Block(nn.Module):
        for i in range(3):
            rep['act%d' % (i + 1)] = nn.ReLU(inplace=True)
            rep['conv%d' % (i + 1)] = SeparableConv2d(
                inplanes, planes[i], 3, stride=stride if i == 2 else 1, dilation=dilation, norm_layer=norm_layer)
            rep['bn%d' % (i + 1)] = norm_layer(planes[i])
            inplanes = planes[i]

        if not start_with_relu:
@@ -115,74 +110,63 @@ class Xception65(nn.Module):
    """
    def __init__(self, num_classes=1000, in_chans=3, output_stride=32, norm_layer=nn.BatchNorm2d,
                 drop_rate=0., global_pool='avg'):
        super(Xception65, self).__init__()
        self.num_classes = num_classes
        self.drop_rate = drop_rate
        if output_stride == 32:
            entry_block3_stride = 2
            exit_block20_stride = 2
            middle_dilation = 1
            exit_dilation = (1, 1)
        elif output_stride == 16:
            entry_block3_stride = 2
            exit_block20_stride = 1
            middle_dilation = 1
            exit_dilation = (1, 2)
        elif output_stride == 8:
            entry_block3_stride = 1
            exit_block20_stride = 1
            middle_dilation = 2
            exit_dilation = (2, 4)
        else:
            raise NotImplementedError

        # Entry flow
        self.conv1 = nn.Conv2d(in_chans, 32, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = norm_layer(num_features=32)
        self.act1 = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = norm_layer(num_features=64)
        self.act2 = nn.ReLU(inplace=True)

        self.block1 = Block(64, 128, stride=2, start_with_relu=False, norm_layer=norm_layer)
        self.block1_act = nn.ReLU(inplace=True)
        self.block2 = Block(128, 256, stride=2, start_with_relu=False, norm_layer=norm_layer)
        self.block3 = Block(256, 728, stride=entry_block3_stride, norm_layer=norm_layer)

        # Middle flow
        self.mid = nn.Sequential(OrderedDict([('block%d' % i, Block(
            728, 728, stride=1, dilation=middle_dilation, norm_layer=norm_layer)) for i in range(4, 20)]))

        # Exit flow
        self.block20 = Block(
            728, (728, 1024, 1024), stride=exit_block20_stride, dilation=exit_dilation[0], norm_layer=norm_layer)
        self.block20_act = nn.ReLU(inplace=True)

        self.conv3 = SeparableConv2d(1024, 1536, 3, stride=1, dilation=exit_dilation[1], norm_layer=norm_layer)
        self.bn3 = norm_layer(num_features=1536)
        self.act3 = nn.ReLU(inplace=True)

        self.conv4 = SeparableConv2d(1536, 1536, 3, stride=1, dilation=exit_dilation[1], norm_layer=norm_layer)
        self.bn4 = norm_layer(num_features=1536)
        self.act4 = nn.ReLU(inplace=True)

        self.num_features = 2048
        self.conv5 = SeparableConv2d(
            1536, self.num_features, 3, stride=1, dilation=exit_dilation[1], norm_layer=norm_layer)
        self.bn5 = norm_layer(num_features=self.num_features)
        self.act5 = nn.ReLU(inplace=True)
        self.feature_info = [
            dict(num_chs=64, reduction=2, module='act2'),

@@ -148,67 +148,71 @@ def load_custom_pretrained(model, cfg=None, load_fn=None, progress=False, check_
    _logger.warning("Valid function to load pretrained weights is not available, using random initialization.")

-def load_pretrained(model, cfg=None, num_classes=1000, in_chans=3, filter_fn=None, strict=True, progress=False):
-    if cfg is None:
-        cfg = getattr(model, 'default_cfg')
-    if cfg is None or 'url' not in cfg or not cfg['url']:
-        _logger.warning("Pretrained model URL does not exist, using random initialization.")
-        return
-    state_dict = load_state_dict_from_url(cfg['url'], progress=progress, map_location='cpu')
-    if filter_fn is not None:
-        state_dict = filter_fn(state_dict)
-    if in_chans == 1:
-        conv1_name = cfg['first_conv']
-        _logger.info('Converting first conv (%s) pretrained weights from 3 to 1 channel' % conv1_name)
-        conv1_weight = state_dict[conv1_name + '.weight']
-        # Some weights are in torch.half, ensure it's float for sum on CPU
-        conv1_type = conv1_weight.dtype
-        conv1_weight = conv1_weight.float()
-        O, I, J, K = conv1_weight.shape
-        if I > 3:
-            assert conv1_weight.shape[1] % 3 == 0
-            # For models with space2depth stems
-            conv1_weight = conv1_weight.reshape(O, I // 3, 3, J, K)
-            conv1_weight = conv1_weight.sum(dim=2, keepdim=False)
-        else:
-            conv1_weight = conv1_weight.sum(dim=1, keepdim=True)
-        conv1_weight = conv1_weight.to(conv1_type)
-        state_dict[conv1_name + '.weight'] = conv1_weight
-    elif in_chans != 3:
-        conv1_name = cfg['first_conv']
-        conv1_weight = state_dict[conv1_name + '.weight']
-        conv1_type = conv1_weight.dtype
-        conv1_weight = conv1_weight.float()
-        O, I, J, K = conv1_weight.shape
-        if I != 3:
-            _logger.warning('Deleting first conv (%s) from pretrained weights.' % conv1_name)
-            del state_dict[conv1_name + '.weight']
-            strict = False
-        else:
-            # NOTE this strategy should be better than random init, but there could be other combinations of
-            # the original RGB input layer weights that'd work better for specific cases.
-            _logger.info('Repeating first conv (%s) weights in channel dim.' % conv1_name)
-            repeat = int(math.ceil(in_chans / 3))
-            conv1_weight = conv1_weight.repeat(1, repeat, 1, 1)[:, :in_chans, :, :]
-            conv1_weight *= (3 / float(in_chans))
-            conv1_weight = conv1_weight.to(conv1_type)
-            state_dict[conv1_name + '.weight'] = conv1_weight
-    classifier_name = cfg['classifier']
-    if num_classes == 1000 and cfg['num_classes'] == 1001:
-        # FIXME this special case is problematic as number of pretrained weight sources increases
-        # special case for imagenet trained models with extra background class in pretrained weights
-        classifier_weight = state_dict[classifier_name + '.weight']
-        state_dict[classifier_name + '.weight'] = classifier_weight[1:]
-        classifier_bias = state_dict[classifier_name + '.bias']
-        state_dict[classifier_name + '.bias'] = classifier_bias[1:]
-    elif num_classes != cfg['num_classes']:
-        # completely discard fully connected for all other differences between pretrained and created model
-        del state_dict[classifier_name + '.weight']
-        del state_dict[classifier_name + '.bias']
-        strict = False
-    model.load_state_dict(state_dict, strict=strict)
+def adapt_input_conv(in_chans, conv_weight):
+    conv_type = conv_weight.dtype
+    conv_weight = conv_weight.float()  # Some weights are in torch.half, ensure it's float for sum on CPU
+    O, I, J, K = conv_weight.shape
+    if in_chans == 1:
+        if I > 3:
+            assert conv_weight.shape[1] % 3 == 0
+            # For models with space2depth stems
+            conv_weight = conv_weight.reshape(O, I // 3, 3, J, K)
+            conv_weight = conv_weight.sum(dim=2, keepdim=False)
+        else:
+            conv_weight = conv_weight.sum(dim=1, keepdim=True)
+    elif in_chans != 3:
+        if I != 3:
+            raise NotImplementedError('Weight format not supported by conversion.')
+        else:
+            # NOTE this strategy should be better than random init, but there could be other combinations of
+            # the original RGB input layer weights that'd work better for specific cases.
+            repeat = int(math.ceil(in_chans / 3))
+            conv_weight = conv_weight.repeat(1, repeat, 1, 1)[:, :in_chans, :, :]
+            conv_weight *= (3 / float(in_chans))
+    conv_weight = conv_weight.to(conv_type)
+    return conv_weight
+
+def load_pretrained(model, cfg=None, num_classes=1000, in_chans=3, filter_fn=None, strict=True, progress=False):
+    if cfg is None:
+        cfg = getattr(model, 'default_cfg')
+    if cfg is None or 'url' not in cfg or not cfg['url']:
+        _logger.warning("No pretrained weights exist for this model. Using random initialization.")
+        return
+    state_dict = load_state_dict_from_url(cfg['url'], progress=progress, map_location='cpu')
+    if filter_fn is not None:
+        state_dict = filter_fn(state_dict)
+    input_convs = cfg.get('first_conv', None)
+    if input_convs is not None and in_chans != 3:
+        if isinstance(input_convs, str):
+            input_convs = (input_convs,)
+        for input_conv_name in input_convs:
+            weight_name = input_conv_name + '.weight'
+            try:
+                state_dict[weight_name] = adapt_input_conv(in_chans, state_dict[weight_name])
+                _logger.info(
+                    f'Converted input conv {input_conv_name} pretrained weights from 3 to {in_chans} channel(s)')
+            except NotImplementedError as e:
+                del state_dict[weight_name]
+                strict = False
+                _logger.warning(
+                    f'Unable to convert pretrained {input_conv_name} weights, using random init for this layer.')
+    classifier_name = cfg['classifier']
+    label_offset = cfg.get('label_offset', 0)
+    if num_classes != cfg['num_classes']:
+        # completely discard fully connected if model num_classes doesn't match pretrained weights
+        del state_dict[classifier_name + '.weight']
+        del state_dict[classifier_name + '.bias']
+        strict = False
+    elif label_offset > 0:
+        # special case for pretrained weights with an extra background class in pretrained weights
+        classifier_weight = state_dict[classifier_name + '.weight']
+        state_dict[classifier_name + '.weight'] = classifier_weight[label_offset:]
+        classifier_bias = state_dict[classifier_name + '.bias']
+        state_dict[classifier_name + '.bias'] = classifier_bias[label_offset:]
    model.load_state_dict(state_dict, strict=strict)
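For context, the new `adapt_input_conv` converts a pretrained RGB stem to an arbitrary number of input channels: grayscale sums the RGB filters, higher channel counts tile and rescale them. A minimal sketch (tensor shapes illustrative, not from this diff):

```python
import torch
from timm.models.helpers import adapt_input_conv  # added by this PR

w = torch.randn(64, 3, 7, 7)  # typical RGB stem conv weight, (O, I, J, K)

w1 = adapt_input_conv(1, w)   # grayscale: sum the RGB filters into one channel
assert w1.shape == (64, 1, 7, 7)

w4 = adapt_input_conv(4, w)   # 4-channel input: tile RGB filters, rescale by 3/4
assert w4.shape == (64, 4, 7, 7)
```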

@@ -17,18 +17,20 @@ default_cfgs = {
    # ported from http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz
    'inception_resnet_v2': {
        'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/inception_resnet_v2-940b1cd6.pth',
-        'num_classes': 1001, 'input_size': (3, 299, 299), 'pool_size': (8, 8),
+        'num_classes': 1000, 'input_size': (3, 299, 299), 'pool_size': (8, 8),
        'crop_pct': 0.8975, 'interpolation': 'bicubic',
        'mean': IMAGENET_INCEPTION_MEAN, 'std': IMAGENET_INCEPTION_STD,
        'first_conv': 'conv2d_1a.conv', 'classifier': 'classif',
+        'label_offset': 1,  # 1001 classes in pretrained weights
    },
    # ported from http://download.tensorflow.org/models/ens_adv_inception_resnet_v2_2017_08_18.tar.gz
    'ens_adv_inception_resnet_v2': {
        'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ens_adv_inception_resnet_v2-2592a550.pth',
-        'num_classes': 1001, 'input_size': (3, 299, 299), 'pool_size': (8, 8),
+        'num_classes': 1000, 'input_size': (3, 299, 299), 'pool_size': (8, 8),
        'crop_pct': 0.8975, 'interpolation': 'bicubic',
        'mean': IMAGENET_INCEPTION_MEAN, 'std': IMAGENET_INCEPTION_STD,
        'first_conv': 'conv2d_1a.conv', 'classifier': 'classif',
+        'label_offset': 1,  # 1001 classes in pretrained weights
    }
}
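The new `label_offset` cfg key is consumed by `load_pretrained` above; a sketch of the slicing it performs on a 1001-class checkpoint (tensors illustrative):

```python
import torch

label_offset = 1  # from the model's default_cfg
classifier_weight = torch.randn(1001, 1536)  # checkpoint includes a background class
classifier_bias = torch.randn(1001)

# drop the leading background class so outputs align with the 1000 ImageNet labels
assert classifier_weight[label_offset:].shape[0] == 1000
assert classifier_bias[label_offset:].shape[0] == 1000
```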
@@ -222,7 +224,7 @@ class Block8(nn.Module):
class InceptionResnetV2(nn.Module):
-    def __init__(self, num_classes=1001, in_chans=3, drop_rate=0., output_stride=32, global_pool='avg'):
+    def __init__(self, num_classes=1000, in_chans=3, drop_rate=0., output_stride=32, global_pool='avg'):
        super(InceptionResnetV2, self).__init__()
        self.drop_rate = drop_rate
        self.num_classes = num_classes

@@ -32,12 +32,12 @@ default_cfgs = {
    # my port of Tensorflow SLIM weights (http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)
    'tf_inception_v3': _cfg(
        url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_inception_v3-e0069de4.pth',
-        num_classes=1001, has_aux=False),
+        num_classes=1000, has_aux=False, label_offset=1),
    # my port of Tensorflow adversarially trained Inception V3 from
    # http://download.tensorflow.org/models/adv_inception_v3_2017_08_18.tar.gz
    'adv_inception_v3': _cfg(
        url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/adv_inception_v3-9e27bd63.pth',
-        num_classes=1001, has_aux=False),
+        num_classes=1000, has_aux=False, label_offset=1),
    # from gluon pretrained models, best performing in terms of accuracy/loss metrics
    # https://gluon-cv.mxnet.io/model_zoo/classification.html
    'gluon_inception_v3': _cfg(

@@ -16,10 +16,11 @@ __all__ = ['InceptionV4']
default_cfgs = {
    'inception_v4': {
        'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-cadene/inceptionv4-8e4777a0.pth',
-        'num_classes': 1001, 'input_size': (3, 299, 299), 'pool_size': (8, 8),
+        'num_classes': 1000, 'input_size': (3, 299, 299), 'pool_size': (8, 8),
        'crop_pct': 0.875, 'interpolation': 'bicubic',
        'mean': IMAGENET_INCEPTION_MEAN, 'std': IMAGENET_INCEPTION_STD,
        'first_conv': 'features.0.conv', 'classifier': 'last_linear',
+        'label_offset': 1,  # 1001 classes in pretrained weights
    }
}

@@ -241,7 +242,7 @@ class InceptionC(nn.Module):
class InceptionV4(nn.Module):
-    def __init__(self, num_classes=1001, in_chans=3, output_stride=32, drop_rate=0., global_pool='avg'):
+    def __init__(self, num_classes=1000, in_chans=3, output_stride=32, drop_rate=0., global_pool='avg'):
        super(InceptionV4, self).__init__()
        assert output_stride == 32
        self.drop_rate = drop_rate

@@ -12,7 +12,7 @@ from .conv_bn_act import ConvBnAct
from .create_act import create_act_layer, get_act_layer, get_act_fn
from .create_attn import get_attn, create_attn
from .create_conv2d import create_conv2d
-from .create_norm_act import create_norm_act, get_norm_act_layer
+from .create_norm_act import get_norm_act_layer, create_norm_act, convert_norm_act
from .drop import DropBlock2d, DropPath, drop_block_2d, drop_path
from .eca import EcaModule, CecaModule
from .evo_norm import EvoNormBatch2d, EvoNormSample2d

@@ -5,23 +5,23 @@ Hacked together by / Copyright 2020 Ross Wightman
from torch import nn as nn

from .create_conv2d import create_conv2d
-from .create_norm_act import convert_norm_act_type
+from .create_norm_act import convert_norm_act


class ConvBnAct(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding='', dilation=1, groups=1,
-                 norm_layer=nn.BatchNorm2d, norm_kwargs=None, act_layer=nn.ReLU, apply_act=True,
-                 drop_block=None, aa_layer=None):
+                 bias=False, apply_act=True, norm_layer=nn.BatchNorm2d, act_layer=nn.ReLU, aa_layer=None,
+                 drop_block=None):
        super(ConvBnAct, self).__init__()
        use_aa = aa_layer is not None

        self.conv = create_conv2d(
            in_channels, out_channels, kernel_size, stride=1 if use_aa else stride,
-            padding=padding, dilation=dilation, groups=groups, bias=False)
+            padding=padding, dilation=dilation, groups=groups, bias=bias)

        # NOTE for backwards compatibility with models that use separate norm and act layer definitions
-        norm_act_layer, norm_act_args = convert_norm_act_type(norm_layer, act_layer, norm_kwargs)
-        self.bn = norm_act_layer(out_channels, apply_act=apply_act, drop_block=drop_block, **norm_act_args)
+        norm_act_layer = convert_norm_act(norm_layer, act_layer)
+        self.bn = norm_act_layer(out_channels, apply_act=apply_act, drop_block=drop_block)
        self.aa = aa_layer(channels=out_channels) if stride == 2 and use_aa else None

    @property

@@ -9,6 +9,8 @@ from .cbam import CbamModule, LightCbamModule
def get_attn(attn_type):
+    if isinstance(attn_type, torch.nn.Module):
+        return attn_type
    module_cls = None
    if attn_type is not None:
        if isinstance(attn_type, str):

@@ -22,7 +22,8 @@ def create_conv2d(in_channels, out_channels, kernel_size, **kwargs):
        m = MixedConv2d(in_channels, out_channels, kernel_size, **kwargs)
    else:
        depthwise = kwargs.pop('depthwise', False)
-        groups = out_channels if depthwise else kwargs.pop('groups', 1)
+        # for DW out_channels must be multiple of in_channels as must have out_channels % groups == 0
+        groups = in_channels if depthwise else kwargs.pop('groups', 1)
        if 'num_experts' in kwargs and kwargs['num_experts'] > 0:
            m = CondConv2d(in_channels, out_channels, kernel_size, groups=groups, **kwargs)
        else:
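The `groups` fix matters because PyTorch requires both `in_channels % groups == 0` and `out_channels % groups == 0`; for a depthwise conv, `groups` must equal `in_channels`, not `out_channels`. A quick illustration (plain PyTorch, not code from this diff):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)

# depthwise: one spatial filter per input channel, groups == in_channels
dw = nn.Conv2d(16, 16, kernel_size=3, padding=1, groups=16)

# depthwise with channel expansion: out_channels must be a multiple of in_channels
dw_expand = nn.Conv2d(16, 32, kernel_size=3, padding=1, groups=16)

print(dw(x).shape, dw_expand(x).shape)  # (1, 16, 8, 8) (1, 32, 8, 8)
```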

@@ -19,6 +19,7 @@ from .inplace_abn import InplaceAbn
_NORM_ACT_TYPES = {BatchNormAct2d, GroupNormAct, EvoNormBatch2d, EvoNormSample2d, InplaceAbn}
_NORM_ACT_REQUIRES_ARG = {BatchNormAct2d, GroupNormAct, InplaceAbn}  # requires act_layer arg to define act type

def get_norm_act_layer(layer_class):
    layer_class = layer_class.replace('_', '').lower()
    if layer_class.startswith("batchnorm"):

@@ -47,16 +48,22 @@ def create_norm_act(layer_type, num_features, apply_act=True, jit=False, **kwargs):
    return layer_instance

-def convert_norm_act_type(norm_layer, act_layer, norm_kwargs=None):
+def convert_norm_act(norm_layer, act_layer):
    assert isinstance(norm_layer, (type, str, types.FunctionType, functools.partial))
    assert act_layer is None or isinstance(act_layer, (type, str, types.FunctionType, functools.partial))
-    norm_act_args = norm_kwargs.copy() if norm_kwargs else {}
+    norm_act_kwargs = {}
+
+    # unbind partial fn, so args can be rebound later
+    if isinstance(norm_layer, functools.partial):
+        norm_act_kwargs.update(norm_layer.keywords)
+        norm_layer = norm_layer.func
+
    if isinstance(norm_layer, str):
        norm_act_layer = get_norm_act_layer(norm_layer)
    elif norm_layer in _NORM_ACT_TYPES:
        norm_act_layer = norm_layer
-    elif isinstance(norm_layer, (types.FunctionType, functools.partial)):
-        # assuming this is a lambda/fn/bound partial that creates norm_act layer
+    elif isinstance(norm_layer, types.FunctionType):
+        # if function type, must be a lambda/fn that creates a norm_act layer
        norm_act_layer = norm_layer
    else:
        type_name = norm_layer.__name__.lower()

@@ -66,9 +73,11 @@ def convert_norm_act_type(norm_layer, act_layer, norm_kwargs=None):
            norm_act_layer = GroupNormAct
        else:
            assert False, f"No equivalent norm_act layer for {type_name}"

    if norm_act_layer in _NORM_ACT_REQUIRES_ARG:
-        # Must pass `act_layer` through for backwards compat where `act_layer=None` implies no activation.
+        # pass `act_layer` through for backwards compat where `act_layer=None` implies no activation.
        # In the future, may force use of `apply_act` with `act_layer` arg bound to relevant NormAct types
-        # It is intended that functions/partial does not trigger this, they should define act.
-        norm_act_args.update(dict(act_layer=act_layer))
-    return norm_act_layer, norm_act_args
+        norm_act_kwargs.setdefault('act_layer', act_layer)
+    if norm_act_kwargs:
+        norm_act_layer = functools.partial(norm_act_layer, **norm_act_kwargs)  # bind/rebind args
+    return norm_act_layer
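With the rewritten `convert_norm_act`, kwargs bound on a `functools.partial` norm layer are unbound, merged with `act_layer`, and rebound onto the matching norm+act class. A usage sketch against the layer names in this diff:

```python
from functools import partial
import torch.nn as nn
from timm.models.layers import convert_norm_act  # exported by this PR

# eps/momentum travel on the partial; no separate norm_kwargs dict needed
norm_act_layer = convert_norm_act(
    norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.1), act_layer=nn.ReLU)
bn_act = norm_act_layer(64)  # BatchNormAct2d(64, eps=0.001, momentum=0.1, act_layer=nn.ReLU)
```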

@@ -34,7 +34,7 @@ class MixedConv2d(nn.ModuleDict):
        self.in_channels = sum(in_splits)
        self.out_channels = sum(out_splits)
        for idx, (k, in_ch, out_ch) in enumerate(zip(kernel_size, in_splits, out_splits)):
-            conv_groups = out_ch if depthwise else 1
+            conv_groups = in_ch if depthwise else 1
            # use add_module to keep key space clean
            self.add_module(
                str(idx),

@@ -24,7 +24,7 @@ class BatchNormAct2d(nn.BatchNorm2d):
            act_args = dict(inplace=True) if inplace else {}
            self.act = act_layer(**act_args)
        else:
-            self.act = None
+            self.act = nn.Identity()

    def _forward_jit(self, x):
        """ A cut & paste of the contents of the PyTorch BatchNorm2d forward function

@@ -62,8 +62,7 @@ class BatchNormAct2d(nn.BatchNorm2d):
            x = self._forward_jit(x)
        else:
            x = self._forward_python(x)
-        if self.act is not None:
-            x = self.act(x)
+        x = self.act(x)
        return x

@@ -75,12 +74,12 @@ class GroupNormAct(nn.GroupNorm):
        if isinstance(act_layer, str):
            act_layer = get_act_layer(act_layer)
        if act_layer is not None and apply_act:
-            self.act = act_layer(inplace=inplace)
+            act_args = dict(inplace=True) if inplace else {}
+            self.act = act_layer(**act_args)
        else:
-            self.act = None
+            self.act = nn.Identity()

    def forward(self, x):
        x = F.group_norm(x, self.num_groups, self.weight, self.bias, self.eps)
-        if self.act is not None:
-            x = self.act(x)
+        x = self.act(x)
        return x

@@ -8,17 +8,16 @@ Hacked together by / Copyright 2020 Ross Wightman
from torch import nn as nn

from .create_conv2d import create_conv2d
-from .create_norm_act import convert_norm_act_type
+from .create_norm_act import convert_norm_act


class SeparableConvBnAct(nn.Module):
    """ Separable Conv w/ trailing Norm and Activation
    """
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, padding='', bias=False,
-                 channel_multiplier=1.0, pw_kernel_size=1, norm_layer=nn.BatchNorm2d, norm_kwargs=None,
-                 act_layer=nn.ReLU, apply_act=True, drop_block=None):
+                 channel_multiplier=1.0, pw_kernel_size=1, norm_layer=nn.BatchNorm2d, act_layer=nn.ReLU,
+                 apply_act=True, drop_block=None):
        super(SeparableConvBnAct, self).__init__()
-        norm_kwargs = norm_kwargs or {}

        self.conv_dw = create_conv2d(
            in_channels, int(in_channels * channel_multiplier), kernel_size,

@@ -27,8 +26,8 @@ class SeparableConvBnAct(nn.Module):
        self.conv_pw = create_conv2d(
            int(in_channels * channel_multiplier), out_channels, pw_kernel_size, padding=padding, bias=bias)

-        norm_act_layer, norm_act_args = convert_norm_act_type(norm_layer, act_layer, norm_kwargs)
-        self.bn = norm_act_layer(out_channels, apply_act=apply_act, drop_block=drop_block, **norm_act_args)
+        norm_act_layer = convert_norm_act(norm_layer, act_layer)
+        self.bn = norm_act_layer(out_channels, apply_act=apply_act, drop_block=drop_block)

    @property
    def in_channels(self):

@@ -1,6 +1,9 @@
""" NasNet-A (Large)
 nasnetalarge implementation grabbed from Cadene's pretrained models
 https://github.com/Cadene/pretrained-models.pytorch
"""
+from functools import partial
+
import torch
import torch.nn as nn
import torch.nn.functional as F

@@ -20,9 +23,10 @@ default_cfgs = {
        'interpolation': 'bicubic',
        'mean': (0.5, 0.5, 0.5),
        'std': (0.5, 0.5, 0.5),
-        'num_classes': 1001,
+        'num_classes': 1000,
        'first_conv': 'conv0.conv',
        'classifier': 'last_linear',
+        'label_offset': 1,  # 1001 classes in pretrained weights
    },
}

@@ -418,7 +422,7 @@ class NASNetALarge(nn.Module):
        self.conv0 = ConvBnAct(
            in_channels=in_chans, out_channels=self.stem_size, kernel_size=3, padding=0, stride=2,
-            norm_kwargs=dict(eps=0.001, momentum=0.1), act_layer=None)
+            norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.1), apply_act=False)
        self.cell_stem_0 = CellStem0(
            self.stem_size, num_channels=channels // (channel_multiplier ** 2), pad_type=pad_type)

@@ -395,8 +395,11 @@ def _create_normfreenet(variant, pretrained=False, **kwargs):
        feature_cfg['out_indices'] = (1, 2, 3, 4)  # no stride 2, 0 level feat for stride 4 maxpool stems in ResNet

    return build_model_with_cfg(
-        NormalizerFreeNet, variant, pretrained, model_cfg=model_cfg, default_cfg=default_cfgs[variant],
-        feature_cfg=feature_cfg, **kwargs)
+        NormalizerFreeNet, variant, pretrained,
+        default_cfg=default_cfgs[variant],
+        model_cfg=model_cfg,
+        feature_cfg=feature_cfg,
+        **kwargs)

@register_model

@@ -6,6 +6,7 @@
"""
from collections import OrderedDict
+from functools import partial

import torch
import torch.nn as nn

@@ -26,9 +27,10 @@ default_cfgs = {
        'interpolation': 'bicubic',
        'mean': (0.5, 0.5, 0.5),
        'std': (0.5, 0.5, 0.5),
-        'num_classes': 1001,
+        'num_classes': 1000,
        'first_conv': 'conv_0.conv',
        'classifier': 'last_linear',
+        'label_offset': 1,  # 1001 classes in pretrained weights
    },
}

@@ -234,7 +236,7 @@ class Cell(CellBase):
class PNASNet5Large(nn.Module):
-    def __init__(self, num_classes=1001, in_chans=3, output_stride=32, drop_rate=0., global_pool='avg', pad_type=''):
+    def __init__(self, num_classes=1000, in_chans=3, output_stride=32, drop_rate=0., global_pool='avg', pad_type=''):
        super(PNASNet5Large, self).__init__()
        self.num_classes = num_classes
        self.drop_rate = drop_rate

@@ -243,7 +245,7 @@ class PNASNet5Large(nn.Module):
        self.conv_0 = ConvBnAct(
            in_chans, 96, kernel_size=3, stride=2, padding=0,
-            norm_kwargs=dict(eps=0.001, momentum=0.1), act_layer=None)
+            norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.1), apply_act=False)
        self.cell_stem_0 = CellStem0(
            in_chs_left=96, out_chs_left=54, in_chs_right=96, out_chs_right=54, pad_type=pad_type)

@@ -0,0 +1,261 @@
"""VGG
Adapted from https://github.com/pytorch/vision 'vgg.py' (BSD-3-Clause) with a few changes for
timm functionality.
Copyright 2021 Ross Wightman
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Union, List, Dict, Any, cast
from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
from .helpers import build_model_with_cfg
from .layers import ClassifierHead, ConvBnAct
from .registry import register_model
__all__ = [
'VGG', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn',
'vgg19_bn', 'vgg19',
]
def _cfg(url='', **kwargs):
return {
'url': url,
'num_classes': 1000, 'input_size': (3, 224, 224), 'pool_size': (1, 1),
'crop_pct': 0.875, 'interpolation': 'bilinear',
'mean': IMAGENET_DEFAULT_MEAN, 'std': IMAGENET_DEFAULT_STD,
'first_conv': 'features.0', 'classifier': 'head.fc',
**kwargs
}
default_cfgs = {
'vgg11': _cfg(url='https://download.pytorch.org/models/vgg11-bbd30ac9.pth'),
'vgg13': _cfg(url='https://download.pytorch.org/models/vgg13-c768596a.pth'),
'vgg16': _cfg(url='https://download.pytorch.org/models/vgg16-397923af.pth'),
'vgg19': _cfg(url='https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'),
'vgg11_bn': _cfg(url='https://download.pytorch.org/models/vgg11_bn-6002323d.pth'),
'vgg13_bn': _cfg(url='https://download.pytorch.org/models/vgg13_bn-abd245e5.pth'),
'vgg16_bn': _cfg(url='https://download.pytorch.org/models/vgg16_bn-6c64b313.pth'),
'vgg19_bn': _cfg(url='https://download.pytorch.org/models/vgg19_bn-c79401a0.pth'),
}
cfgs: Dict[str, List[Union[str, int]]] = {
'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
class ConvMlp(nn.Module):
def __init__(self, in_features=512, out_features=4096, kernel_size=7, mlp_ratio=1.0,
drop_rate: float = 0.2, act_layer: nn.Module = None, conv_layer: nn.Module = None):
super(ConvMlp, self).__init__()
self.input_kernel_size = kernel_size
mid_features = int(out_features * mlp_ratio)
self.fc1 = conv_layer(in_features, mid_features, kernel_size, bias=True)
self.act1 = act_layer(True)
self.drop = nn.Dropout(drop_rate)
self.fc2 = conv_layer(mid_features, out_features, 1, bias=True)
self.act2 = act_layer(True)
def forward(self, x):
if x.shape[-2] < self.input_kernel_size or x.shape[-1] < self.input_kernel_size:
# keep the input size >= 7x7
output_size = (max(self.input_kernel_size, x.shape[-2]), max(self.input_kernel_size, x.shape[-1]))
x = F.adaptive_avg_pool2d(x, output_size)
x = self.fc1(x)
x = self.act1(x)
x = self.drop(x)
x = self.fc2(x)
x = self.act2(x)
return x
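The guard in `ConvMlp.forward` exists because `fc1` is an unpadded 7x7 convolution: a 224x224 image reaches this layer as 7x7 after five stride-2 pools, but smaller inputs would produce an empty output, so they are first stretched back up to 7x7. A quick demonstration of the failure mode being avoided (plain PyTorch, shapes illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 512, 3, 3)          # e.g. features from a 112x112 input
x = F.adaptive_avg_pool2d(x, (7, 7))   # stretch up so the 7x7 conv has support
y = F.conv2d(x, torch.randn(4096, 512, 7, 7))
print(y.shape)                          # torch.Size([1, 4096, 1, 1])
```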
class VGG(nn.Module):

    def __init__(
            self,
            cfg: List[Any],
            num_classes: int = 1000,
            in_chans: int = 3,
            output_stride: int = 32,
            mlp_ratio: float = 1.0,
            act_layer: nn.Module = nn.ReLU,
            conv_layer: nn.Module = nn.Conv2d,
            norm_layer: nn.Module = None,
            global_pool: str = 'avg',
            drop_rate: float = 0.,
    ) -> None:
        super(VGG, self).__init__()
        assert output_stride == 32
        self.num_classes = num_classes
        self.num_features = 4096
        self.drop_rate = drop_rate
        self.feature_info = []
        prev_chs = in_chans
        net_stride = 1
        pool_layer = nn.MaxPool2d
        layers: List[nn.Module] = []
        for v in cfg:
            last_idx = len(layers) - 1
            if v == 'M':
                self.feature_info.append(dict(num_chs=prev_chs, reduction=net_stride, module=f'features.{last_idx}'))
                layers += [pool_layer(kernel_size=2, stride=2)]
                net_stride *= 2
            else:
                v = cast(int, v)
                conv2d = conv_layer(prev_chs, v, kernel_size=3, padding=1)
                if norm_layer is not None:
                    layers += [conv2d, norm_layer(v), act_layer(inplace=True)]
                else:
                    layers += [conv2d, act_layer(inplace=True)]
                prev_chs = v
        self.features = nn.Sequential(*layers)
        self.feature_info.append(dict(num_chs=prev_chs, reduction=net_stride, module=f'features.{len(layers) - 1}'))
        self.pre_logits = ConvMlp(
            prev_chs, self.num_features, 7, mlp_ratio=mlp_ratio,
            drop_rate=drop_rate, act_layer=act_layer, conv_layer=conv_layer)
        self.head = ClassifierHead(
            self.num_features, num_classes, pool_type=global_pool, drop_rate=drop_rate)
        self._initialize_weights()

    def get_classifier(self):
        return self.head.fc

    def reset_classifier(self, num_classes, global_pool='avg'):
        self.num_classes = num_classes
        self.head = ClassifierHead(
            self.num_features, self.num_classes, pool_type=global_pool, drop_rate=self.drop_rate)

    def forward_features(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.pre_logits(x)
        return x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.forward_features(x)
        x = self.head(x)
        return x

    def _initialize_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
def _filter_fn(state_dict):
    """ convert torchvision VGG classifier Linear weights to timm's ConvMlp / head layout """
    out_dict = {}
    for k, v in state_dict.items():
        k_r = k
        k_r = k_r.replace('classifier.0', 'pre_logits.fc1')
        k_r = k_r.replace('classifier.3', 'pre_logits.fc2')
        k_r = k_r.replace('classifier.6', 'head.fc')
        if 'classifier.0.weight' in k:
            v = v.reshape(-1, 512, 7, 7)
        if 'classifier.3.weight' in k:
            v = v.reshape(-1, 4096, 1, 1)
        out_dict[k_r] = v
    return out_dict
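A sketch of the remap `_filter_fn` performs on a torchvision VGG checkpoint (key names from the code above, tensors illustrative):

```python
import torch

state_dict = {
    'classifier.0.weight': torch.randn(4096, 512 * 7 * 7),  # torchvision nn.Linear
    'classifier.0.bias': torch.randn(4096),
}
out = _filter_fn(state_dict)
assert out['pre_logits.fc1.weight'].shape == (4096, 512, 7, 7)  # now a conv weight
assert 'classifier.0.weight' not in out
```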
def _create_vgg(variant: str, pretrained: bool, **kwargs: Any) -> VGG:
    cfg = variant.split('_')[0]
    # NOTE: VGG is one of the only models with stride==1 features, so indices are offset from other models
    out_indices = kwargs.get('out_indices', (0, 1, 2, 3, 4, 5))
    model = build_model_with_cfg(
        VGG, variant, pretrained=pretrained,
        model_cfg=cfgs[cfg],
        default_cfg=default_cfgs[variant],
        feature_cfg=dict(flatten_sequential=True, out_indices=out_indices),
        pretrained_filter_fn=_filter_fn,
        **kwargs)
    return model
@register_model
def vgg11(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(**kwargs)
    return _create_vgg('vgg11', pretrained=pretrained, **model_args)


@register_model
def vgg11_bn(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(norm_layer=nn.BatchNorm2d, **kwargs)
    return _create_vgg('vgg11_bn', pretrained=pretrained, **model_args)


@register_model
def vgg13(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 13-layer model (configuration "B")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(**kwargs)
    return _create_vgg('vgg13', pretrained=pretrained, **model_args)


@register_model
def vgg13_bn(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 13-layer model (configuration "B") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(norm_layer=nn.BatchNorm2d, **kwargs)
    return _create_vgg('vgg13_bn', pretrained=pretrained, **model_args)


@register_model
def vgg16(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 16-layer model (configuration "D")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(**kwargs)
    return _create_vgg('vgg16', pretrained=pretrained, **model_args)


@register_model
def vgg16_bn(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 16-layer model (configuration "D") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(norm_layer=nn.BatchNorm2d, **kwargs)
    return _create_vgg('vgg16_bn', pretrained=pretrained, **model_args)


@register_model
def vgg19(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 19-layer model (configuration "E")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(**kwargs)
    return _create_vgg('vgg19', pretrained=pretrained, **model_args)


@register_model
def vgg19_bn(pretrained: bool = False, **kwargs: Any) -> VGG:
    r"""VGG 19-layer model (configuration 'E') with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.
    """
    model_args = dict(norm_layer=nn.BatchNorm2d, **kwargs)
    return _create_vgg('vgg19_bn', pretrained=pretrained, **model_args)
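Typical usage of the new VGG models through the timm factory (model names from `default_cfgs` above):

```python
import timm
import torch

model = timm.create_model('vgg16_bn', pretrained=False, num_classes=10)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])

# features_only also works; per the NOTE above, the first feature level is stride 1
features = timm.create_model('vgg16_bn', features_only=True)(torch.randn(1, 3, 224, 224))
print([f.shape[1] for f in features])  # channel counts at each of the six levels
```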

@@ -5,7 +5,7 @@ https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zo
Hacked together by / Copyright 2020 Ross Wightman
"""
-from collections import OrderedDict
+from functools import partial

import torch.nn as nn
import torch.nn.functional as F

@@ -43,9 +43,8 @@ default_cfgs = dict(
class SeparableConv2d(nn.Module):
    def __init__(
            self, inplanes, planes, kernel_size=3, stride=1, dilation=1, padding='',
-            act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d, norm_kwargs=None):
+            act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d):
        super(SeparableConv2d, self).__init__()
-        norm_kwargs = norm_kwargs if norm_kwargs is not None else {}
        self.kernel_size = kernel_size
        self.dilation = dilation

@@ -53,7 +52,7 @@ class SeparableConv2d(nn.Module):
        self.conv_dw = create_conv2d(
            inplanes, inplanes, kernel_size, stride=stride,
            padding=padding, dilation=dilation, depthwise=True)
-        self.bn_dw = norm_layer(inplanes, **norm_kwargs)
+        self.bn_dw = norm_layer(inplanes)
        if act_layer is not None:
            self.act_dw = act_layer(inplace=True)
        else:

@@ -61,7 +60,7 @@ class SeparableConv2d(nn.Module):
        # pointwise convolution
        self.conv_pw = create_conv2d(inplanes, planes, kernel_size=1)
-        self.bn_pw = norm_layer(planes, **norm_kwargs)
+        self.bn_pw = norm_layer(planes)
        if act_layer is not None:
            self.act_pw = act_layer(inplace=True)
        else:

@@ -82,17 +81,15 @@ class SeparableConv2d(nn.Module):
class XceptionModule(nn.Module):
    def __init__(
            self, in_chs, out_chs, stride=1, dilation=1, pad_type='',
-            start_with_relu=True, no_skip=False, act_layer=nn.ReLU, norm_layer=None, norm_kwargs=None):
+            start_with_relu=True, no_skip=False, act_layer=nn.ReLU, norm_layer=None):
        super(XceptionModule, self).__init__()
-        norm_kwargs = norm_kwargs if norm_kwargs is not None else {}
        out_chs = to_3tuple(out_chs)
        self.in_channels = in_chs
        self.out_channels = out_chs[-1]
        self.no_skip = no_skip
        if not no_skip and (self.out_channels != self.in_channels or stride != 1):
            self.shortcut = ConvBnAct(
-                in_chs, self.out_channels, 1, stride=stride,
-                norm_layer=norm_layer, norm_kwargs=norm_kwargs, act_layer=None)
+                in_chs, self.out_channels, 1, stride=stride, norm_layer=norm_layer, act_layer=None)
        else:
            self.shortcut = None

@@ -103,7 +100,7 @@ class XceptionModule(nn.Module):
            self.stack.add_module(f'act{i + 1}', nn.ReLU(inplace=i > 0))
            self.stack.add_module(f'conv{i + 1}', SeparableConv2d(
                in_chs, out_chs[i], 3, stride=stride if i == 2 else 1, dilation=dilation, padding=pad_type,
-                act_layer=separable_act_layer, norm_layer=norm_layer, norm_kwargs=norm_kwargs))
+                act_layer=separable_act_layer, norm_layer=norm_layer))
            in_chs = out_chs[i]

    def forward(self, x):

@@ -121,14 +118,13 @@ class XceptionAligned(nn.Module):
    """
    def __init__(self, block_cfg, num_classes=1000, in_chans=3, output_stride=32,
-                 act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d, norm_kwargs=None, drop_rate=0., global_pool='avg'):
+                 act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d, drop_rate=0., global_pool='avg'):
        super(XceptionAligned, self).__init__()
        self.num_classes = num_classes
        self.drop_rate = drop_rate
        assert output_stride in (8, 16, 32)
-        norm_kwargs = norm_kwargs if norm_kwargs is not None else {}

-        layer_args = dict(act_layer=act_layer, norm_layer=norm_layer, norm_kwargs=norm_kwargs)
+        layer_args = dict(act_layer=act_layer, norm_layer=norm_layer)
        self.stem = nn.Sequential(*[
            ConvBnAct(in_chans, 32, kernel_size=3, stride=2, **layer_args),
            ConvBnAct(32, 64, kernel_size=3, stride=1, **layer_args)

@@ -196,7 +192,7 @@ def xception41(pretrained=False, **kwargs):
        dict(in_chs=728, out_chs=(728, 1024, 1024), stride=2),
        dict(in_chs=1024, out_chs=(1536, 1536, 2048), stride=1, no_skip=True, start_with_relu=False),
    ]
-    model_args = dict(block_cfg=block_cfg, norm_kwargs=dict(eps=.001, momentum=.1), **kwargs)
+    model_args = dict(block_cfg=block_cfg, norm_layer=partial(nn.BatchNorm2d, eps=.001, momentum=.1), **kwargs)
    return _xception('xception41', pretrained=pretrained, **model_args)

@@ -215,7 +211,7 @@ def xception65(pretrained=False, **kwargs):
        dict(in_chs=728, out_chs=(728, 1024, 1024), stride=2),
        dict(in_chs=1024, out_chs=(1536, 1536, 2048), stride=1, no_skip=True, start_with_relu=False),
    ]
-    model_args = dict(block_cfg=block_cfg, norm_kwargs=dict(eps=.001, momentum=.1), **kwargs)
+    model_args = dict(block_cfg=block_cfg, norm_layer=partial(nn.BatchNorm2d, eps=.001, momentum=.1), **kwargs)
    return _xception('xception65', pretrained=pretrained, **model_args)

@@ -236,5 +232,5 @@ def xception71(pretrained=False, **kwargs):
        dict(in_chs=728, out_chs=(728, 1024, 1024), stride=2),
        dict(in_chs=1024, out_chs=(1536, 1536, 2048), stride=1, no_skip=True, start_with_relu=False),
    ]
-    model_args = dict(block_cfg=block_cfg, norm_kwargs=dict(eps=.001, momentum=.1), **kwargs)
+    model_args = dict(block_cfg=block_cfg, norm_layer=partial(nn.BatchNorm2d, eps=.001, momentum=.1), **kwargs)
    return _xception('xception71', pretrained=pretrained, **model_args)
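The recurring change in this file (and nasnet/pnasnet above) replaces the removed `norm_kwargs` plumbing: instead of threading a kwargs dict alongside `norm_layer`, the kwargs are bound onto the layer callable itself. A before/after sketch:

```python
from functools import partial
import torch.nn as nn

# old style (removed in this PR): args travel separately from the layer
# model_args = dict(norm_layer=nn.BatchNorm2d, norm_kwargs=dict(eps=.001, momentum=.1))

# new style: bind the args once, pass a single callable everywhere
norm_layer = partial(nn.BatchNorm2d, eps=.001, momentum=.1)
bn = norm_layer(64)  # same as nn.BatchNorm2d(64, eps=.001, momentum=.1)
```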

@@ -1 +1 @@
-__version__ = '0.4.2'
+__version__ = '0.4.3'

@@ -310,11 +310,11 @@ def main():
    # resolve AMP arguments based on PyTorch / Apex availability
    use_amp = None
    if args.amp:
-        # for backwards compat, `--amp` arg tries apex before native amp
-        if has_apex:
-            args.apex_amp = True
-        elif has_native_amp:
+        # `--amp` chooses native amp before apex (APEX ver not actively maintained)
+        if has_native_amp:
            args.native_amp = True
+        elif has_apex:
+            args.apex_amp = True
    if args.apex_amp and has_apex:
        use_amp = 'apex'
    elif args.native_amp and has_native_amp:
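For reference, the native AMP path now preferred here uses `torch.cuda.amp`; a self-contained training-step sketch (standard PyTorch API, requires a CUDA device, not code from this diff):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales loss to avoid fp16 gradient underflow

x = torch.randn(8, 10, device='cuda')
y = torch.randint(0, 2, (8,), device='cuda')

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # ops run in mixed fp16/fp32 inside this context
    loss = criterion(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```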

@@ -116,15 +116,20 @@ def validate(args):
    args.prefetcher = not args.no_prefetcher
    amp_autocast = suppress  # do nothing
    if args.amp:
-        if has_apex:
-            args.apex_amp = True
-        elif has_native_amp:
+        if has_native_amp:
            args.native_amp = True
+        elif has_apex:
+            args.apex_amp = True
        else:
-            _logger.warning("Neither APEX or Native Torch AMP is available, using FP32.")
+            _logger.warning("Neither APEX or Native Torch AMP is available.")
    assert not args.apex_amp or not args.native_amp, "Only one AMP mode should be set."
    if args.native_amp:
        amp_autocast = torch.cuda.amp.autocast
+        _logger.info('Validating in mixed precision with native PyTorch AMP.')
+    elif args.apex_amp:
+        _logger.info('Validating in mixed precision with NVIDIA APEX AMP.')
+    else:
+        _logger.info('Validating in float32. AMP not enabled.')

    if args.legacy_jit:
        set_jit_legacy()

@@ -284,7 +289,7 @@ def main():
    if args.model == 'all':
        # validate all models in a list of names with pretrained checkpoints
        args.pretrained = True
-        model_names = list_models(pretrained=True)
+        model_names = list_models(pretrained=True, exclude_filters=['*in21k'])
        model_cfgs = [(n, '') for n in model_names]
    elif not is_model(args.model):
        # model name doesn't exist, try as wildcard filter
