* All models updated with revised foward_features / forward_head interface
* Vision transformer and MLP based models consistently output sequence from forward_features (pooling or token selection considered part of 'head')
* WIP param grouping interface to allow consistent grouping of parameters for layer-wise decay across all model types
* Add gradient checkpointing support to a significant % of models, especially popular architectures
* Formatting and interface consistency improvements across models
* layer-wise LR decay impl part of optimizer factory w/ scale support in scheduler
* Poolformer and Volo architectures added
* improve consistency of model creation helper fns
* add comments to some of the model helpers
* support passing external default_cfgs so they can be sourced from hub
* remove redundant GluonResNet model/blocks and use the code in ResNet for Gluon weights
* change SEModules back to using AdaptiveAvgPool instead of mean, PyTorch issue long fixed
* host some of Cadene's weights on github instead of .fr for speed
* add my old port of ensemble adversarial inception resnet v2
* switch to my TF port of normal inception res v2 and change FC layer back to 'classif' for compat with ens_adv