diff --git a/README.md b/README.md
index 704bc32c..76261ccc 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,25 @@ I'm fortunate to be able to dedicate significant time and money of my own suppor
 
 ## What's New
 
+### June 20, 2021
+* Release Vision Transformer 'AugReg' weights from [How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers](https://arxiv.org/abs/2106.10270)
+  * .npz weight loading support added, can load any of the 50K+ weights from the [AugReg series](https://console.cloud.google.com/storage/browser/vit_models/augreg)
+  * See [example notebook](https://colab.research.google.com/github/google-research/vision_transformer/blob/master/vit_jax_augreg.ipynb) from official impl for navigating the AugReg weights
+  * Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
+  * Highlights: `vit_large_patch16_384` (87.1 top-1), `vit_large_r50_s32_384` (86.2 top-1), `vit_base_patch16_384` (86.0 top-1)
+  * `vit_deit_*` renamed to just `deit_*`
+  * Remove my old small model, replace with DeiT-compatible small w/ AugReg weights
+* Add 1st training of my `gmixer_24_224` MLP w/ GLU, 78.1 top-1 w/ 25M params.
+* Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
+* Add `eca_nfnet_l2` weights from my 'lightweight' series. 84.7 top-1 at 384x384.
+* Add distilled BiT 50x1 student and 152x2 teacher weights from [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237)
+* NFNets and ResNetV2-BiT models work w/ PyTorch XLA now
+  * weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered by XLA)
+  * eps values adjusted; there will be slight differences but results should be quite close
+* Improve test coverage and classifier interface of non-conv (vision transformer and MLP) models
+* Clean up a few classifier / flatten details for models w/ conv classifiers or early global pool
+* Please report any regressions; this PR touched quite a few models.
+
 ### June 8, 2021
 * Add first ResMLP weights, trained in PyTorch XLA on TPU-VM w/ my XLA branch. 24 block variant, 79.2 top-1.
 * Add ResNet51-Q model w/ pretrained weights at 82.36 top-1.
diff --git a/timm/models/vision_transformer.py b/timm/models/vision_transformer.py
index b8fc6fa5..89fba7de 100644
--- a/timm/models/vision_transformer.py
+++ b/timm/models/vision_transformer.py
@@ -1,7 +1,12 @@
 """ Vision Transformer (ViT) in PyTorch
 
-A PyTorch implement of Vision Transformers as described in
-'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale' - https://arxiv.org/abs/2010.11929
+A PyTorch implementation of Vision Transformers as described in:
+
+'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
+    - https://arxiv.org/abs/2010.11929
+
+`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
+    - https://arxiv.org/abs/2106.10270
 
 The official jax code is released and available at https://github.com/google-research/vision_transformer
 
@@ -15,7 +20,7 @@ for some einops/einsum fun
 * Simple transformer style inspired by Andrej Karpathy's https://github.com/karpathy/minGPT
 * Bert reference code checks against Huggingface Transformers and Tensorflow Bert
 
-Hacked together by / Copyright 2020 Ross Wightman
+Hacked together by / Copyright 2021 Ross Wightman
 """
 import math
 import logging
diff --git a/timm/models/vision_transformer_hybrid.py b/timm/models/vision_transformer_hybrid.py
index 5d725c58..d5f0a537 100644
--- a/timm/models/vision_transformer_hybrid.py
+++ b/timm/models/vision_transformer_hybrid.py
@@ -1,13 +1,17 @@
 """ Hybrid Vision Transformer (ViT) in PyTorch
 
-A PyTorch implement of the Hybrid Vision Transformers as described in
+A PyTorch implementation of the Hybrid Vision Transformers as described in:
+
 'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
     - https://arxiv.org/abs/2010.11929
 
-NOTE This relies on code in vision_transformer.py. The hybrid model definitions were moved here to
-keep file sizes sane.
+`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
+    - https://arxiv.org/abs/2106.10270
+
+NOTE These hybrid model definitions depend on code in vision_transformer.py.
+They were moved here to keep file sizes sane.
 
-Hacked together by / Copyright 2020 Ross Wightman
+Hacked together by / Copyright 2021 Ross Wightman
 """
 from copy import deepcopy
 from functools import partial
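The `.npz` loading mentioned in the June 20 README notes above can be exercised roughly as follows. This is a minimal sketch, assuming a checkpoint already downloaded from the AugReg bucket (the filename below is a placeholder) and that `.npz` files are dispatched through `timm.models.helpers.load_checkpoint` / `create_model(checkpoint_path=...)` as the notes describe; it is not the only supported path.

```python
import timm
from timm.models.helpers import load_checkpoint

# Build the architecture only; weights will come from a local AugReg .npz file.
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=1000)

# 'augreg_vit_b16.npz' is a placeholder for any checkpoint pulled from the AugReg
# bucket; .npz files are routed to the ViT npz weight loader added in this change.
load_checkpoint(model, 'augreg_vit_b16.npz')

# Roughly equivalent one-liner via create_model's checkpoint_path argument:
# model = timm.create_model('vit_base_patch16_224', checkpoint_path='augreg_vit_b16.npz')
model.eval()
```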
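The weight-standardization note under the PyTorch XLA bullet is terse, so here is an illustrative sketch of the trick it refers to (a stand-in for, not a copy of, the `ScaledStdConv` code): viewing the conv weight as `(1, out_ch, -1)` lets `F.batch_norm` perform the same per-output-channel mean/std normalization that `torch.std_mean` provided, using an op that XLA does lower.

```python
import torch
import torch.nn.functional as F

def standardize_weight(weight: torch.Tensor, gain: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-output-channel weight standardization expressed via F.batch_norm."""
    out_ch = weight.shape[0]
    return F.batch_norm(
        weight.reshape(1, out_ch, -1),  # one "sample", out_ch channels, fan-in flattened
        None, None,                     # no running stats; mean/var computed from the weight itself
        weight=gain.view(-1),           # per-channel gain folded into the affine scale
        training=True, momentum=0., eps=eps,
    ).reshape_as(weight)

# Example: standardize a 3x3 conv weight with a per-output-channel gain.
w = torch.randn(64, 32, 3, 3)
g = torch.ones(64)
w_std = standardize_weight(w, g)
```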