AugReg release

cleanup_xla_model_fixes
Ross Wightman 4 years ago
parent 381b279785
commit 9c9755a808

@@ -23,6 +23,25 @@ I'm fortunate to be able to dedicate significant time and money of my own supporting
## What's New
### June 20, 2021
* Release Vision Transformer 'AugReg' weights from [How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers](https://arxiv.org/abs/2106.10270)
* `.npz` weight loading support added; can load any of the 50K+ weights from the [AugReg series](https://console.cloud.google.com/storage/browser/vit_models/augreg)
* See [example notebook](https://colab.research.google.com/github/google-research/vision_transformer/blob/master/vit_jax_augreg.ipynb) from the official impl for navigating the AugReg weights
* Replaced all default weights w/ the best AugReg variant where possible. All AugReg 21k classifiers work.
* Highlights: `vit_large_patch16_384` (87.1 top-1), `vit_large_r50_s32_384` (86.2 top-1), `vit_base_patch16_384` (86.0 top-1)
* `vit_deit_*` renamed to just `deit_*`
* Remove my old small model, replace with DeiT-compatible small w/ AugReg weights
* Add first training of my `gmixer_24_224` MLP w/ GLU, 78.1 top-1 w/ 25M params.
* Add weights from the [official ResMLP release](https://github.com/facebookresearch/deit)
* Add `eca_nfnet_l2` weights from my 'lightweight' series. 84.7 top-1 at 384x384.
* Add distilled BiT 50x1 student and 152x2 teacher weights from [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237)
* NFNets and ResNetV2-BiT models work w/ PyTorch XLA now
  * weight standardization uses F.batch_norm instead of std_mean (std_mean had no XLA lowering)
  * eps values adjusted; there will be slight differences but results should be quite close
* Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models
* Cleanup a few classifier / flatten details for models w/ conv classifiers or early global pool
* Please report any regressions; this PR touched quite a few models.
### June 8, 2021
* Add first ResMLP weights, trained in PyTorch XLA on TPU-VM w/ my XLA branch. 24 block variant, 79.2 top-1.
* Add ResNet51-Q model w/ pretrained weights at 82.36 top-1.
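
For context on the two weight-loading paths in the June 20 notes, here is a minimal sketch, assuming a timm build from this release onward: `pretrained=True` pulls the new AugReg defaults, and an `.npz` `checkpoint_path` is routed through the ViT's npz loader. The filename below is a hypothetical stand-in for any checkpoint downloaded from the AugReg bucket.

```python
import timm

# New defaults: pretrained=True now fetches the best AugReg variant,
# e.g. vit_large_patch16_384 at 87.1 top-1.
model = timm.create_model('vit_large_patch16_384', pretrained=True)

# Direct .npz loading: any of the 50K+ AugReg checkpoints can be loaded
# this way. 'augreg_checkpoint.npz' is a placeholder for a file downloaded
# from the AugReg GCS bucket (gs://vit_models/augreg).
model = timm.create_model(
    'vit_base_patch16_224_in21k',
    num_classes=21843,  # AugReg in21k checkpoints keep the full 21k classifier
    checkpoint_path='augreg_checkpoint.npz',
)
```

The rename above also applies here: DeiT models are now created as e.g. `timm.create_model('deit_small_patch16_224', pretrained=True)` rather than via the old `vit_deit_*` names.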
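On the XLA fix for weight standardization: `torch.std_mean` had no XLA lowering, so the per-channel weight statistics are computed via `F.batch_norm`, which does lower. A minimal sketch of the idea (class name and `eps` value are illustrative, not timm's exact code): flatten the conv weight to `(1, out_channels, -1)` and batch-normalize it with no running stats, which standardizes each output channel to zero mean and unit variance.

```python
import torch.nn as nn
import torch.nn.functional as F

class StdConv2d(nn.Conv2d):
    """Conv2d with weight standardization, expressed via F.batch_norm
    so it lowers on PyTorch XLA (torch.std_mean was not lowered)."""

    def __init__(self, *args, eps=1e-6, **kwargs):
        super().__init__(*args, **kwargs)
        self.eps = eps  # eps folded into the normalization, per the notes above

    def forward(self, x):
        # Treat each output channel as a batch_norm 'feature': with no
        # running stats and training=True, this normalizes each channel of
        # the weight to zero mean / unit variance, i.e. weight standardization.
        w = F.batch_norm(
            self.weight.reshape(1, self.out_channels, -1), None, None,
            training=True, momentum=0., eps=self.eps).reshape_as(self.weight)
        return F.conv2d(x, w, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)
```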
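The classifier-interface cleanup means the non-conv models (vision transformer and MLP) expose the same head-handling hooks as the conv models. A small usage sketch of that common interface, under the same release assumption:

```python
import torch
import timm

m = timm.create_model('vit_base_patch16_224')
m.reset_classifier(0)  # drop the classifier head; forward now yields pooled features
features = m(torch.randn(1, 3, 224, 224))
print(features.shape)  # (1, 768) for the base ViT width
```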

@@ -1,7 +1,12 @@
""" Vision Transformer (ViT) in PyTorch """ Vision Transformer (ViT) in PyTorch
A PyTorch implement of Vision Transformers as described in A PyTorch implement of Vision Transformers as described in:
'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale' - https://arxiv.org/abs/2010.11929
'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
- https://arxiv.org/abs/2010.11929
`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
- https://arxiv.org/abs/2106.TODO
The official jax code is released and available at https://github.com/google-research/vision_transformer The official jax code is released and available at https://github.com/google-research/vision_transformer
@@ -15,7 +20,7 @@ for some einops/einsum fun
 * Simple transformer style inspired by Andrej Karpathy's https://github.com/karpathy/minGPT
 * Bert reference code checks against Huggingface Transformers and Tensorflow Bert
-Hacked together by / Copyright 2020 Ross Wightman
+Hacked together by / Copyright 2021 Ross Wightman
 """
 import math
 import logging

@@ -1,13 +1,17 @@
""" Hybrid Vision Transformer (ViT) in PyTorch """ Hybrid Vision Transformer (ViT) in PyTorch
A PyTorch implement of the Hybrid Vision Transformers as described in A PyTorch implement of the Hybrid Vision Transformers as described in:
'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale' 'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
- https://arxiv.org/abs/2010.11929 - https://arxiv.org/abs/2010.11929
NOTE This relies on code in vision_transformer.py. The hybrid model definitions were moved here to `How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
keep file sizes sane. - https://arxiv.org/abs/2106.TODO
NOTE These hybrid model definitions depend on code in vision_transformer.py.
They were moved here to keep file sizes sane.
Hacked together by / Copyright 2020 Ross Wightman Hacked together by / Copyright 2021 Ross Wightman
""" """
from copy import deepcopy from copy import deepcopy
from functools import partial from functools import partial
