<li>ConvNeXt-V2 models and weights added to existing <code>convnext.py</code><ul>
<li>Paper: <a href="http://arxiv.org/abs/2301.00808">ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders</a></li>
<li>Reference impl: <a href="https://github.com/facebookresearch/ConvNeXt-V2">https://github.com/facebookresearch/ConvNeXt-V2</a> (NOTE: weights currently CC-BY-NC)</li>
</ul>
</li>
</ul>
<h3id="dec-23-2022">Dec 23, 2022 🎄☃</h3>
<ul>
<li>Add FlexiViT models and weights from <a href="https://github.com/google-research/big_vision">https://github.com/google-research/big_vision</a> (check out the paper at <a href="https://arxiv.org/abs/2212.08013">https://arxiv.org/abs/2212.08013</a>)<ul>
<li>NOTE: resizing is currently static on model creation, on-the-fly dynamic / train patch size sampling is a WIP</li>
</ul>
</li>
<li>Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)</li>
<li>More model pretrained tags and adjustments, some model names changed (working on deprecation translations, consider the main branch a DEV branch right now, use 0.6.x for stable use)</li>
<li>More ImageNet-12k (subset of 22k) pretrain models popping up</li>
<li>Add 'EVA l' to <code>vision_transformer.py</code>, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)</li>
<li>Pre-release (<code>0.8.0dev0</code>) of multi-weight support (<code>model_arch.pretrained_tag</code>). Install with <code>pip install --pre timm</code> (a usage sketch follows this list)<ul>
<li>vision_transformer, maxvit, convnext are the first three model impl w/ support</li>
<li>model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling</li>
<li>bugs are likely, but I need feedback so please try it out</li>
<li>if stability is needed, please use 0.6.x pypi releases or clone from the <a href="https://github.com/rwightman/pytorch-image-models/tree/0.6.x">0.6.x branch</a></li>
</ul>
</li>
<li>Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark scripts, use the <code>--torchcompile</code> argument (a <code>torch.compile</code> sketch also follows this list)</li>
<li>Inference script allows more control over output, select top-k for class index + prob; JSON, CSV, or parquet output</li>
<li>Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models</li>
<li>Port of MaxViT Tensorflow weights from official impl at <a href="https://github.com/google-research/maxvit">https://github.com/google-research/maxvit</a><ul>
<li>There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a detail missing, but the 21k FT did seem sensitive to small preprocessing differences</li>
<li>NOTE: official MaxVit weights (in1k) have been released at <a href="https://github.com/google-research/maxvit">https://github.com/google-research/maxvit</a> -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.</li>
</ul>
</li>
</ul>
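<p>A minimal usage sketch for the multi-weight pre-release: weights are selected with an <code>architecture.pretrained_tag</code> name at creation time. The exact tag string below is illustrative (an assumption), check <code>timm.list_models(pretrained=True)</code> for what is actually published:</p>
<pre><code>import timm

# list architectures that have pretrained weights available (multi-weight names show up as arch.tag)
print(timm.list_models('convnext*', pretrained=True)[:5])

# create a model with an explicit pretrained tag (architecture.tag form)
# NOTE: 'convnext_tiny.fb_in22k_ft_in1k' is an illustrative tag, not guaranteed to exist
model = timm.create_model('convnext_tiny.fb_in22k_ft_in1k', pretrained=True)
</code></pre>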
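<p>The train/validate/inference/benchmark scripts wire PyTorch 2.0 compile up via <code>--torchcompile</code>; outside the scripts the rough equivalent (assuming a PyTorch build with <code>torch.compile</code>) is:</p>
<pre><code>import torch
import timm

model = timm.create_model('resnet50', pretrained=True)
model = torch.compile(model)  # what --torchcompile enables in the scripts

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)  # first call triggers compilation, later calls reuse the compiled graph
print(out.shape)
</code></pre>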
<h3id="sept-23-2022">Sept 23, 2022</h3>
<ul>
<li>LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier); see the sketch after this list<ul>
<li>vit_base_patch32_224_clip_laion2b</li>
<li>vit_large_patch14_224_clip_laion2b</li>
<li>vit_huge_patch14_224_clip_laion2b</li>
<li>vit_giant_patch14_224_clip_laion2b</li>
</ul>
</li>
</ul>
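<p>These towers can be used like any other <code>timm</code> backbone; a minimal sketch using <code>num_classes=0</code> to get pooled features without a classifier head:</p>
<pre><code>import torch
import timm

# LAION-2B CLIP ViT image tower as a feature backbone (no classifier head)
model = timm.create_model('vit_base_patch32_224_clip_laion2b', pretrained=True, num_classes=0)
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(x)  # pooled embedding, e.g. (1, 768) for the base tower
print(features.shape)
</code></pre>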
<h3id="sept-7-2022">Sept 7, 2022</h3>
<ul>
<li>Hugging Face <a href="https://huggingface.co/docs/hub/timm"><code>timm</code> docs</a> home now exists, look for more here in the future</li>
<li>Add BEiT-v2 weights for base and large 224x224 models from <a href="https://github.com/microsoft/unilm/tree/master/beit2">https://github.com/microsoft/unilm/tree/master/beit2</a></li>
<li>Add more weights in <code>maxxvit</code> series incl a <code>pico</code> (7.5M params, 1.9 GMACs), two <code>tiny</code> variants:<ul>
<li>CoAtNet (<a href="https://arxiv.org/abs/2106.04803">https://arxiv.org/abs/2106.04803</a>) and MaxVit (<a href="https://arxiv.org/abs/2204.01697">https://arxiv.org/abs/2204.01697</a>) <code>timm</code> original models<ul>
<li>both found in <a href="https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/maxxvit.py"><code>maxxvit.py</code></a> model def, contains numerous experiments outside scope of original papers</li>
<li>an unfinished Tensorflow version from MaxVit authors can be found at <a href="https://github.com/google-research/maxvit">https://github.com/google-research/maxvit</a></li>
</ul>
</li>
</ul>
</li>
<li>Initial CoAtNet and MaxVit timm pretrained weights (working on more)</li>
<li>Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (<a href="https://github.com/mmaaz60/EdgeNeXt">https://github.com/mmaaz60/EdgeNeXt</a>); see the export sketch below</li>
</ul>
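<p>A hedged export sketch for the EdgeNeXt ONNX improvements, using plain <code>torch.onnx.export</code> rather than any timm-specific tooling; the model name, input size, and opset below are assumptions for illustration:</p>
<pre><code>import torch
import timm

model = timm.create_model('edgenext_small', pretrained=True)  # assumed variant name
model.eval()

dummy = torch.randn(1, 3, 256, 256)  # assumed input resolution
torch.onnx.export(
    model, dummy, 'edgenext_small.onnx',
    input_names=['input'], output_names=['output'],
    opset_version=13,
)
</code></pre>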
<h3id="july-28-2022">July 28, 2022</h3>
<ul>
<li>Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks <a href="https://github.com/TouvronHugo">Hugo Touvron</a>!</li>
</ul>
<h3id="july-27-2022">July 27, 2022</h3>
<ul>
<li>All runtime benchmark and validation result csv files are up-to-date!</li>
<li>A few more weights & model defs added</li>
<li><code>cs3*</code> weights above all trained on TPU w/ <code>bits_and_tpu</code> branch. Thanks to TRC program!</li>
<li>Add output_stride=8 and 16 support to ConvNeXt (dilation); see the sketch below the list</li>
<li>Fixed deit3 models not being able to resize pos_emb</li>
<li>Version 0.6.7 PyPi release (w/ above bug fixes and new weights since 0.6.5)</li>
</ul>
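<p>A minimal sketch of the dilated ConvNeXt usage, assuming the usual <code>timm</code> <code>output_stride</code> / <code>features_only</code> kwargs now apply to ConvNeXt as they do for ResNets:</p>
<pre><code>import torch
import timm

# ConvNeXt as a dense-feature backbone, using dilation instead of full stride-32 downsampling
model = timm.create_model(
    'convnext_tiny',
    pretrained=True,
    features_only=True,   # return intermediate feature maps
    output_stride=8,      # 8 or 16 now supported via dilation
)

x = torch.randn(1, 3, 224, 224)
for f in model(x):
    print(f.shape)  # deepest map stays at 224/8 = 28x28 with output_stride=8
</code></pre>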
<h3id="july-8-2022">July 8, 2022</h3>
<p>More models, more fixes</p>
<ul>
<li>Official research models (w/ weights) added:<ul>
<li>EdgeNeXt from (<a href="https://github.com/mmaaz60/EdgeNeXt">https://github.com/mmaaz60/EdgeNeXt</a>)</li>
<li>MobileViT-V2 from (<a href="https://github.com/apple/ml-cvnets">https://github.com/apple/ml-cvnets</a>)</li>
<li>DeiT III (Revenge of the ViT) from (<a href="https://github.com/facebookresearch/deit">https://github.com/facebookresearch/deit</a>)</li>
</ul>
</li>
<li>My own models:<ul>
<li>Small <code>ResNet</code> defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)</li>
<li><code>CspNet</code> refactored with dataclass config, simplified CrossStage3 (<code>cs3</code>) option. These are closer to YOLO-v5+ backbone defs.</li>
<li>More relative position vit fiddling. Two <code>srelpos</code> (shared relative position) models trained, and a medium w/ class token.</li>
<li>Add an alternate downsample mode to EdgeNeXt and train a <code>small</code> model. Better than original small, but not their new USI trained weights.</li>
</ul>
</li>
<li>My own model weight results (all ImageNet-1k training)<ul>
<li><code>cs3</code>, <code>darknet</code>, and <code>vit_*relpos</code> weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.</li>
</ul>
</li>
<li>Hugging Face Hub support fixes verified, demo notebook TBA</li>
<li>Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.</li>
<li>Add support to change image extensions scanned by <code>timm</code> datasets/parsers. See (<a href="https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103">https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103</a>)</li>
<li>Default ConvNeXt LayerNorm impl to use <code>F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2)</code> via <code>LayerNorm2d</code> in all cases (see the sketch after this list)<ul>
<li>a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.</li>
<li>previous impl exists as <code>LayerNormExp2d</code> in <code>models/layers/norm.py</code></li>
</ul>
</li>
<li>Numerous bug fixes</li>
<li>Currently testing for imminent PyPi 0.6.x release</li>
<li>LeViT pretraining of larger models still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?</li>
<li>ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) ...</li>
</ul>
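<p>A minimal sketch of the <code>LayerNorm2d</code> pattern described above (permute NCHW to NHWC, apply <code>F.layer_norm</code> over channels, permute back); this is an illustrative re-implementation, not the exact <code>timm</code> source:</p>
<pre><code>import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm2d(nn.LayerNorm):
    """LayerNorm for NCHW tensors, normalizing over the channel dim."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.permute(0, 2, 3, 1)  # NCHW -> NHWC
        x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        return x.permute(0, 3, 1, 2)  # NHWC -> NCHW

norm = LayerNorm2d(64)
print(norm(torch.randn(2, 64, 56, 56)).shape)  # torch.Size([2, 64, 56, 56])
</code></pre>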
<h3id="may-13-2022">May 13, 2022</h3>
<ul>
<li>Official Swin-V2 models and weights added from (<ahref="https://github.com/microsoft/Swin-Transformer">https://github.com/microsoft/Swin-Transformer</a>). Cleaned up to support torchscript.</li>
<li>Some refactoring for existing <code>timm</code> Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.</li>
<li>More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)<ul>
<li><code>vit_relpos_small_patch16_224</code> - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool</li>
<li><code>vit_relpos_medium_patch16_rpn_224</code> - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool</li>
<li><code>vit_relpos_medium_patch16_224</code> - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool</li>
<li><code>vit_relpos_base_patch16_gapcls_224</code> - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)</li>
</ul>
</li>
<li>Bring 512 dim, 8-head 'medium' ViT model variant back to life (after using it in a pre-DeiT 'small' model for the first ViT impl back in 2020)</li>
<li>Add ViT relative position support for switching between the existing impl and some additions in the official Swin-V2 impl for future trials</li>
<li>Sequencer2D impl (<a href="https://arxiv.org/abs/2205.01972">https://arxiv.org/abs/2205.01972</a>), added via PR from author (<a href="https://github.com/okojoalg">https://github.com/okojoalg</a>)</li>
</ul>
<h3id="may-2-2022">May 2, 2022</h3>
<ul>
<li>Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (<code>vision_transformer_relpos.py</code>) and Residual Post-Norm branches (from Swin-V2) (<code>vision_transformer*.py</code>)<ul>
<li><code>vit_relpos_base_patch16_224</code> - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool</li>
<li><code>vit_base_patch16_rpn_224</code> - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool</li>
</ul>
</li>
<li>Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie <code>How to Train Your ViT</code>)</li>
<li><code>vit_*</code> models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae); see the sketch after this list.</li>
</ul>
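<p>A sketch of those <code>vit_*</code> options, assuming the constructor kwargs pass through <code>create_model</code> as usual (the kwarg names follow the features described above but may not apply to every variant):</p>
<pre><code>import timm

# ViT without a class token, using global average pooling and a BEiT/MAE-style fc_norm
model = timm.create_model(
    'vit_base_patch16_224',
    pretrained=False,     # config changes alter the architecture, so start from scratch or adapt weights
    class_token=False,    # remove the class token
    global_pool='avg',    # average pool over patch tokens instead of using a class token
    fc_norm=True,         # norm before the classifier head (ala BEiT / MAE)
)
</code></pre>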
<h3id="april-22-2022">April 22, 2022</h3>
<ul>
<li><code>timm</code> models are now officially supported in <a href="https://www.fast.ai/">fast.ai</a>! Just in time for the new Practical Deep Learning course. <code>timmdocs</code> documentation link updated to <a href="http://timm.fast.ai/">timm.fast.ai</a>.</li>
<li>Two more model weights added in the TPU trained <a href="https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights">series</a>. Some In22k pretrain still in progress.</li>
<li>Add <code>ParallelBlock</code> and <code>LayerScale</code> option to base vit models to support model configs in <a href="https://arxiv.org/abs/2203.09795">Three things everyone should know about ViT</a></li>
<li><code>convnext_tiny_hnf</code> (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.</li>
<li>Merge <code>norm_norm_norm</code>. <strong>IMPORTANT</strong> this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch <a href="https://github.com/rwightman/pytorch-image-models/tree/0.5.x"><code>0.5.x</code></a> or a previous 0.5.x release can be used if stability is required.</li>
<li>Significant weights update (all TPU trained) as described in this <a href="https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights">release</a></li>
<li>HuggingFace hub support fixed w/ initial groundwork for allowing alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)</li>
<li>SwinTransformer-V2 implementation added. Submitted by <a href="https://github.com/ChristophReich1996">Christoph Reich</a>. Training experiments and model changes by myself are ongoing so expect compat breaks.</li>
<li>Swin-S3 (AutoFormerV2) models / weights added from <a href="https://github.com/microsoft/Cream/tree/main/AutoFormerV2">https://github.com/microsoft/Cream/tree/main/AutoFormerV2</a></li>
<li>MobileViT models w/ weights adapted from <a href="https://github.com/apple/ml-cvnets">https://github.com/apple/ml-cvnets</a></li>
<li>PoolFormer models w/ weights adapted from <a href="https://github.com/sail-sg/poolformer">https://github.com/sail-sg/poolformer</a></li>
<li>VOLO models w/ weights adapted from <a href="https://github.com/sail-sg/volo">https://github.com/sail-sg/volo</a></li>
<li>Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc</li>
<li>Enhance support for alternate norm + act ('NormAct') layers added to a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception</li>
<li>Grouped conv support added to EfficientNet family</li>
<li>Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler</li>
<li>Gradient checkpointing support added to many models</li>
<li><code>forward_head(x, pre_logits=False)</code> fn added to all models to allow separate calls of <code>forward_features</code> + <code>forward_head</code> (see the sketch at the end of this list)</li>
<li>All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from <code>forward_features</code> for consistency with CNN models; token selection or pooling is now applied in <code>forward_head</code></li>
<li><ahref="https://github.com/Chris-hughes10">Chris Hughes</a> posted an exhaustive run through of <code>timm</code> on his blog yesterday. Well worth a read. <ahref="https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055">Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide</a></li>
<li>I'm currently prepping to merge the <code>norm_norm_norm</code> branch back to master (ver 0.6.x) in next week or so.<ul>
<li>The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware <code>pip install git+https://github.com/rwightman/pytorch-image-models</code> installs!</li>
<li><code>0.5.x</code> releases and a <code>0.5.x</code> branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.</li>
<li>Version 0.5.4 w/ release to be pushed to pypi. It's been a while since last pypi update and riskier changes will be merged to main branch soon....</li>
</ul>
</li>
<li>Add ConvNeXt models w/ weights from official impl (<a href="https://github.com/facebookresearch/ConvNeXt">https://github.com/facebookresearch/ConvNeXt</a>), a few perf tweaks, compatible with timm features</li>
<li>Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way...</li>
</ul>
<scriptid="__config"type="application/json">{"base":"..","features":[],"search":"../assets/javascripts/workers/search.12658920.min.js","translations":{"clipboard.copied":"Copied to clipboard","clipboard.copy":"Copy to clipboard","search.result.more.one":"1 more on this page","search.result.more.other":"# more on this page","search.result.none":"No matching documents","search.result.one":"1 matching document","search.result.other":"# matching documents","search.result.placeholder":"Type to start searching","search.result.term.missing":"Missing","select.version":"Select version"}}</script>
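<p>A minimal sketch of the split feature / head calls described above, using <code>resnet50</code> as a stand-in (shapes shown are for that model):</p>
<pre><code>import torch
import timm

model = timm.create_model('resnet50', pretrained=True)
model.eval()

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    feats = model.forward_features(x)                    # unpooled feature map, (2, 2048, 7, 7)
    pooled = model.forward_head(feats, pre_logits=True)  # pooled features, classifier not applied
    logits = model.forward_head(feats)                   # pooled + classified, same as model(x)
print(feats.shape, pooled.shape, logits.shape)
</code></pre>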