And a big thanks to all GitHub sponsors who helped with some of my costs before
## What's New
### Dec 5, 2022
* Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`); see the Python usage sketch after the CLIP table below
  * vision_transformer, maxvit, convnext are the first three model implementations with support
  * model names are changing with this (previous `_21k`, etc. names will merge), still sorting out deprecation handling
  * bugs are likely, but I need feedback so please try it out
  * if stability is needed, please use the 0.6.x pypi releases or clone from the [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)
* Support for PyTorch 2.0 compile added to the train/validate/inference/benchmark scripts, use the `--torchcompile` argument (a sketch of the underlying `torch.compile` call follows the MaxViT table below)
* Inference script allows more control over output: select k for top-k class index + probability results, with JSON, CSV, or parquet output
* Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
| model | top1 (%) | param_count (M) | gmac | macts (M) | hub |
|:-------------------------------------------------|-------:|--------------:|-------:|--------:|:-------------------------------------------------------------------------------------|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | [link ](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k ) |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k ) |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | [link ](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k ) |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k ) |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k ) |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k ) |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k ) |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k ) |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | [link ](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k ) |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | [link ](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k ) |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k ) |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k ) |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k ) |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k ) |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k ) |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k ) |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | [link ](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k ) |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k ) |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | [link ](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k ) |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | [link ](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k ) |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | [link ](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k ) |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | [link ](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k ) |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | [link ](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k ) |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | [link ](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k ) |
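As a quick usage sketch of the new `model_arch.pretrained_tag` naming, the standard `timm` calls below create a model from a tag-qualified name (taken from the CLIP table above) and run a dummy forward pass; details may still shift while the `0.8.0dev0` pre-release settles.

```python
import timm
import torch

# List pretrained weights for an architecture; with multi-weight support the
# returned names are tag-qualified, e.g. 'vit_base_patch16_clip_224.laion2b_ft_in1k'.
print(timm.list_models('vit_base_patch16_clip_224*', pretrained=True))

# '<model_arch>.<pretrained_tag>' selects a specific pretrained weight.
model = timm.create_model('vit_base_patch16_clip_224.laion2b_ft_in1k', pretrained=True)
model.eval()

# Dummy forward pass at the weight's native 224x224 resolution.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]) for the ImageNet-1k fine-tunes
```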
* Port of MaxViT TensorFlow weights from the official impl at https://github.com/google-research/maxvit
  * There were larger than expected accuracy drops for the upscaled 384/512 in21k fine-tune weights, possibly a detail missing, but the 21k fine-tunes did seem sensitive to small preprocessing differences
| model | top1 (%) | param_count (M) | gmac | macts (M) | hub |
|:-----------------------------------|-------:|--------------:|-------:|--------:|:-----------------------------------------------------------------------|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | [link ](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k ) |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | [link ](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k ) |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | [link ](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k ) |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | [link ](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k ) |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | [link ](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k ) |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | [link ](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k ) |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | [link ](https://huggingface.co/timm/maxvit_base_tf_512.in1k ) |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | [link ](https://huggingface.co/timm/maxvit_large_tf_512.in1k ) |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | [link ](https://huggingface.co/timm/maxvit_base_tf_384.in1k ) |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | [link ](https://huggingface.co/timm/maxvit_large_tf_384.in1k ) |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | [link ](https://huggingface.co/timm/maxvit_small_tf_512.in1k ) |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | [link ](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k ) |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | [link ](https://huggingface.co/timm/maxvit_small_tf_384.in1k ) |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | [link ](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k ) |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | [link ](https://huggingface.co/timm/maxvit_large_tf_224.in1k ) |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | [link ](https://huggingface.co/timm/maxvit_base_tf_224.in1k ) |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | [link ](https://huggingface.co/timm/maxvit_small_tf_224.in1k ) |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | [link ](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k ) |
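For reference, a minimal sketch of what the new `--torchcompile` option turns on in the bundled scripts: wrapping the model with PyTorch 2.0's `torch.compile`. This assumes a PyTorch 2.0 / nightly install; the MaxViT weight name is taken from the table above.

```python
import timm
import torch

# Assumes a PyTorch 2.0 (or nightly) build where torch.compile is available.
model = timm.create_model('maxvit_tiny_tf_224.in1k', pretrained=True).eval()

# Roughly what --torchcompile enables in train/validate/inference/benchmark:
# compile the model and use it as a drop-in replacement.
compiled_model = torch.compile(model)

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    out = compiled_model(x)  # first call compiles; later calls reuse the compiled graph
print(out.shape)  # torch.Size([2, 1000])
```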
### Oct 15, 2022
* Train and validation script enhancements
* Non-GPU (ie CPU) device support