{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "EffResNetComparison",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/rwightman/pytorch-image-models/blob/master/notebooks/EffResNetComparison.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7AUmKc2yMHz0",
        "colab_type": "text"
      },
      "source": [
        "# EfficientNets vs ResNets in PyTorch: On Why I Won't Be Tossing My ResNets\n",
        "\n",
        "First off, I want to be clear that I am not panning EfficientNets (https://arxiv.org/abs/1905.11946) here. They are unprecedented in their parameter and FLOP efficiency. Thanks to Mingxing Tan, Quoc V. Le, and the Google Brain team for releasing the code and weights.\n",
        "\n",
        "I dug into the EfficientNet paper the day it was released. I had recently implemented the MobileNet-V3 and MNasNet architectures in PyTorch, and EfficientNets have a lot in common with those models. After writing new model definition strings, adding the depth scaling, and hacking together some weight-porting code, they were alive.\n",
        "\n",
        "First impressions were positive: \"Wow, that's some impressive accuracy for so few parameters (and such small checkpoints).\" After spending more time with the models (training them, running numerous validations, etc.), some realities sank in. These models are less efficient in actual use than I'd expected. I started doing more detailed comparisons with familiar ResNet models, and that's how this notebook came to be...\n",
        "\n",
        "## Objectives\n",
        "A few points I'm hoping to illustrate in this notebook:\n",
        "\n",
        "1. The efficiencies of EfficientNets may not translate to better real-world performance on all frameworks and hardware platforms. Your trusty old ResNets may be just as good for your NN framework of choice running on an NVIDIA GPU. What consumes fewer resources in Tensorflow with an XLA-optimized graph on a TPU may end up being more resource hungry in PyTorch running with a CUDA backend.\n",
        "\n",
        "2. The story of a ResNet-50 does not end with a top-1 of 76.3% on ImageNet-1k. Neither do the other ResNe(X)t networks end with the results of the original papers or the pretrained weights of canonical Caffe, Tensorflow, or PyTorch implementations. Many papers compare shiny new architectures trained with recent techniques (or algorithmically searched hyper-parameters) to ResNet baselines that aren't given the same training effort. A ResNet-50 can be trained to well over 78% on ImageNet -- better than an 'original' ResNet-152 -- a 35M parameter difference!  I've selected better pretrained models to compare against the EfficientNets. \n",
        "\n",
        "3. Most PyTorch implementations of EfficientNet that I'm aware of use the Tensorflow ported weights, like my 'tf_efficientnet_b*' models. These ported weights require explicit padding ops to match the behaviour of Tensorflow 'SAME' padding. This padding adds a runtime penalty (about 2% in the forward pass) and a memory penalty (reducing max batch sizes by roughly 15-20%). I've now trained the B0 through B2 models natively in PyTorch, but haven't made progress on B3 and up (they are very slow to train).\n",
        "\n",
        "4. There are some nifty inference tricks, like test time pooling (TTP), that can breathe life into old models and allow them to be used outside of their standard resolutions without retraining. As a comparison, a few ResNets were run with TTP here at resolutions similar to those of the EffNet models.\n",
        "\n",
        "## Considerations\n",
        "\n",
        "A few additional considerations:\n",
        "* I'm only running the numbers on validation here to keep the Colab notebook sane. I have trained all of these architectures, and the relative differences in throughput and memory usage/batch size limits match my training experience as well.\n",
        "\n",
        "* This comparison is for PyTorch 1.0/1.1 with a CUDA backend. Future versions of PyTorch, CUDA, or the PyTorch XLA TPU backend may change things significantly. I'm hoping to compare these models with the PyTorch XLA implementation at some point, though I'm not sure it's ready yet.\n",
        "\n",
        "* The analysis is for the ImageNet classification task. The extra input resolution of every EfficientNet above B0 is arguably less beneficial for this task than for, say, fine-grained classification, segmentation, object detection, and other more interesting tasks. Since input resolution accounts for a large share of GPU memory use, and ResNets for those other tasks are also run at higher resolutions, the conclusions drawn here depend heavily on the task.\n",
        "\n",
        "## What's TIMM and where are the models?\n",
        "\n",
        "The `timm` module used here is a PyPI packaging of my PyTorch Image Models:\n",
        "- https://github.com/rwightman/pytorch-image-models\n",
        "\n",
        "Standalone versions of the EfficientNet, MobileNet-V3, MNasNet, etc. models can also be found at:\n",
        "- https://github.com/rwightman/gen-efficientnet-pytorch"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0f8AXYsjtKs5",
        "colab_type": "code",
        "outputId": "c8a180e8-8b39-4905-aa46-f82c58b974a0",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 224
        }
      },
      "source": [
        "# Install necessary modules\n",
        "!pip install timm"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Collecting timm\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/1e/87/7de9e1175bda1151de177198bb2e99ac78cf0bdf97309b19f6d22b215b79/timm-0.1.6-py3-none-any.whl (83kB)\n",
            "\u001b[K     |████████████████████████████████| 92kB 28.0MB/s \n",
            "\u001b[?25hRequirement already satisfied: torchvision in /usr/local/lib/python3.6/dist-packages (from timm) (0.3.0)\n",
            "Requirement already satisfied: torch>=1.0 in /usr/local/lib/python3.6/dist-packages (from timm) (1.1.0)\n",
            "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.6/dist-packages (from torchvision->timm) (4.3.0)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from torchvision->timm) (1.16.4)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from torchvision->timm) (1.12.0)\n",
            "Requirement already satisfied: olefile in /usr/local/lib/python3.6/dist-packages (from pillow>=4.1.1->torchvision->timm) (0.46)\n",
            "Installing collected packages: timm\n",
            "Successfully installed timm-0.1.6\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1qh-__YFuWrS",
        "colab_type": "text"
      },
      "source": [
        ""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_GEzMzggMxBw",
        "colab_type": "code",
        "outputId": "183aad75-69aa-4e00-c1bc-06f5b40baecf",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 306
        }
      },
      "source": [
        "# For our convenience, take a peek at what we're working with\n",
        "!nvidia-smi"
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Mon Jul  1 20:17:45 2019       \n",
            "+-----------------------------------------------------------------------------+\n",
            "| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |\n",
            "|-------------------------------+----------------------+----------------------+\n",
            "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
            "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
            "|===============================+======================+======================|\n",
            "|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |\n",
            "| N/A   44C    P8    15W /  70W |      0MiB / 15079MiB |      0%      Default |\n",
            "+-------------------------------+----------------------+----------------------+\n",
            "                                                                               \n",
            "+-----------------------------------------------------------------------------+\n",
            "| Processes:                                                       GPU Memory |\n",
            "|  GPU       PID   Type   Process name                             Usage      |\n",
            "|=============================================================================|\n",
            "|  No running processes found                                                 |\n",
            "+-----------------------------------------------------------------------------+\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_69zvVb7v4cw",
        "colab_type": "code",
        "outputId": "3ca2e609-6c50-47e2-823d-d0e9a07f985f",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 51
        }
      },
      "source": [
        "# Import the core modules, check which GPU we end up with and scale batch size accordingly\n",
        "import torch\n",
        "\n",
        "# Flipping this on/off will change the memory dynamics. Since I usually\n",
        "# validate and train with it on, I'll leave it on by default\n",
        "torch.backends.cudnn.benchmark = True\n",
        "\n",
        "import timm\n",
        "from timm.data import *\n",
        "from timm.utils import *\n",
        "\n",
        "import os  # used below for path handling; don't rely on the star imports to provide it\n",
        "import pynvml\n",
        "from collections import OrderedDict\n",
        "import logging\n",
        "import time\n",
        "\n",
        "def log_gpu_memory():\n",
        "    # Query GPU 0 via NVML, log free/used memory in MiB, return used MiB\n",
        "    handle = pynvml.nvmlDeviceGetHandleByIndex(0)\n",
        "    info = pynvml.nvmlDeviceGetMemoryInfo(handle)\n",
        "    free_mib = round(info.free / 1024**2)\n",
        "    used_mib = round(info.used / 1024**2)\n",
        "    logging.info('GPU memory free: {}, memory used: {}'.format(free_mib, used_mib))\n",
        "    return used_mib\n",
        "\n",
        "def get_gpu_memory_total():\n",
        "    # Total memory of GPU 0 in MiB\n",
        "    handle = pynvml.nvmlDeviceGetHandleByIndex(0)\n",
        "    info = pynvml.nvmlDeviceGetMemoryInfo(handle)\n",
        "    return round(info.total / 1024**2)\n",
        "  \n",
        "pynvml.nvmlInit()\n",
        "setup_default_logging()\n",
        "log_gpu_memory()\n",
        "\n",
        "total_gpu_mem = get_gpu_memory_total()\n",
        "if total_gpu_mem > 12300:\n",
        "  logging.info('Running on a T4 GPU or other with > 12GB memory, setting batch size to 128')\n",
        "  batch_size = 128\n",
        "else:\n",
        "  logging.info('Running on a K80 GPU or other with < 12GB memory, batch size set to 80')\n",
        "  batch_size = 80"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "GPU memory free: 15080, memory used: 0\n",
            "Running on a T4 GPU or other with > 12GB memory, setting batch size to 128\n"
          ],
          "name": "stderr"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OVQORlCtNEkX",
        "colab_type": "text"
      },
      "source": [
        "# ImageNet-'V2' Validation\n",
        "\n",
        "In case you're not familiar with it, ImageNet-V2 (https://github.com/modestyachts/ImageNetV2) is a collection of 3 ImageNet-like validation sets that were collected much more recently, about 10 years after the original ImageNet.\n",
        "\n",
        "Aside from being conveniently smaller and easier to deploy in a notebook, it's a useful test set for comparing how models might generalize beyond the original ImageNet-1k data. We're going to use the 'Matched Frequency' version of the dataset. Accuracy is markedly lower across the board on this test set, and it's very interesting to see how different models fall relative to each other. I've included an analysis of those differences at the bottom.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IfBJUXdPxa2C",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Download and extract the dataset (note: despite the .tar.gz name, it's not actually gzipped)\n",
        "if not os.path.exists('./imagenetv2-matched-frequency'):\n",
        "    !curl -s https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz | tar -xf -\n",
        "dataset = Dataset('./imagenetv2-matched-frequency/')\n",
        "# Warmup: load every sample once (e.g. to warm the OS file cache before timing runs)\n",
        "for i in range(len(dataset)):\n",
        "    _ = dataset[i]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yPPC-A50wUji",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# A basic validation routine with timing and accuracy metrics\n",
        "\n",
        "def validate(model, loader):\n",
        "    batch_time = AverageMeter()\n",
        "    losses = AverageMeter()\n",
        "    top1 = AverageMeter()\n",
        "    top5 = AverageMeter()\n",
        "\n",
        "    model.eval()\n",
        "    #torch.cuda.reset_max_memory_allocated()\n",
        "    #torch.cuda.reset_max_memory_cached()\n",
        "    gpu_used_baseline = log_gpu_memory()\n",
        "    gpu_used = 0\n",
        "    start = end = time.time()\n",
        "    num_batches = len(loader)\n",
        "    log_iter = max(1, round(0.25 * num_batches))  # log ~4 times per run; max() avoids modulo-by-zero on tiny loaders\n",
        "    with torch.no_grad():\n",
        "        for i, (input, target) in enumerate(loader):\n",
        "            target = target.cuda()\n",
        "            input = input.cuda()\n",
        "\n",
        "            output = model(input)\n",
        "\n",
        "            prec1, prec5 = accuracy(output.data, target, topk=(1, 5))\n",
        "            top1.update(prec1.item(), input.size(0))\n",
        "            top5.update(prec5.item(), input.size(0))\n",
        "\n",
        "            batch_time.update(time.time() - end)\n",
        "            end = time.time()\n",
        "\n",
        "            if i and i % log_iter == 0:\n",
        "                if gpu_used == 0:\n",
        "                    gpu_used = log_gpu_memory()\n",
        "                logging.info(\n",
        "                    'Test: [{0:>4d}/{1}]  '\n",
        "                    'Time: {batch_time.val:.3f} ({batch_time.avg:.3f}) '\n",
        "                    'Rate: {rate_avg:.3f} img/sec '\n",
        "                    'Prec@1: {top1.val:>7.4f} ({top1.avg:>7.4f}) '\n",
        "                    'Prec@5: {top5.val:>7.4f} ({top5.avg:>7.4f})'.format(\n",
        "                        i, len(loader), batch_time=batch_time,\n",
        "                        rate_avg=input.size(0) / batch_time.avg,\n",
        "                        loss=losses, top1=top1, top5=top5))\n",
        "    gpu_used = gpu_used - gpu_used_baseline\n",
        "    # The torch.cuda max_memory_* measures below were less consistent than the NVML\n",
        "    # method used above at predicting how far batch sizes can be pushed for each model\n",
        "    #gpu_used = torch.cuda.max_memory_allocated()\n",
        "    #gpu_cached = torch.cuda.max_memory_cached()\n",
        "    elapsed = time.time() - start\n",
        "    results = OrderedDict(\n",
        "        top1=round(top1.avg, 3), top1_err=round(100 - top1.avg, 3),\n",
        "        top5=round(top5.avg, 3), top5_err=round(100 - top5.avg, 3),\n",
        "        rate=len(loader.dataset) / elapsed, gpu_used=gpu_used,\n",
        "    )\n",
        "\n",
        "    logging.info(' * Prec@1 {:.3f} ({:.3f}) Prec@5 {:.3f} ({:.3f}) Rate {:.3f}'.format(\n",
        "       results['top1'], results['top1_err'], results['top5'],\n",
        "       results['top5_err'], results['rate']))\n",
        "\n",
        "    return results\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9hj8cy16Wnju",
        "colab_type": "text"
      },
      "source": [
        "# Model Selection\n",
        "\n",
        "As per the intro, one of the goals here is to compare EfficientNets with a more capable set of baseline models. I've gone through the various models included in my collection and selected several that I feel are more appropriate matches, based on Top-1 scores from much better training setups than the originals.\n",
        "\n",
        "Here we will split them into 4 lists for analysis and charting:\n",
        "* EfficientNet models with natively trained PyTorch weights and no padding hacks\n",
        "* EfficientNet models with weights ported from Tensorflow and SAME padding hack\n",
        "* ResNe(X)t (or DPN) models at 224x224 native resolution with weights from myself, the Gluon model zoo, or Facebook's Instagram-trained models\n",
        "* ResNe(X)t models at non-native resolutions with Test Time Pooling enabled\n",
        "\n",
        "Note: I realize it's not entirely fair to include the IG ResNeXt model, since it was pretrained on Instagram data rather than purely on ImageNet like the others. But it's a truly impressive model, and actually quite a bit easier to work with in PyTorch than even the B4 EfficientNet."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DCQg0hky5lVm",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Define the models and arguments that will be used for comparisons\n",
        "\n",
        "# include original ImageNet-1k validation results for comparison against ImageNet-V2 here\n",
        "orig_top1 = dict(\n",
        "    efficientnet_b0=76.912,\n",
        "    efficientnet_b1=78.692,\n",
        "    efficientnet_b2=79.760,\n",
        "    tf_efficientnet_b1=78.554,\n",
        "    tf_efficientnet_b2=79.606,\n",
        "    tf_efficientnet_b3=80.874,\n",
        "    tf_efficientnet_b4=82.604,\n",
        "    dpn68b=77.514,\n",
        "    seresnext26_32x4d=77.104,\n",
        "    resnet50=78.486,\n",
        "    gluon_seresnext50_32x4d=79.912,\n",
        "    gluon_seresnext101_32x4d=80.902,\n",
        "    ig_resnext101_32x8d=82.688,\n",
        ")\n",
        "\n",
        "models_effnet = [\n",
        "    dict(model_name='efficientnet_b0'),\n",
        "    dict(model_name='efficientnet_b1'),\n",
        "    dict(model_name='efficientnet_b2'),\n",
        "]\n",
        "\n",
        "models_effnet_tf = [\n",
        "    dict(model_name='tf_efficientnet_b2'), # overlaps with the native efficientnet_b2 for a TF vs non-TF comparison\n",
        "    dict(model_name='tf_efficientnet_b3'),\n",
        "    dict(model_name='tf_efficientnet_b4'),\n",
        "]\n",
        "\n",
        "models_resnet = [\n",
        "    dict(model_name='dpn68b'),  # b0, yes, not a ResNet, need to find a better b0 comparison\n",
        "    #dict(model_name='seresnext26_32x4d'),  # b0, not the best b0 comparison either, a little slow\n",
        "    dict(model_name='resnet50'), # b1\n",
        "    dict(model_name='gluon_seresnext50_32x4d'), # b2-b3\n",
        "    dict(model_name='gluon_seresnext101_32x4d'), # b3\n",
        "    dict(model_name='ig_resnext101_32x8d'), # b4\n",
        "]\n",
        "\n",
        "models_resnet_ttp = [\n",
        "    dict(model_name='resnet50', input_size=(3, 240, 240), ttp=True),\n",
        "    dict(model_name='resnet50', input_size=(3, 260, 260), ttp=True),\n",
        "    dict(model_name='gluon_seresnext50_32x4d', input_size=(3, 260, 260), ttp=True),\n",
        "    dict(model_name='gluon_seresnext50_32x4d', input_size=(3, 300, 300), ttp=True),\n",
        "    dict(model_name='gluon_seresnext101_32x4d', input_size=(3, 260, 260), ttp=True),\n",
        "    dict(model_name='gluon_seresnext101_32x4d', input_size=(3, 300, 300), ttp=True),\n",
        "    dict(model_name='ig_resnext101_32x8d', input_size=(3, 300, 300), ttp=True),\n",
        "]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PPloo-oE545b",
        "colab_type": "text"
      },
      "source": [
        "# Model Runner\n",
        "\n",
        "The runner creates each model and a matching data loader, then runs the validation. It uses several features of my `timm` image model collection for this.\n",
        "\n",
        "Test time pooling is enabled here if requested in `model_args`. The pooling is implemented as a module that wraps the base network. When enabling pooling, it's important to set the image crop factor (`crop_pct`) to 1.0."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BX_CKBnM8XNO",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from timm.models import TestTimePoolHead\n",
        "\n",
        "def model_runner(model_args):\n",
        "    model_name = model_args['model_name']\n",
        "    pretrained = True\n",
        "    checkpoint_path = ''\n",
        "    if 'model_url' in model_args and model_args['model_url']:\n",
        "        !wget -q {model_args['model_url']}\n",
        "        checkpoint_path = './' + os.path.basename(model_args['model_url'])\n",
        "        logging.info('Downloaded checkpoint {} from specified URL'.format(checkpoint_path))\n",
        "        pretrained = False\n",
        "    \n",
        "    model = timm.create_model(\n",
        "        model_name,\n",
        "        num_classes=1000,\n",
        "        in_chans=3,\n",
        "        pretrained=pretrained,\n",
        "        checkpoint_path=checkpoint_path)\n",
        "\n",
        "    data_config = timm.data.resolve_data_config(model_args, model=model, verbose=True)\n",
        "    \n",
        "    ttp = False\n",
        "    if 'ttp' in model_args and model_args['ttp']:\n",
        "        ttp = True\n",
        "        logging.info('Applying test time pooling to model')\n",
        "        model = TestTimePoolHead(model, original_pool=model.default_cfg['pool_size'])\n",
        "  \n",
        "    model_key = [model_name, str(data_config['input_size'][-1])]\n",
        "    if ttp:\n",
        "        model_key += ['ttp']\n",
        "    model_key = '-'.join(model_key)\n",
        "    param_count = sum([m.numel() for m in model.parameters()])\n",
        "    logging.info('Model {} created, param count: {}. Running...'.format(model_key, param_count))\n",
        "\n",
        "    model = model.cuda()\n",
        "\n",
        "    loader = create_loader(\n",
        "        dataset,\n",
        "        input_size=data_config['input_size'],\n",
        "        batch_size=batch_size,\n",
        "        use_prefetcher=True,\n",
        "        interpolation='bicubic',\n",
        "        mean=data_config['mean'],\n",
        "        std=data_config['std'],\n",
        "        crop_pct=1.0 if ttp else data_config['crop_pct'],\n",
        "        num_workers=2)\n",
        "\n",
        "    result = validate(model, loader)\n",
        "    \n",
        "    logging.info('Model {} done.\\n'.format(model_key))\n",
        "    result['param_count'] = param_count / 1e6\n",
        "    # add extra non-metric keys for comparisons    \n",
        "    result['model_name'] = model_name\n",
        "    result['input_size'] = data_config['input_size']\n",
        "    result['ttp'] = ttp\n",
        "\n",
        "    del model\n",
        "    del loader\n",
        "    torch.cuda.empty_cache()\n",
        "    \n",
        "    return model_key, result"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xx-j8Z-z_EGo",
        "colab_type": "code",
        "outputId": "8c6571b5-131e-419d-b9e6-2366a45cda8e",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        }
      },
      "source": [
        "# Run validation on all the models, get a coffee (or two)\n",
        "results_effnet = {}\n",
        "results_effnet_tf = {}\n",
        "results_resnet = {}\n",
        "results_resnet_ttp = {}\n",
        "\n",
        "logging.info('Running validation on native PyTorch EfficientNet models')\n",
        "for ma in models_effnet:\n",
        "    mk, mr = model_runner(ma)\n",
        "    results_effnet[mk] = mr\n",
        "    \n",
        "logging.info('Running validation on ported Tensorflow EfficientNet models')\n",
        "for ma in models_effnet_tf:\n",
        "    mk, mr = model_runner(ma)\n",
        "    results_effnet_tf[mk] = mr\n",
        "    \n",
        "logging.info('Running validation on ResNe(X)t models')\n",
        "for ma in models_resnet:\n",
        "    mk, mr = model_runner(ma)\n",
        "    results_resnet[mk] = mr\n",
        "    \n",
        "logging.info('Running validation on ResNe(X)t models w/ Test Time Pooling enabled')\n",
        "for ma in models_resnet_ttp:\n",
        "    mk, mr = model_runner(ma)\n",
        "    results_resnet_ttp[mk] = mr\n",
        "    \n",
        "results = {**results_effnet, **results_effnet_tf, **results_resnet, **results_resnet_ttp}"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Running validation on native PyTorch EfficientNet models\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b0-d6904d92.pth\" to /root/.cache/torch/checkpoints/efficientnet_b0-d6904d92.pth\n",
            "100%|██████████| 21376958/21376958 [00:02<00:00, 8676444.76it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 224, 224)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Model efficientnet_b0-224 created, param count: 5288548. Running...\n",
            "GPU memory free: 14276, memory used: 804\n",
            "GPU memory free: 11346, memory used: 3734\n",
            "Test: [  20/79]  Time: 0.190 (0.805) Rate: 159.098 img/sec Prec@1: 64.8438 (69.6801) Prec@5: 87.5000 (88.9509)\n",
            "Test: [  40/79]  Time: 0.194 (0.800) Rate: 159.972 img/sec Prec@1: 51.5625 (68.8072) Prec@5: 79.6875 (88.5671)\n",
            "Test: [  60/79]  Time: 0.186 (0.790) Rate: 162.028 img/sec Prec@1: 60.9375 (66.1501) Prec@5: 83.5938 (86.6035)\n",
            " * Prec@1 64.580 (35.420) Prec@5 85.890 (14.110) Rate 165.732\n",
            "Model efficientnet_b0-224 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b1-533bc792.pth\" to /root/.cache/torch/checkpoints/efficientnet_b1-533bc792.pth\n",
            "100%|██████████| 31502706/31502706 [00:03<00:00, 9936470.52it/s] \n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 240, 240)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.882\n",
            "Model efficientnet_b1-240 created, param count: 7794184. Running...\n",
            "GPU memory free: 14260, memory used: 820\n",
            "GPU memory free: 10890, memory used: 4190\n",
            "Test: [  20/79]  Time: 0.311 (0.919) Rate: 139.286 img/sec Prec@1: 69.5312 (73.9583) Prec@5: 86.7188 (90.7366)\n",
            "Test: [  40/79]  Time: 0.310 (0.878) Rate: 145.851 img/sec Prec@1: 58.5938 (72.1799) Prec@5: 81.2500 (89.9200)\n",
            "Test: [  60/79]  Time: 0.312 (0.867) Rate: 147.679 img/sec Prec@1: 67.1875 (69.0958) Prec@5: 81.2500 (87.9867)\n",
            " * Prec@1 67.550 (32.450) Prec@5 87.290 (12.710) Rate 151.628\n",
            "Model efficientnet_b1-240 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b2-cf78dc4d.pth\" to /root/.cache/torch/checkpoints/efficientnet_b2-cf78dc4d.pth\n",
            "100%|██████████| 36788101/36788101 [00:03<00:00, 11752398.17it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 260, 260)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.89\n",
            "Model efficientnet_b2-260 created, param count: 9109994. Running...\n",
            "GPU memory free: 14258, memory used: 822\n",
            "GPU memory free: 10266, memory used: 4814\n",
            "Test: [  20/79]  Time: 0.416 (0.941) Rate: 136.036 img/sec Prec@1: 68.7500 (72.9539) Prec@5: 88.2812 (91.0714)\n",
            "Test: [  40/79]  Time: 0.429 (0.914) Rate: 140.068 img/sec Prec@1: 58.5938 (71.9893) Prec@5: 82.0312 (90.4535)\n",
            "Test: [  60/79]  Time: 0.527 (0.894) Rate: 143.120 img/sec Prec@1: 64.0625 (69.3904) Prec@5: 85.9375 (88.8960)\n",
            " * Prec@1 67.800 (32.200) Prec@5 88.200 (11.800) Rate 144.201\n",
            "Model efficientnet_b2-260 done.\n",
            "\n",
            "Running validation on ported Tensorflow EfficientNet models\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b2-e393ef04.pth\" to /root/.cache/torch/checkpoints/tf_efficientnet_b2-e393ef04.pth\n",
            "100%|██████████| 36797929/36797929 [00:03<00:00, 11014399.83it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 260, 260)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.89\n",
            "Model tf_efficientnet_b2-260 created, param count: 9109994. Running...\n",
            "GPU memory free: 14258, memory used: 822\n",
            "GPU memory free: 9568, memory used: 5512\n",
            "Test: [  20/79]  Time: 1.217 (0.960) Rate: 133.306 img/sec Prec@1: 66.4062 (72.7679) Prec@5: 87.5000 (90.4018)\n",
            "Test: [  40/79]  Time: 0.522 (0.917) Rate: 139.645 img/sec Prec@1: 58.5938 (71.3986) Prec@5: 79.6875 (89.7675)\n",
            "Test: [  60/79]  Time: 0.939 (0.908) Rate: 141.046 img/sec Prec@1: 64.8438 (68.9037) Prec@5: 85.1562 (88.2172)\n",
            " * Prec@1 67.400 (32.600) Prec@5 87.580 (12.420) Rate 142.727\n",
            "Model tf_efficientnet_b2-260 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b3-e3bd6955.pth\" to /root/.cache/torch/checkpoints/tf_efficientnet_b3-e3bd6955.pth\n",
            "100%|██████████| 49381362/49381362 [00:03<00:00, 12584590.15it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 300, 300)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.904\n",
            "Model tf_efficientnet_b3-300 created, param count: 12233232. Running...\n",
            "GPU memory free: 14242, memory used: 838\n",
            "GPU memory free: 5604, memory used: 9476\n",
            "Test: [  20/79]  Time: 1.267 (1.161) Rate: 110.269 img/sec Prec@1: 66.4062 (73.8467) Prec@5: 90.6250 (91.6667)\n",
            "Test: [  40/79]  Time: 0.833 (1.097) Rate: 116.649 img/sec Prec@1: 60.9375 (72.8087) Prec@5: 83.5938 (90.7393)\n",
            "Test: [  60/79]  Time: 1.242 (1.082) Rate: 118.310 img/sec Prec@1: 67.1875 (70.1588) Prec@5: 84.3750 (89.1522)\n",
            " * Prec@1 68.520 (31.480) Prec@5 88.700 (11.300) Rate 119.134\n",
            "Model tf_efficientnet_b3-300 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b4-74ee3bed.pth\" to /root/.cache/torch/checkpoints/tf_efficientnet_b4-74ee3bed.pth\n",
            "100%|██████████| 77989689/77989689 [00:06<00:00, 12751872.12it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 380, 380)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.922\n",
            "Model tf_efficientnet_b4-380 created, param count: 19341616. Running...\n",
            "GPU memory free: 14214, memory used: 866\n",
            "GPU memory free: 2460, memory used: 12620\n",
            "Test: [  20/79]  Time: 1.761 (2.057) Rate: 62.222 img/sec Prec@1: 69.5312 (76.4509) Prec@5: 91.4062 (92.6339)\n",
            "Test: [  40/79]  Time: 1.740 (1.914) Rate: 66.889 img/sec Prec@1: 64.8438 (75.4954) Prec@5: 83.5938 (92.2637)\n",
            "Test: [  60/79]  Time: 1.782 (1.866) Rate: 68.600 img/sec Prec@1: 71.0938 (72.8740) Prec@5: 85.1562 (90.6634)\n",
            " * Prec@1 71.340 (28.660) Prec@5 90.110 (9.890) Rate 69.103\n",
            "Model tf_efficientnet_b4-380 done.\n",
            "\n",
            "Running validation on ResNe(X)t models\n",
            "Downloading: \"https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn68b_extra-84854c156.pth\" to /root/.cache/torch/checkpoints/dpn68b_extra-84854c156.pth\n",
            "100%|██████████| 50765517/50765517 [00:04<00:00, 12271223.44it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 224, 224)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.48627450980392156, 0.4588235294117647, 0.40784313725490196)\n",
            "\tstd: (0.23482446870963955, 0.23482446870963955, 0.23482446870963955)\n",
            "\tcrop_pct: 0.875\n",
            "Model dpn68b-224 created, param count: 12611602. Running...\n",
            "GPU memory free: 14240, memory used: 840\n",
            "GPU memory free: 11342, memory used: 3738\n",
            "Test: [  20/79]  Time: 0.442 (0.876) Rate: 146.176 img/sec Prec@1: 54.6875 (70.2381) Prec@5: 85.9375 (88.9509)\n",
            "Test: [  40/79]  Time: 1.007 (0.847) Rate: 151.177 img/sec Prec@1: 57.8125 (69.5122) Prec@5: 78.9062 (88.4337)\n",
            "Test: [  60/79]  Time: 1.015 (0.834) Rate: 153.556 img/sec Prec@1: 60.1562 (66.8033) Prec@5: 78.9062 (86.5907)\n",
            " * Prec@1 65.600 (34.400) Prec@5 85.940 (14.060) Rate 155.150\n",
            "Model dpn68b-224 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/rw_resnet50-86acaeed.pth\" to /root/.cache/torch/checkpoints/rw_resnet50-86acaeed.pth\n",
            "100%|██████████| 102488165/102488165 [00:07<00:00, 13755311.81it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 224, 224)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Model resnet50-224 created, param count: 25557032. Running...\n",
            "GPU memory free: 14182, memory used: 898\n",
            "GPU memory free: 12652, memory used: 2428\n",
            "Test: [  20/79]  Time: 0.406 (0.859) Rate: 149.042 img/sec Prec@1: 66.4062 (72.6562) Prec@5: 90.6250 (90.4762)\n",
            "Test: [  40/79]  Time: 0.662 (0.820) Rate: 156.156 img/sec Prec@1: 58.5938 (71.1128) Prec@5: 85.9375 (89.5960)\n",
            "Test: [  60/79]  Time: 0.601 (0.807) Rate: 158.594 img/sec Prec@1: 61.7188 (68.3017) Prec@5: 82.0312 (87.7946)\n",
            " * Prec@1 66.810 (33.190) Prec@5 87.000 (13.000) Rate 159.510\n",
            "Model resnet50-224 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext50_32x4d-90cf2d6e.pth\" to /root/.cache/torch/checkpoints/gluon_seresnext50_32x4d-90cf2d6e.pth\n",
            "100%|██████████| 110578827/110578827 [00:08<00:00, 12788555.61it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 224, 224)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Model gluon_seresnext50_32x4d-224 created, param count: 27559896. Running...\n",
            "GPU memory free: 14180, memory used: 900\n",
            "GPU memory free: 12510, memory used: 2570\n",
            "Test: [  20/79]  Time: 1.013 (0.875) Rate: 146.238 img/sec Prec@1: 70.3125 (74.2188) Prec@5: 88.2812 (91.0714)\n",
            "Test: [  40/79]  Time: 1.197 (0.859) Rate: 149.059 img/sec Prec@1: 60.9375 (72.8849) Prec@5: 82.8125 (90.4345)\n",
            "Test: [  60/79]  Time: 1.185 (0.859) Rate: 148.930 img/sec Prec@1: 64.8438 (70.0307) Prec@5: 84.3750 (88.8064)\n",
            " * Prec@1 68.670 (31.330) Prec@5 88.320 (11.680) Rate 150.435\n",
            "Model gluon_seresnext50_32x4d-224 done.\n",
            "\n",
            "Downloading: \"https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext101_32x4d-cf52900d.pth\" to /root/.cache/torch/checkpoints/gluon_seresnext101_32x4d-cf52900d.pth\n",
            "100%|██████████| 196505510/196505510 [00:12<00:00, 16164511.02it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 224, 224)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Model gluon_seresnext101_32x4d-224 created, param count: 48955416. Running...\n",
            "GPU memory free: 14086, memory used: 994\n",
            "GPU memory free: 12272, memory used: 2808\n",
            "Test: [  20/79]  Time: 0.897 (1.016) Rate: 125.932 img/sec Prec@1: 72.6562 (75.5580) Prec@5: 88.2812 (91.6667)\n",
            "Test: [  40/79]  Time: 0.899 (0.997) Rate: 128.324 img/sec Prec@1: 64.8438 (74.4284) Prec@5: 83.5938 (91.2538)\n",
            "Test: [  60/79]  Time: 0.867 (0.986) Rate: 129.853 img/sec Prec@1: 67.1875 (71.7597) Prec@5: 89.0625 (89.6644)\n",
            " * Prec@1 70.010 (29.990) Prec@5 88.910 (11.090) Rate 131.572\n",
            "Model gluon_seresnext101_32x4d-224 done.\n",
            "\n",
            "Downloading: \"https://download.pytorch.org/models/ig_resnext101_32x8-c38310e5.pth\" to /root/.cache/torch/checkpoints/ig_resnext101_32x8-c38310e5.pth\n",
            "100%|██████████| 356056638/356056638 [00:11<00:00, 31320647.42it/s]\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 224, 224)\n",
            "\tinterpolation: bilinear\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Model ig_resnext101_32x8d-224 created, param count: 88791336. Running...\n",
            "GPU memory free: 13946, memory used: 1134\n",
            "GPU memory free: 10564, memory used: 4516\n",
            "Test: [  20/79]  Time: 1.560 (1.664) Rate: 76.934 img/sec Prec@1: 76.5625 (78.9807) Prec@5: 93.7500 (94.2708)\n",
            "Test: [  40/79]  Time: 1.450 (1.582) Rate: 80.907 img/sec Prec@1: 66.4062 (77.9535) Prec@5: 88.2812 (93.7881)\n",
            "Test: [  60/79]  Time: 1.470 (1.540) Rate: 83.129 img/sec Prec@1: 74.2188 (75.0256) Prec@5: 91.4062 (92.6358)\n",
            " * Prec@1 73.830 (26.170) Prec@5 92.280 (7.720) Rate 83.352\n",
            "Model ig_resnext101_32x8d-224 done.\n",
            "\n",
            "Running validation on ResNe(X)t models w/ Test Time Pooling enabled\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 240, 240)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model resnet50-240-ttp created, param count: 25557032. Running...\n",
            "GPU memory free: 14182, memory used: 898\n",
            "GPU memory free: 12098, memory used: 2982\n",
            "Test: [  20/79]  Time: 0.429 (0.892) Rate: 143.505 img/sec Prec@1: 67.1875 (72.7679) Prec@5: 89.0625 (90.3274)\n",
            "Test: [  40/79]  Time: 0.757 (0.845) Rate: 151.416 img/sec Prec@1: 55.4688 (71.1128) Prec@5: 84.3750 (89.5198)\n",
            "Test: [  60/79]  Time: 1.154 (0.831) Rate: 154.108 img/sec Prec@1: 61.7188 (68.4170) Prec@5: 82.8125 (87.6537)\n",
            " * Prec@1 67.020 (32.980) Prec@5 87.040 (12.960) Rate 154.346\n",
            "Model resnet50-240-ttp done.\n",
            "\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 260, 260)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model resnet50-260-ttp created, param count: 25557032. Running...\n",
            "GPU memory free: 14182, memory used: 898\n",
            "GPU memory free: 11650, memory used: 3430\n",
            "Test: [  20/79]  Time: 1.172 (1.097) Rate: 116.650 img/sec Prec@1: 68.7500 (72.9911) Prec@5: 87.5000 (90.5134)\n",
            "Test: [  40/79]  Time: 0.902 (0.976) Rate: 131.211 img/sec Prec@1: 57.8125 (72.0084) Prec@5: 82.8125 (89.9581)\n",
            "Test: [  60/79]  Time: 0.832 (0.940) Rate: 136.223 img/sec Prec@1: 60.1562 (69.2751) Prec@5: 85.9375 (88.2684)\n",
            " * Prec@1 67.630 (32.370) Prec@5 87.630 (12.370) Rate 135.915\n",
            "Model resnet50-260-ttp done.\n",
            "\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 260, 260)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model gluon_seresnext50_32x4d-260-ttp created, param count: 27559896. Running...\n",
            "GPU memory free: 14180, memory used: 900\n",
            "GPU memory free: 11594, memory used: 3486\n",
            "Test: [  20/79]  Time: 1.229 (1.147) Rate: 111.577 img/sec Prec@1: 71.8750 (74.4420) Prec@5: 86.7188 (91.2946)\n",
            "Test: [  40/79]  Time: 1.056 (1.053) Rate: 121.593 img/sec Prec@1: 62.5000 (73.8567) Prec@5: 85.1562 (90.6822)\n",
            "Test: [  60/79]  Time: 1.133 (1.015) Rate: 126.067 img/sec Prec@1: 68.7500 (71.1194) Prec@5: 86.7188 (89.0625)\n",
            " * Prec@1 69.670 (30.330) Prec@5 88.620 (11.380) Rate 126.519\n",
            "Model gluon_seresnext50_32x4d-260-ttp done.\n",
            "\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 300, 300)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model gluon_seresnext50_32x4d-300-ttp created, param count: 27559896. Running...\n",
            "GPU memory free: 14180, memory used: 900\n",
            "GPU memory free: 10880, memory used: 4200\n",
            "Test: [  20/79]  Time: 1.041 (1.484) Rate: 86.250 img/sec Prec@1: 71.8750 (76.3021) Prec@5: 89.0625 (91.9271)\n",
            "Test: [  40/79]  Time: 1.037 (1.287) Rate: 99.457 img/sec Prec@1: 64.0625 (75.0572) Prec@5: 86.7188 (91.3300)\n",
            "Test: [  60/79]  Time: 1.064 (1.216) Rate: 105.295 img/sec Prec@1: 71.0938 (72.1952) Prec@5: 88.2812 (89.7285)\n",
            " * Prec@1 70.470 (29.530) Prec@5 89.180 (10.820) Rate 104.694\n",
            "Model gluon_seresnext50_32x4d-300-ttp done.\n",
            "\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 260, 260)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model gluon_seresnext101_32x4d-260-ttp created, param count: 48955416. Running...\n",
            "GPU memory free: 14086, memory used: 994\n",
            "GPU memory free: 11634, memory used: 3446\n",
            "Test: [  20/79]  Time: 1.307 (1.413) Rate: 90.559 img/sec Prec@1: 71.8750 (76.3393) Prec@5: 89.0625 (92.0387)\n",
            "Test: [  40/79]  Time: 1.307 (1.362) Rate: 93.981 img/sec Prec@1: 61.7188 (75.6479) Prec@5: 82.0312 (91.8826)\n",
            "Test: [  60/79]  Time: 1.303 (1.343) Rate: 95.329 img/sec Prec@1: 74.2188 (72.8868) Prec@5: 87.5000 (90.1895)\n",
            " * Prec@1 71.140 (28.860) Prec@5 89.470 (10.530) Rate 95.842\n",
            "Model gluon_seresnext101_32x4d-260-ttp done.\n",
            "\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 300, 300)\n",
            "\tinterpolation: bicubic\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model gluon_seresnext101_32x4d-300-ttp created, param count: 48955416. Running...\n",
            "GPU memory free: 14086, memory used: 994\n",
            "GPU memory free: 10834, memory used: 4246\n",
            "Test: [  20/79]  Time: 1.691 (1.786) Rate: 71.683 img/sec Prec@1: 71.8750 (77.5298) Prec@5: 91.4062 (93.1176)\n",
            "Test: [  40/79]  Time: 1.669 (1.732) Rate: 73.888 img/sec Prec@1: 63.2812 (76.2767) Prec@5: 85.1562 (92.5877)\n",
            "Test: [  60/79]  Time: 1.693 (1.715) Rate: 74.635 img/sec Prec@1: 75.0000 (73.7193) Prec@5: 92.1875 (90.9964)\n",
            " * Prec@1 71.990 (28.010) Prec@5 90.100 (9.900) Rate 74.874\n",
            "Model gluon_seresnext101_32x4d-300-ttp done.\n",
            "\n",
            "Data processing configuration for current model + dataset:\n",
            "\tinput_size: (3, 300, 300)\n",
            "\tinterpolation: bilinear\n",
            "\tmean: (0.485, 0.456, 0.406)\n",
            "\tstd: (0.229, 0.224, 0.225)\n",
            "\tcrop_pct: 0.875\n",
            "Applying test time pooling to model\n",
            "Model ig_resnext101_32x8d-300-ttp created, param count: 88791336. Running...\n",
            "GPU memory free: 13946, memory used: 1134\n",
            "GPU memory free: 9288, memory used: 5792\n",
            "Test: [  20/79]  Time: 2.850 (3.122) Rate: 41.006 img/sec Prec@1: 75.0000 (79.3155) Prec@5: 93.7500 (94.8661)\n",
            "Test: [  40/79]  Time: 2.855 (2.989) Rate: 42.826 img/sec Prec@1: 64.8438 (78.6966) Prec@5: 87.5000 (94.3979)\n",
            "Test: [  60/79]  Time: 2.856 (2.945) Rate: 43.463 img/sec Prec@1: 74.2188 (76.2295) Prec@5: 89.0625 (93.0456)\n",
            " * Prec@1 75.170 (24.830) Prec@5 92.660 (7.340) Rate 43.622\n",
            "Model ig_resnext101_32x8d-300-ttp done.\n",
            "\n"
          ],
          "name": "stderr"
        }
      ]
    },
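    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each validation run above logs its data processing configuration. As a rough guide to how `input_size` and `crop_pct` interact, here is a minimal sketch. It assumes the common resize-then-center-crop evaluation transform: the shorter image side is resized to `floor(input_size / crop_pct)`, then a center crop of `input_size` is taken. `eval_resize_size` is a hypothetical helper for illustration, not a timm function.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "def eval_resize_size(img_size, crop_pct):\n",
        "    # Assumed convention: resize the shorter side to this size, then center crop img_size\n",
        "    return int(math.floor(img_size / crop_pct))\n",
        "\n",
        "# (model, input_size, crop_pct) as logged in the validation runs above\n",
        "configs = [\n",
        "    ('resnet50-224', 224, 0.875),\n",
        "    ('tf_efficientnet_b2-260', 260, 0.89),\n",
        "    ('tf_efficientnet_b3-300', 300, 0.904),\n",
        "    ('tf_efficientnet_b4-380', 380, 0.922),\n",
        "]\n",
        "for name, size, crop_pct in configs:\n",
        "    resize = eval_resize_size(size, crop_pct)\n",
        "    # Fraction of the resized shorter side that survives the center crop\n",
        "    print('{:24} resize {:4d} -> crop {:4d} ({:.1%} kept)'.format(name, resize, size, size / resize))\n",
        "```\n",
        "\n",
        "Under this assumption, `crop_pct` is roughly the fraction of the image that is kept. The TF-ported EfficientNets use 0.89 to 0.922, while every ResNe(X)t run uses 0.875."
      ]
    },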
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "URXdMbNaOYtq",
        "colab_type": "text"
      },
      "source": [
        "# Results\n",
        "\n",
        "We're going to walk through the results and look at several things:\n",
        "\n",
        "1. Top-1 accuracy (%) across all the models\n",
        "2. Parameter efficiency\n",
        "3. Model throughput (images/sec)\n",
        "4. (Practical) GPU memory usage in PyTorch\n",
        "5. A comparison of model-model pairings\n",
        "6. ImageNet-V2 generalization"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MvVqWbobe9Jo",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Setup common charting variables\n",
        "import math\n",
        "import numpy as np\n",
        "import matplotlib\n",
        "import matplotlib.pyplot as plt\n",
        "matplotlib.rcParams['figure.figsize'] = [16, 10]\n",
        "\n",
        "def annotate(ax, xv, yv, names, xo=0., yo=0., align='left'):\n",
        "    # Label each (x, y) point on the given axes with its model name\n",
        "    for i, (x, y) in enumerate(zip(xv, yv)):\n",
        "        ax.text(x + xo, y + yo, names[i], fontsize=9, ha=align)\n",
        "\n",
        "names_all = list(results.keys())\n",
        "names_effnet = list(results_effnet.keys())\n",
        "names_effnet_tf = list(results_effnet_tf.keys())\n",
        "names_resnet = list(results_resnet.keys())\n",
        "names_resnet_ttp = list(results_resnet_ttp.keys())\n",
        "\n",
        "acc_all = np.array([results[m]['top1'] for m in names_all])\n",
        "acc_effnet = np.array([results[m]['top1'] for m in names_effnet])\n",
        "acc_effnet_tf = np.array([results[m]['top1'] for m in names_effnet_tf])\n",
        "acc_resnet = np.array([results[m]['top1'] for m in names_resnet])\n",
        "acc_resnet_ttp = np.array([results[m]['top1'] for m in names_resnet_ttp])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "P9vtQbVa48kW",
        "colab_type": "text"
      },
      "source": [
        "# Top-1 Accuracy\n",
        "\n",
        "We'll start by ranking the models by Top-1 accuracy on the ImageNet-V2 validation set.\n",
        "\n",
        "You'll notice that a well-trained\n",
        "* ResNet-50 holds its own against an EfficientNet-B1, landing much closer to it than to the B0 it's paired with in the paper\n",
        "* SE-ResNeXt50-32x4d beats the B2 and B3\n",
        "* SE-ResNeXt101-32x4d is very close to the B4.\n",
        "\n",
        "The ResNeXt101-32x8d pretrained on Instagram data by Facebook is in a class of its own. Including it is somewhat unfair, since it was pretrained on a larger dataset. However, it generalizes to this dataset better than any other model I've seen (see bottom), and it runs faster with less memory overhead than the EfficientNet-B4 despite its 88M parameters, so I've included it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MjM-eMtSalDS",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 340
        },
        "outputId": "3bdd1164-4395-47f8-d090-f4c5868027c6"
      },
      "source": [
        "print('Results by top-1 accuracy:')\n",
        "results_by_top1 = list(sorted(results.keys(), key=lambda x: results[x]['top1'], reverse=True))\n",
        "for m in results_by_top1:\n",
        "  print('  Model: {:34}, Top-1 {:4.2f}, Top-5 {:4.2f}, Rate: {:4.2f}'.format(m, results[m]['top1'], results[m]['top5'], results[m]['rate']))"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Results by top-1 accuracy:\n",
            "  Model: ig_resnext101_32x8d-300-ttp       , Top-1 75.17, Top-5 92.66, Rate: 43.62\n",
            "  Model: ig_resnext101_32x8d-224           , Top-1 73.83, Top-5 92.28, Rate: 83.35\n",
            "  Model: gluon_seresnext101_32x4d-300-ttp  , Top-1 71.99, Top-5 90.10, Rate: 74.87\n",
            "  Model: tf_efficientnet_b4-380            , Top-1 71.34, Top-5 90.11, Rate: 69.10\n",
            "  Model: gluon_seresnext101_32x4d-260-ttp  , Top-1 71.14, Top-5 89.47, Rate: 95.84\n",
            "  Model: gluon_seresnext50_32x4d-300-ttp   , Top-1 70.47, Top-5 89.18, Rate: 104.69\n",
            "  Model: gluon_seresnext101_32x4d-224      , Top-1 70.01, Top-5 88.91, Rate: 131.57\n",
            "  Model: gluon_seresnext50_32x4d-260-ttp   , Top-1 69.67, Top-5 88.62, Rate: 126.52\n",
            "  Model: gluon_seresnext50_32x4d-224       , Top-1 68.67, Top-5 88.32, Rate: 150.43\n",
            "  Model: tf_efficientnet_b3-300            , Top-1 68.52, Top-5 88.70, Rate: 119.13\n",
            "  Model: efficientnet_b2-260               , Top-1 67.80, Top-5 88.20, Rate: 144.20\n",
            "  Model: resnet50-260-ttp                  , Top-1 67.63, Top-5 87.63, Rate: 135.92\n",
            "  Model: efficientnet_b1-240               , Top-1 67.55, Top-5 87.29, Rate: 151.63\n",
            "  Model: tf_efficientnet_b2-260            , Top-1 67.40, Top-5 87.58, Rate: 142.73\n",
            "  Model: resnet50-240-ttp                  , Top-1 67.02, Top-5 87.04, Rate: 154.35\n",
            "  Model: resnet50-224                      , Top-1 66.81, Top-5 87.00, Rate: 159.51\n",
            "  Model: dpn68b-224                        , Top-1 65.60, Top-5 85.94, Rate: 155.15\n",
            "  Model: efficientnet_b0-224               , Top-1 64.58, Top-5 85.89, Rate: 165.73\n"
          ],
          "name": "stdout"
        }
      ]
    },
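    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To put numbers on the pairings from the Top-1 discussion, the sketch below compares each ResNe(X)t model against its EfficientNet counterpart using the Top-1 and Rate values printed above. The values are hard-coded so the snippet runs standalone. In this notebook you could read them from `results[m]['top1']` and `results[m]['rate']` instead. The `compare` helper and the `pairs` list are illustrative, chosen to mirror the bullets in the Top-1 section.\n",
        "\n",
        "```python\n",
        "# Top-1 (%) and Rate (img/sec) copied from the 'Results by top-1 accuracy' output\n",
        "top1 = {\n",
        "    'resnet50-224': 66.81, 'efficientnet_b1-240': 67.55,\n",
        "    'gluon_seresnext50_32x4d-224': 68.67, 'efficientnet_b2-260': 67.80,\n",
        "    'tf_efficientnet_b3-300': 68.52,\n",
        "    'gluon_seresnext101_32x4d-224': 70.01, 'tf_efficientnet_b4-380': 71.34,\n",
        "}\n",
        "rate = {\n",
        "    'resnet50-224': 159.51, 'efficientnet_b1-240': 151.63,\n",
        "    'gluon_seresnext50_32x4d-224': 150.43, 'efficientnet_b2-260': 144.20,\n",
        "    'tf_efficientnet_b3-300': 119.13,\n",
        "    'gluon_seresnext101_32x4d-224': 131.57, 'tf_efficientnet_b4-380': 69.10,\n",
        "}\n",
        "\n",
        "def compare(a, b):\n",
        "    # Returns (Top-1 delta in points, throughput ratio) of model a relative to model b\n",
        "    return round(top1[a] - top1[b], 2), round(rate[a] / rate[b], 2)\n",
        "\n",
        "pairs = [\n",
        "    ('resnet50-224', 'efficientnet_b1-240'),\n",
        "    ('gluon_seresnext50_32x4d-224', 'efficientnet_b2-260'),\n",
        "    ('gluon_seresnext50_32x4d-224', 'tf_efficientnet_b3-300'),\n",
        "    ('gluon_seresnext101_32x4d-224', 'tf_efficientnet_b4-380'),\n",
        "]\n",
        "for a, b in pairs:\n",
        "    delta, speed_ratio = compare(a, b)\n",
        "    print('{} vs {}: Top-1 {:+.2f}, speed ratio {:.2f}'.format(a, b, delta, speed_ratio))\n",
        "```\n",
        "\n",
        "A negative Top-1 delta means the ResNe(X)t model is less accurate. A speed ratio above 1 means it processed more images per second in this particular Colab run."
      ]
    },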
    {
      "cell_type": "code",
      "metadata": {
        "id": "ENtozBUwwdO-",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 745
        },
        "outputId": "c0834583-d1f3-4976-9c7b-c54ef4520e79"
      },
      "source": [
        "sort_ix = np.argsort(acc_all)\n",
        "acc_sorted = acc_all[sort_ix]\n",
        "acc_min, acc_max = acc_sorted[0], acc_sorted[-1]\n",
        "names_sorted = np.array(names_all)[sort_ix]\n",
        "fig = plt.figure()\n",
        "ax1 = fig.add_subplot(111)\n",
        "ix = np.arange(len(acc_sorted))\n",
        "ix_effnet = ix[np.isin(names_sorted[ix], names_effnet)]\n",
        "ix_effnet_tf = ix[np.isin(names_sorted[ix], names_effnet_tf)]\n",
        "ix_resnet = ix[np.isin(names_sorted[ix], names_resnet)]\n",
        "ix_resnet_ttp = ix[np.isin(names_sorted[ix], names_resnet_ttp)]\n",
        "ax1.bar(ix_effnet, acc_sorted[ix_effnet], color='r', label='EfficientNet')\n",
        "ax1.bar(ix_effnet_tf, acc_sorted[ix_effnet_tf], color='#8C001A', label='TF-EfficientNet')\n",
        "ax1.bar(ix_resnet, acc_sorted[ix_resnet], color='b', label='ResNet')\n",
        "ax1.bar(ix_resnet_ttp, acc_sorted[ix_resnet_ttp], color='#43C6DB', label='ResNet + TTP')\n",
        "plt.ylim([math.ceil(acc_min - .3*(acc_max - acc_min)),\n",
        "          math.ceil(acc_max + .3*(acc_max - acc_min))])\n",
        "ax1.set_title('Top-1 Comparison')\n",
        "ax1.set_ylabel('Top-1 Accuracy (%)')\n",
        "ax1.set_xlabel('Network Architecture')\n",
        "ax1.set_xticks(ix)\n",
        "ax1.set_xticklabels(names_sorted, rotation=45, ha='right')\n",
        "ax1.legend()\n",
        "plt.show()"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7AAAALYCAYAAABFbR5BAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3XuUXmV9N/zvzwSECMUKoRaRF3yt\nFghhCIEaKZYzgoIPLXZVgRcECvi09S3FVGhB0YIiB1EoIlSwHpBiUbBV7BuxUvHAIYTIo4AQNViC\nkghFIYQQ8Hr/mMk4OU2GkHsmO34+a83K7Gvvfe3ffc9aLr9ch12ttQAAAMC67gVjXQAAAACMhAAL\nAABAJwiwAAAAdIIACwAAQCcIsAAAAHSCAAsAAEAnCLAAwKCq2q+qvjvWdQDAygiwAHRSVT0x5OdX\nVbVoyPERa/lZL6qqL1TVA1XVquo1I7jnDVX1zYF65lfVf1bVQWuzrl5ord3YWtt5rOsAgJURYAHo\npNbaJkt/kvwkySFD2q5a249LclOStyT5n9VdPBCgP5vkn5JsleR3k5yd5E1rua61qqrGj3UNADAc\nARaA9VJVbVxVl1TVT6vqwao6r6o2GDj3+qqaU1XvrapHq+rHVfXmVfXVWnuytXZRa+3bSX61mueO\nT3JBktNba59srf2ytfZsa+1rrbWTBq4ZN/Dsn1TVw1V1ZVVtOnDu96vqmao6rqrmVdUjVXVsVb22\nqr5XVY9V1YeGPO+kgdHdy6rql1V1d1W9bsj5E6vq3qp6fOAzHzvk3NLv4YyqejjJpUvbhlxzxsB3\n+Muquqeq9nwO3+/fVdWCgc+xVkfFAfjNJMACsL56b5LJSXZKsmuSvZL87ZDz2ybZMMlLk/x5kk9W\n1XZr4bmTkvxOkmuHuebEJH+aZM8kv5dkyyQfGnJ+3EDtr0jytiQXJzll4DNMTvK2qvqDIde/Lsl3\nk2ye5Jwk11fVbw2c+2mSg5L8VpKTklxSVTsOuXfbJBskeXmSdwwtsqp2Hnh+X5LNkrwhyYMDp1f3\n/f5fSSr9I9B/meRjVbXJMN8JAKyWAAvA+uqIJO9prf28tfZwkrOSHDXk/DNJ3ttae7q1dmOSG5Mc\nvhaeu3n6pxw/vJrazmutPdBa+2WSv09yRFXVkGve11pb3Fr7t4HjTw18lp8k+XaSXYZc+9+ttY+2\n1pa01j6V/pB5YJK01v6ttfbj1u/GJP+V5A+H3Ls4yT8MfA+LlqvzmSQbJ9khybjW2o9aaz8e8hmG\n+36fTPKBgZquG/hOXjnMdwIAqyXAArDeGQiCL03ywJDmB5K8bMjxgtbaU8ud36qqXjVkM6ifr8Hj\nH0n/yOPvDHPNViupbeMkLxk4fra19siQ84uybCBelGToaOaDWdYDA89IVR1aVbcNTJV+LMk+SbYY\ncu3PWmtLVlZka+37SU5N//rd+VV1VVX9znP4fodOt35yuZoB4DkTYAFY77TWWpKfpX8a61LbJJk3\n5HiLqtpoufMPtdbuG7IZ1NCgN1LfS3/Y/JNhrnloJbUtSvLoGjwvSbZe7nibJA9V1YuS/GuSf0iy\nZWvtxUn+M/0Be6k2XMcD63hfm/7pzBslOWuE3y8ArHUCLADrq6uTvKeqNq+qLdM/TfczQ85vkOSM\nqtqwqvZJsn+Sz6+qs6p64ZDAu+Fy4XdQa+2ZJO9MclZVHVVVm1bVC6rqj6rqo0Nqe2dVbTOwedNZ\nST47EAzXxMsHNnMaX1VHpn8964z0j+pukGR+kl9V1aHpX6s6IlW1w0DdL0x/wF6UX29itbrvFwDW\nOtvlA7C+enf6dwP+fvpD178kOXfI+bnpX+P5syS/TPK21tqPhunvgfx6WvB/JUlV/W5r7WfLX9ha\n+8zAdN3Tklya/umz30vywYFL
Lk3/FNxvp38jqRuS/M1z/oS/9o30r4l9NP2joH/cWvvFQI3vTPLv\n6Q+y1w08a6Q2Tv93+OokSwaes7TO1X2/ALDW1Zr/x14A6Kaqen2Sf2ytdX5Toao6KcnhrbX9xroW\nAOg1U4gBAADoBAEWAACATjCFGAAAgE4wAgsAAEAnCLAAAAB0Qideo7PFFlu0bbfddqzLAAAAoAfu\nuOOOn7fWJq7uuk4E2G233TYzZ84c6zIAAADogap6YCTXmUIMAABAJwiwAAAAdIIACwAAQCd0Yg0s\nAADA8pYsWZIHH3wwTz311FiXwghttNFG2XrrrbPBBhus0f0CLAAA0EkPPvhgNt1002y77bapqrEu\nh9VoreWRRx7Jgw8+mO22226N+jCFGAAA6KSnnnoqm2++ufDaEVWVzTff/HmNmAuwAABAZwmv3fJ8\n/14CLAAAwBoaN25c+vr6Bn/OOeecJMnNN9+cHXfcMX19fVm0aFGmT5+eHXfcMdOnT8/HPvaxfOpT\nn1plnw899FAOP/zwNa7pwx/+cJ588snB42233TZ/8id/Mnh87bXX5phjjhm2j9mzZ+eGG25Y4xp6\nxRpYAABg/bC2R2NbW+0lG2+8cWbPnr1C+1VXXZXTTjstRx55ZJLk8ssvz6OPPppx48atts+tttoq\n11577XOvd8CHP/zhHHnkkZkwYcJg2x133JG77747O+yww4j6mD17dmbOnJmDDz54jevoBSOwAAAA\na9HHP/7xfO5zn8sZZ5yRI444IoceemieeOKJ7Lrrrrnmmmty5pln5vzzz0+SzJkzJ/vtt1923nnn\nTJkyJT/84Q8zd+7cTJo0KUny7LPPZvr06dltt90yefLkXHbZZUmSm266KXvttVcOP/zw/P7v/36O\nOOKItNZy0UUX5aGHHsree++dvffee7CmU045JWefffYKtS5cuDDHHntsdt999+yyyy754he/mKef\nfjrvfve7c80116Svry/XXHPNKHxrI2MEFgAAYA0tWrQofX19g8ennXZajj/++Hzzm9/MG9/4xsGp\nwJtsssngSO2ZZ545eP0RRxyRU089NYcddlieeuqp/OpXv8r8+fMHz19xxRXZbLPNcvvtt2fx4sXZ\nY489csABByRJ7rzzznz/+9/PVlttlT322CPf+ta38o53vCMf+tCH8vWvfz1bbLHFYD9/+qd/mo9+\n9KOZM2fOMvWfffbZ2WeffXLllVfmsccey+6775799tsv73vf+zJz5sz84z/+41r/zp4PARYAAGAN\nrWoK8Ug8/vjjmTdvXg477LAk/e9IXd6MGTNy1113DU4p/sUvfpH7778/G264YXbfffdsvfXWSZK+\nvr7MnTs3f/iHf7jSZ40bNy7Tp0/PBz7wgRx00EHL9P9v//ZvgyPCTz31VH7yk5+s0ecZDQIsAADA\nOqq1losvvjgHHnjgMu033XRTXvjCFw4ejxs3Ls8888ywfR111FH5wAc+MDg9eWn/n//85/PqV796\nmWtvvfXWtVD92mcNLAAAwBjYdNNNs/XWW+f6669PkixevHiZ3YOT5MADD8yll16aJUuWJEnuu+++\nLFy4cLX9Pv744yu0b7DBBjn55JNz4YUXLtP/xRdfnDawYdWdd945bB9jTYAFAABYQ0vXwC79OfXU\nU5/T/Z/+9Kdz0UUXZfLkyXnta1+bn/3sZ8ucP/7447PDDjtkypQpmTRpUk488cTVjrSecMIJef3r\nX7/MJk5LHXfcccvcf8YZZ2TJkiWZPHlydtxxx5xxxhlJkr333jt33333OreJU7URbA091qZOndpm\nzpw51mUAAADrkHvuuSfbb7/9WJfBc7Syv1tV3dFam7q6e43AAgAA0AkCLAAAAJ0gwAIAANAJAiwA\nAACdIMACAADQCQIsAAAAnSDAAgAArIFHHnlk8P2vL33pS/Oyl71s8Liqlnk/7Ny5c1e4/5hjjs
l2\n2203eM1rX/vaJMnixYuz3377Db6D9eabb86OO+6Yvr6+zJs3L4cffviwdR1//PG5++671+gz3XTT\nTfn2t789eHzmm
            "text/plain": [
              "<Figure size 1152x720 with 1 Axes>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
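    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Several of the ResNe(X)t entries above carry a `-ttp` suffix (the ResNet + TTP bars). Test time pooling lets a network trained at one resolution be evaluated at a larger one. The sketch below is a minimal NumPy illustration of the idea under stated assumptions, not the exact timm code. The classifier is applied as a 1x1 conv after a stride-1 average pool whose window matches the training-resolution feature map (7x7 for a 224px ResNet). The resulting class-score map is then reduced by averaging global average and global max pooling. `ttp_head` and `avg_pool2d` are hypothetical names.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def avg_pool2d(x, k):\n",
        "    # Stride-1 k x k average pool over a (C, H, W) feature map\n",
        "    c, h, w = x.shape\n",
        "    out = np.empty((c, h - k + 1, w - k + 1))\n",
        "    for i in range(h - k + 1):\n",
        "        for j in range(w - k + 1):\n",
        "            out[:, i, j] = x[:, i:i + k, j:j + k].mean(axis=(1, 2))\n",
        "    return out\n",
        "\n",
        "def ttp_head(features, fc_w, fc_b, train_pool=7):\n",
        "    # features: (C, H, W); fc_w: (num_classes, C); fc_b: (num_classes,)\n",
        "    pooled = avg_pool2d(features, train_pool)\n",
        "    # Applying the FC classifier at every spatial position is a 1x1 conv\n",
        "    logits_map = np.einsum('kc,chw->khw', fc_w, pooled) + fc_b[:, None, None]\n",
        "    # Assumed reduction: mean of global average and global max over positions\n",
        "    return 0.5 * (logits_map.mean(axis=(1, 2)) + logits_map.max(axis=(1, 2)))\n",
        "\n",
        "rng = np.random.default_rng(0)\n",
        "fc_w, fc_b = rng.normal(size=(10, 32)), rng.normal(size=10)\n",
        "feat_train = rng.normal(size=(32, 7, 7))    # feature map size at training resolution\n",
        "feat_large = rng.normal(size=(32, 10, 10))  # larger map from a higher-res input\n",
        "\n",
        "# At training resolution the TTP head reduces to plain global average pooling + FC\n",
        "plain = fc_w @ feat_train.mean(axis=(1, 2)) + fc_b\n",
        "print(np.allclose(ttp_head(feat_train, fc_w, fc_b), plain))\n",
        "print(ttp_head(feat_large, fc_w, fc_b).shape)\n",
        "```\n",
        "\n",
        "Because the head reuses the existing classifier weights, the parameter count does not change. The log reports the same 25557032 parameters for resnet50-224 and resnet50-240-ttp. The extra cost comes from the larger input, which is where the throughput and memory penalty originates."
      ]
    },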
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jpI0KfoM5lCZ",
        "colab_type": "text"
      },
      "source": [
        "# Parameter Efficiency\n",
        "\n",
        "No surprises here: exactly as in the EfficientNet paper, they are in a class of their own in terms of parameter efficiency.\n",
        "\n",
        "Test time pooling effectively increases the parameter efficiency of the ResNet models, but at the cost of both throughput and memory efficiency (see later graphs).\n",
        "\n",
        "I'm not going to repeat the FLOP comparison, since there are again no surprises. It matches the paper, barring differences in the models being compared to. If you are looking at FLOP counts for the EfficientNet models, keep in mind that their counts appear to be for inference-optimized models with the BatchNorm layers fused. The counts will be higher if you're working with trainable models that still have their BN layers. You can see some counts I did on ONNX-optimized models here (https://github.com/rwightman/gen-efficientnet-pytorch/blob/master/BENCHMARK.md)"
      ]
    },
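    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For reference, folding a BatchNorm layer into the preceding convolution is the standard inference-time optimization behind those lower FLOP counts. Inference-mode BN computes gamma * (y - mean) / sqrt(var + eps) + beta per output channel. The scale can be absorbed into the weights and the shift into the bias. Below is a minimal NumPy sketch that treats a 1x1 conv as a matrix multiply over channels. `fold_bn` and `conv_then_bn` are hypothetical helpers.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):\n",
        "    # w: (out, in) weights of a 1x1 conv, b: (out,) bias\n",
        "    scale = gamma / np.sqrt(var + eps)    # per-output-channel BN scale\n",
        "    w_folded = w * scale[:, None]         # absorb the scale into the weights\n",
        "    b_folded = (b - mean) * scale + beta  # absorb mean and shift into the bias\n",
        "    return w_folded, b_folded\n",
        "\n",
        "def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):\n",
        "    # Unfused reference: 1x1 conv output followed by inference-mode BatchNorm\n",
        "    y = x @ w.T + b\n",
        "    return gamma * (y - mean) / np.sqrt(var + eps) + beta\n",
        "\n",
        "rng = np.random.default_rng(0)\n",
        "c_in, c_out = 16, 8\n",
        "w, b = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)\n",
        "gamma, beta = rng.normal(size=c_out), rng.normal(size=c_out)\n",
        "mean, var = rng.normal(size=c_out), rng.uniform(0.5, 2.0, size=c_out)\n",
        "x = rng.normal(size=(4, c_in))  # 4 spatial positions with c_in channels each\n",
        "\n",
        "w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)\n",
        "print(np.allclose(x @ w_f.T + b_f, conv_then_bn(x, w, b, gamma, beta, mean, var)))\n",
        "```\n",
        "\n",
        "The folded layer does the same work as the conv alone, so the per-element BN multiply and add drop out of the FLOP count. A trainable model keeps them."
      ]
    },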
    {
      "cell_type": "code",
      "metadata": {
        "id": "iE69A1asS4_n",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 621
        },
        "outputId": "ee70eb92-8618-42a5-a5af-d584821f471e"
      },
      "source": [
        "params_effnet = np.array([results[m]['param_count'] for m in names_effnet])\n",
        "params_effnet_tf = np.array([results[m]['param_count'] for m in names_effnet_tf])\n",
        "params_resnet = np.array([results[m]['param_count'] for m in names_resnet])\n",
        "params_resnet_ttp = np.array([results[m]['param_count'] for m in names_resnet_ttp])\n",
        "\n",
        "fig = plt.figure()\n",
        "ax1 = fig.add_subplot(111)\n",
        "ax1.scatter(params_effnet, acc_effnet, s=10, c='r', marker=\"s\", label='EfficientNet')\n",
        "ax1.plot(params_effnet, acc_effnet, c='r')\n",
        "annotate(ax1, params_effnet, acc_effnet, names_effnet, xo=-.5, align='right')\n",
        "\n",
        "ax1.scatter(params_effnet_tf, acc_effnet_tf, s=10, c='#8C001A', marker=\"v\", label='TF-EfficientNet')\n",
        "ax1.plot(params_effnet_tf, acc_effnet_tf, c='#8C001A')\n",
        "annotate(ax1, params_effnet_tf, acc_effnet_tf, names_effnet_tf, xo=.5, align='left')\n",
        "\n",
        "ax1.scatter(params_resnet, acc_resnet, s=10, c='b', marker=\"o\", label='ResNet')\n",
        "ax1.plot(params_resnet, acc_resnet, c='b')\n",
        "annotate(ax1, params_resnet, acc_resnet, names_resnet, xo=0.5, align='left')\n",
        "\n",
        "ax1.scatter(params_resnet_ttp, acc_resnet_ttp, s=10, c='#43C6DB', marker=\"x\", label='ResNet TTP')\n",
        "ax1.plot(params_resnet_ttp, acc_resnet_ttp, c='#43C6DB')\n",
        "annotate(ax1, params_resnet_ttp, acc_resnet_ttp, names_resnet_ttp, xo=0.3, align='left')\n",
        "\n",
        "ax1.set_title('Top-1 vs Parameter Count')\n",
        "ax1.set_ylabel('Top-1 Accuracy (%)')\n",
        "ax1.set_xlabel('Parameters (Millions)')\n",
        "ax1.legend()\n",
        "plt.show()"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAABB4AAAJcCAYAAABe5mduAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3Xl8VNX5x/HPk8meTELIoiIiiBsg\nMSAoiAsiEFSKdf25oCJCtbXFBaNQcWmLO4iiiFrcihut1qVWQwRFAatsUjdUUIMC6iSBkMmeTM7v\nj5mkAcIWgUng+3698iL33HvPfe6dAJlnznmOOecQEREREREREdkdIsIdgIiIiIiIiIjsvZR4EBER\nEREREZHdRokHEREREREREdltlHgQERERERERkd1GiQcRERERERER2W2UeBARERERERGR3UaJBxER\nERERERHZbZR4EBGRFs/MSht91ZlZRaPti3fxtRLM7J9mttrMnJn12ZX9b+WaV5lZbeh+SsxsmZkN\n2d3X/aXM7Egzq90N/caa2UQz+yb0TPLN7K9mdtCuvtZm1x1iZqt25zVERET2RUo8iIhIi+ecS6z/\nAr4HftWo7bldfTlgHnAhsGEX970t80L3lwK8APzDzBJ3pgMzizCzVvN/u5lFNtFmwKvAIOA8IBno\nAXwO9N+T8YmIiMiu0Wp+OREREdkaM4szs2lm9qOZrTGz+8wsKrRviJmtMrM/mdl6M/vOzM7bWl/O\nuXLn3FTn3AdA3Xaue5mZLdisbbyZ/T30/Zlm9qWZ+c3sBzMbs717cc4FgCeBRKCjmaWb2VtmVhCK\n/zUzO6DR9T40sz+b2UdAOdDOzK5sdN1VZjay0fH1z2OCmRWa2VozOz0U6zdmVmRmYxsd7zGzW8zs\n29Dxz5lZm9Du9wFPo9EnPULnXGlmX4Xi/beZHRhqjw2NIvmtmX0DfNbEIzgDOBE40zm3zDkXcM5t\ncM494JybGeqng5m9Ger/azO7rFG8L5rZhM3vt9H2T2Z2nZl9ZmYbQ/cTbWapwCvAIY3uJ3V7r5eI\niIhsnxIPIiKyN/gTkAl0B44h+Mn4jY32dwSigf2B0cAzZtZpF1z3FaCnmXVo1HYR8Hzo+yeBS51z\nXiALmL+9DkOjAK4ANgLfEfy/+lGgA1Af85TNThsOXAp4gZ+AH4HTgCTgKmCamXVrdHxHoIbg87g7\nFOe5BJ/hQOCO+mQBcAMwGDgBaB86r/76JwGBRqNPPjaz/wOuBX4F7Ad8DDy7WbxDCb5OPZp4BAOB\nBc65n7b6kOAfwFfAAQSf9xQz67eN4zd3LnAqcChwHHCRc64IOAv4ttH9FO1EnyIiIrIVSjyIiMje\n4GLgNudcoXPuZ2AicEmj/bXAn5xz1c65OcAcgm8+fxHnXAnwJnABgJl1J/jm/M3QIQGgm5l5nXNF\nzrmPt9HdyWZWTDBxcCbwa+dcmXPuZ+fca865CufcRuAu4OTNzp3hnPvKOVfjnKt1zr3unPvOBc0B\n3iOYOKhXBtznnKsFXiSYIJgUut7HwDcEkzgQTFyMc86tc85VEkzy/F9oSkRTrgImOue+ds7VhI4/\nwcz2a3TMHc65YudcRRPnpxJMnDTJzA4Djgb+6Jyrcs4tAZ5h09d7e6aEnmsBwdcqayfOFRERkZ2k\nxIOIiLRqoTfA+wOrGzWvBg5stF0QetPceH87Mzu80bD6wmaG8DzBehAQ/PT9JedcdWj7TOAc4Hsz\ne8fMem+jn/ecc22cc2nOuX7OuXmh+/Oa2ZNm9r2ZlQB5QNpm5/7QeMPMhpnZotBUhGJgwGbnFDjn\n6qeR1L/5/7nR/gogMfRsDwLeNLPiUF8fE/z9YWvTEA4GHm10fAHBxE/7rcW7mSKCIxm2pl0o/sZJ\ni81f7+1pPJqinOC0FhEREdlN
lHgQEZFWzTnnCL6RPLhRcwdgbaPtNDOL3Wz/utCn8vXD6jd/M7+j\n3gQ6mVkXgiMf6qdZ4Jz7j3NuKMERBXmN9+2EcQTftPd2ziURnPaw+WgDV/+NmSUQnIrwFyDDOdcG\neKeJc7Yr9GzXAgNCSZH6r1jnXGHj6zbyAzBis+PjnHNLm4q3CXOAfpuNkGhsHZBuZnGN2hq/3mVA\nfKN9+2/jWpvbVlwiIiLSTEo8iIjI3uAF4DYzSzWzDOBmNq0rEAXcEioiOIDgigkvb60zM4tplKiI\n3ixpsYnQSIpXgKmh67wX6iPBzC4wsySCdRH8bKdY5VZ4CX4qX2xmacCE7RwfF4rDB9SZ2TB+2WoQ\njwJ3W2gpSzPLMLNfhfb5CBaX7LDZ8RPM7IjQ8Slmds5OXO/fwELgVTPLChW3TDaz35vZJcAq4FNg\nYuh16glcxv9e7+XAUDNrE6pT8YeduPbPQIbt5GoiIiIism1KPIiIyN7gVuALgksuLif4xvXeRvvz\nCQ73/4lgIcXLnXPfbqO/1QSnG6QSTCRUmNm2Pjl/nmBRxFmNpjAAjAz1tZFg8cdLd/yWGkwiOE2i\nCFjA/+pHNCk0EuEG4F+hc369vXO2416CoxDeMTM/8AHQM3StDaH9S0NTK7Kccy8ADwP/DE0NWU4w\n0bNDQqMsziQ4SuOfQAnwX+Ao4J3Q/vOArgRfz1lAjnOufnWRJwkmJ74H3iCYlNpR/wVeB1aH7qft\nTpwrIiIiW2HB/79FRET2TmY2BHjYOXdouGMRERER2RdpxIOIiIiIiIiI7DZKPIiIiIiIiIjIbqOp\nFiIiIiIiIiKy22jEg4iIiIiIiIjsNpHhDmBPS0tLcx07dgx3GCIiIiIiIrIbLF26tNA5lx7uOOR/\n9rnEQ8eOHVmyZEm4wxAREREREZHdwMxWhzsG2ZSmWoiIiIiIiIjIbqPEg4iIiIiIiIjsNko8iIiI\niIiIiMhus8/VeGhKTU0Na9asobKyMtyhyA6KjY2lffv2REVFhTsUERERERER2QYlHoA1a9bg9Xrp\n2LEjZhbucGQ7nHMUFRWxZs0aOnXqFO5wREREREREZBs01QKorKwkNTVVSYdWwsxITU3VCBURERER\nEZFWQImHECUdWhe9XiIiIiIiIq2DEg8iIiIiIiIistso8dBCeDwesrKyGr7uvvtuAObPn0+3bt3I\nysqioqKCnJwcunXrRk5ODo8++ih/+9vfttrnunXrOPfcc5sd0wMPPEB5eXnDdseOHTnnnHMatl96\n6SVGjBixzT6WL1/Om2++2ewYREREREREpHVTcckWIi4ujuXLl2/R/txzzzF+/HiGDx8OwOOPP876\n9evxeDzb7bNdu3a89NJLzY7pgQceYPjw4cTHxze0LV26lC+++IKuXbvuUB/Lly9nyZIlnH766c2O\nQ0RERERERFovjXhowWbMmMHf//53brnlFi6++GKGDRtGaWkpxxxzDLNmzeL2229n0qRJAKxatYqB\nAwdy9NFH07NnT7755hvy8/M56qijAAgEAuTk5NC7d28yMzN57LHHAJg3bx79+/fn3HPP5cgjj+Ti\niy/GOcfUqVNZt24dp5xyCqecckpDTGPHjuWOO+7YItaysjJGjhzJscceS48ePXjttdeorq7m1ltv\nZdasWWRlZTFr1qw98NRERERERESkJdGIh+ZISgK//3/bXi+UlPyiLisqKsjKymrYHj9+PKNGjWLB\nggUMHTq0YcpEYmJiw8iI22+/veH4iy++mHHjxnHWWWdRWVlJXV0dPp+vYf8TTzxBcnIyixcvpqqq\nin79+jF48GAAPv74Yz7//HPatWtHv379WLhwIWPGjOH+++/n3XffJS0traGf888/n0ceeYRVq1Zt\nEv8dd9zBgAEDePLJJykuLubYY49l4MCB/PnPf2bJkiU8/PDDv+j5iIiIiIiISOukxENzNE46NL
Xd\nDFubarFj4fhZu3YtZ511FgCxsbFbHJOXl8cnn3zSMPVi48aNrFy5kujoaI499ljat28PQFZWFvn5\n+ZxwwglNXsvj8
            "text/plain": [
              "<Figure size 1152x720 with 1 Axes>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1VpRimNp5tuW",
        "colab_type": "text"
      },
      "source": [
        "# Image Throughput\n",
        "\n",
        "One of the first things I noticed when running batches through my first ported EfficientNet weights was that image throughput does not scale with FLOP or parameter counts. Much larger ResNet, DPN, etc. models can match the throughput of EfficientNet models that have far fewer parameters and FLOPs. I've trained many of these models, and their training throughputs, in relative terms, mirror the validation numbers here.\n",
        "\n",
        "This was surprising to me given the FLOP ratios. I'd like to see an in-depth comparison against TensorFlow with XLA enabled, targeting both GPU and TPU."
      ]
    },
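    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Below is a minimal sketch of how an images/sec rate can be measured, assuming PyTorch is installed. `measure_rate` is a hypothetical helper, not the harness that produced the numbers in this notebook. It times only the forward pass on a synthetic batch, so it excludes data loading and preprocessing and should not be expected to reproduce the rates below exactly. The `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously; without them the timer would stop before the GPU work finishes."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code"
      },
      "source": [
        "import time\n",
        "import torch\n",
        "\n",
        "def measure_rate(model, batch_size=64, img_size=224, steps=20, warmup=5):\n",
        "    # Hypothetical helper: forward-pass-only throughput (images/sec) on a random batch\n",
        "    device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "    model = model.to(device).eval()\n",
        "    x = torch.randn(batch_size, 3, img_size, img_size, device=device)\n",
        "    with torch.no_grad():\n",
        "        # Warm-up passes let cudnn.benchmark (if enabled) settle on its kernels\n",
        "        for _ in range(warmup):\n",
        "            model(x)\n",
        "        if device == 'cuda':\n",
        "            torch.cuda.synchronize()\n",
        "        start = time.perf_counter()\n",
        "        for _ in range(steps):\n",
        "            model(x)\n",
        "        if device == 'cuda':\n",
        "            torch.cuda.synchronize()  # wait for queued kernels before stopping the clock\n",
        "        elapsed = time.perf_counter() - start\n",
        "    return batch_size * steps / elapsed\n",
        "\n",
        "# Usage (assumes timm is installed): measure_rate(timm.create_model('efficientnet_b0'))"
      ],
      "execution_count": null,
      "outputs": []
    },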
    {
      "cell_type": "code",
      "metadata": {
        "id": "iapzkrt2gBwR",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 357
        },
        "outputId": "f47bfaa1-d78a-4d2f-a325-8fe247acc46a"
      },
      "source": [
        "print('Results by image rate:')\n",
        "results_by_rate = list(sorted(results.keys(), key=lambda x: results[x]['rate'], reverse=True))\n",
        "for m in results_by_rate:\n",
        "  print('  {:32} Rate: {:>6.2f}, Top-1 {:.2f}, Top-5: {:.2f}'.format(\n",
        "      m, results[m]['rate'], results[m]['top1'], results[m]['top5']))\n",
        "print()\n"
      ],
      "execution_count": 44,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Results by image rate:\n",
            "  efficientnet_b0-224              Rate: 165.73, Top-1 64.58, Top-5: 85.89\n",
            "  resnet50-224                     Rate: 159.51, Top-1 66.81, Top-5: 87.00\n",
            "  dpn68b-224                       Rate: 155.15, Top-1 65.60, Top-5: 85.94\n",
            "  resnet50-240-ttp                 Rate: 154.35, Top-1 67.02, Top-5: 87.04\n",
            "  efficientnet_b1-240              Rate: 151.63, Top-1 67.55, Top-5: 87.29\n",
            "  gluon_seresnext50_32x4d-224      Rate: 150.43, Top-1 68.67, Top-5: 88.32\n",
            "  efficientnet_b2-260              Rate: 144.20, Top-1 67.80, Top-5: 88.20\n",
            "  tf_efficientnet_b2-260           Rate: 142.73, Top-1 67.40, Top-5: 87.58\n",
            "  resnet50-260-ttp                 Rate: 135.92, Top-1 67.63, Top-5: 87.63\n",
            "  gluon_seresnext101_32x4d-224     Rate: 131.57, Top-1 70.01, Top-5: 88.91\n",
            "  gluon_seresnext50_32x4d-260-ttp  Rate: 126.52, Top-1 69.67, Top-5: 88.62\n",
            "  tf_efficientnet_b3-300           Rate: 119.13, Top-1 68.52, Top-5: 88.70\n",
            "  gluon_seresnext50_32x4d-300-ttp  Rate: 104.69, Top-1 70.47, Top-5: 89.18\n",
            "  gluon_seresnext101_32x4d-260-ttp Rate:  95.84, Top-1 71.14, Top-5: 89.47\n",
            "  ig_resnext101_32x8d-224          Rate:  83.35, Top-1 73.83, Top-5: 92.28\n",
            "  gluon_seresnext101_32x4d-300-ttp Rate:  74.87, Top-1 71.99, Top-5: 90.10\n",
            "  tf_efficientnet_b4-380           Rate:  69.10, Top-1 71.34, Top-5: 90.11\n",
            "  ig_resnext101_32x8d-300-ttp      Rate:  43.62, Top-1 75.17, Top-5: 92.66\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Y2bawRNtfFmH",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 621
        },
        "outputId": "ba888805-c714-4fdf-9ea8-b84d6016b296"
      },
      "source": [
        "rate_effnet = np.array([results[m]['rate'] for m in names_effnet])\n",
        "rate_effnet_tf = np.array([results[m]['rate'] for m in names_effnet_tf])\n",
        "rate_resnet = np.array([results[m]['rate'] for m in names_resnet])\n",
        "rate_resnet_ttp = np.array([results[m]['rate'] for m in names_resnet_ttp])\n",
        "\n",
        "fig = plt.figure()\n",
        "ax1 = fig.add_subplot(111)\n",
        "ax1.scatter(rate_effnet, acc_effnet, s=10, c='r', marker=\"s\", label='EfficientNet')\n",
        "ax1.plot(rate_effnet, acc_effnet, c='r')\n",
        "annotate(ax1, rate_effnet, acc_effnet, names_effnet, xo=.5, align='left')\n",
        "\n",
        "ax1.scatter(rate_effnet_tf, acc_effnet_tf, s=10, c='#8C001A', marker=\"v\", label='TF-EfficientNet')\n",
        "ax1.plot(rate_effnet_tf, acc_effnet_tf, c='#8C001A')\n",
        "annotate(ax1, rate_effnet_tf, acc_effnet_tf, names_effnet_tf, xo=-.5, yo=-.2, align='right')\n",
        "\n",
        "ax1.scatter(rate_resnet, acc_resnet, s=10, c='b', marker=\"o\", label='ResNet')\n",
        "ax1.plot(rate_resnet, acc_resnet, c='b')\n",
        "annotate(ax1, rate_resnet, acc_resnet, names_resnet, xo=.3, align='left')\n",
        "\n",
        "ax1.scatter(rate_resnet_ttp, acc_resnet_ttp, s=10, c='#43C6DB', marker=\"x\", label='ResNet TTP')\n",
        "ax1.plot(rate_resnet_ttp, acc_resnet_ttp, c='#43C6DB')\n",
        "annotate(ax1, rate_resnet_ttp, acc_resnet_ttp, names_resnet_ttp, xo=0., yo=0., align='center')\n",
        "\n",
        "ax1.set_title('Top-1 vs Rate')\n",
        "ax1.set_ylabel('Top-1 Accuracy (%)')\n",
        "ax1.set_xlabel('Rate (Images / sec)')\n",
        "ax1.legend()\n",
        "plt.show()"
      ],
      "execution_count": 48,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAA+AAAAJcCAYAAAB5WM7HAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3Xt8j/X/x/HHe59tttmaw4bNLMs6\nOM0wc5hYcqjE91s6x5eKiBIyEYqikvNEIVJRpBO/ioavfVHJqTlrJsPmMKc5m5nr98dn+9hyWprN\ntuf9dnOz6/S+Xtdnbrd6fd7X9byMZVmIiIiIiIiIyI3lVNAFiIiIiIiIiBQHasBFRERERERE8oEa\ncBEREREREZF8oAZcREREREREJB+oARcRERERERHJB2rARURERERERPKBGnARERERERGRfKAGXERE\nihVjzMlsfy4YY85kW346j89V0hjzjTFmlzHGMsY0yMvxr3DObsaY85nXc9wYs84Y0+pvHD/bGDPo\nRtYoIiJSXKkBFxGRYsWyLM+sP8BuoE22dbPy+nRALPAkcDSPx76a2MzrKwV8Asw1xpTMx/OLiIjI\nZagBFxERycYY426MmWiM2WeMSTLGjDTGuGRuu88Yk2CMGWqMOWKM2WmMefRKY1mWddqyrGjLsn4B\nLlzjvB2NMSv+sm6AMebLzJ//ZYzZZow5YYzZY4zpea1rsSzrAvAZ4AXcljmOszHma2PMAWNMqjFm\nqTHmzsxtPYF2wODMGfS5mesrGWPmGWMOGWP+NMZ0u9a5RURE5FJqwEVERHIaCoQANYG6QCTQL9v2\nyoArUAHoAnxijAnKg/N+C9QxxgRmW/cU8Hnmz9OB/1iW5QWEAsuvNaAxxhl4BjgLJGXbNA+ogv0a\ntmGfJceyrGjga+CtzDsCHjXG2IAfgV8Af+A+4DVjTNPrvVAREZHiSg24iIhITk8Db1iWdciyrAPA\nMKBDtu3ngaGWZZ2zLGsxsBh45J+e1LKs49gb3ScAjDE1gYDMdQAZQHVjjJdlWYcty/r9KsM1Ncak\nAmeAN4EnLcs6mnme85ZlfWpZ1knLss5i/8Ih3BjjdoWxGgNulmWNyLzmeODjrDpFREQk99SAi4iI\nZDLGGOyzwruyrd4FVMy2fDCzcc2+3d8Yc0e2MLdD11nC59ifFwf77PdXlmWdy1z+F/bbw3cbY/5r\njKl3lXH+Z1lWKaAsEANEZG3IvAV9dOat5Mexz4CbzH0v51agcubt6qmZjX0f7J+TiIiI/A1qwEVE\nRDJZlmUB+7E3nVkCgeRsyz5/mS0OBPZalhWfLczN5zpL+BEIMsZUxT7DnHX7OZZl/WpZ1oNAeexN\n9eeXHyLH9RwHugHdjDHVMlc/A7QA7gG8gbsy15usw/4yzB5gm2VZpbL98bIs66HrukIREZFiTA24\niIhITl8AbxhjyhpjygEDgZnZtrtgDylzNcY0w97Mfn2lwYwxJbI17K5XudWbzJn1b4HozPP8L3OM\nksaYJ4wxtwDpwAmuEeqWbcwD2J/xHpy5ygv7M+GHgZLYb7HP7gCZgW2ZVmTW0MsY45Y5gx5ijKmT\nm/OLiIjIRWrARUREcnod2AJsBuKAn4H3sm1PxP4c+H7swWjPWJb151XG24X9Weyy2BvqM8aYq92+\n/TnQHJiTmWKe5dnMsY4B/8n8k1tjgHaZaefTgIOZ9W8ks8HOZgpQL/N289mWZaUDDwCNMs9/EPgA\n8Pwb5xcRERHA2O+2ExERkWsxxtwHvG9ZVnBB1yIiIiKFj2bARURERERERPKBGnARERERERGRfKBb\n0EVERERERETygWbARURERERERPKBc0EXkBs+Pj5W5cqVC7oMERERERERuQHWrl17yLIs34Ku40Yr\nFA145cqVWbNmTUGXISIiIiIi
IjeAMWZXQdeQH3QLuoiIiIiIiEg+UAMuIiIiIiIikg/UgIuIiIiI\niIjkg0LxDLiIiIiIiMhfpaenk5SUxNmzZwu6FMklNzc3AgICcHFxKehSCoQacBERERERKZSSkpLw\n8vKicuXKGGMKuhy5BsuyOHz4MElJSQQFBRV0OQVCt6CLiIiIiEihdPbsWcqWLavmu5AwxlC2bNli\nfceCGnARERERESm01HwXLsX996UGXERERERERCQfqAEXERERERG5TjabjdDQUMefd999F4Dly5dT\nvXp1QkNDOXPmDFFRUVSvXp2oqCg+/PBDPv300yuOuXfvXh555JHrrmncuHGcPn3asVy5cmXatWvn\nWP7qq6/o1KnTVceIi4vjxx9/vO4a5PIUwiYiIiIiInKd3N3diYuLu2T9rFmzGDBgAO3btwdgypQp\nHDlyBJvNds0x/f39+eqrr667pnHjxtG+fXs8PDwc69auXcuWLVuoVq1arsaIi4tjzZo1PPDAA9dd\nh1xKM+AiIiIiIiJ56KOPPuLLL79k8ODBPP3007Rt25aTJ09St25d5syZw5AhQxg1ahQACQkJNG/e\nnFq1alGnTh127NhBYmIiNWrUACAjI4OoqCjq1atHSEgIkydPBiA2NpbIyEgeeeQR7rrrLp5++mks\nyyI6Opq9e/dyzz33cM899zhqeuWVVxg+fPgltZ46dYpnn32W8PBwateuzbx58zh37hyvv/46c+bM\nITQ0lDlz5uTDp1Y8aAZcRERERESKh1tugRMnLi57ecHx4/9oyDNnzhAaGupYHjBgAJ07d2bFihU8\n+OCDjlvJPT09HTPlQ4YMcez/9NNP079/fx566CHOnj3LhQsXSElJcWyfNm0a3t7erF69mrS0NCIi\nImjZsiUAv//+O5s3b8bf35+IiAh+/vlnevbsyZgxY1i6dCk+Pj6OcR577DEmTZpEQkJCjvqHDx9O\ns2bNmD59OqmpqYSHh9O8eXPefPNN1qxZw/vvv/+PPh/JSQ24iIiIiIgUD9mb78stX4cr3YKeu3JO\nkJyczEMPPQSAm5vbJfvExMSwYcMGxy3px44dY/v27bi6uhIeHk5AQAAAoaGhJCYm0rhx48uey2az\nERUVxTvvvMP999+fY/z58+c7ZuTPnj3L7t27r+t65NrUgIuIiIiIiNykLMtiwoQJtGrVKsf62NhY\nSpQo4Vi22WycP3/+qmN16NCBd955x3F7e9b4X3/9NXfeeWeOfX/77bc8qF7+Ss+Ai4iIiIiIFAAv\nLy8CAgL47rvvAEhLS8uRXg7QqlUrPvjgA9LT0wGIj4/n1KlT1xz3xGVm911cXOjduzdjx47NMf6E\nCROwLAuw39Z+tTHkn1EDLiIiIiIixYOX19WXr0PWM+BZf/r37/+3jv/ss8+Ijo4mJCSERo0asX//\n/hzbO3fuTLVq1ahTpw41atSga9eu15zpfv7557nvvvtyhLBlee6553IcP3jwYNLT0wkJCaF69eoM\nHjwYgHvuuYctW7YohC2PmaxvOm5mYWFh1po1awq6DBERERERuYls3bqVqlWrFnQZ8jdd7vdmjFlr\nWVZYAZWUbzQDLiIiIiIiIpIP8qQBN8ZUMMaMzouxbrTY2Fg2bNjgWB48eDC33norzZs3z7HfjBkz\naNSoEREREaxbtw6AHTt2ULduXTw9PVmxYsUVz3H8+HEaNWpEZGQk4eHhLFmyBIBPP/2U8PBwmjRp\nwhNPPEFaWtoVxzh69CgtW7akadOmRERE5Kg5y7Bhw5gxY8Yl68eMGUOTJk2IiIjgP//5j+N5kXXr\n1hEREUGjRo1yHHe5a80uNTWVTz/91LH8189QREREREREri1PGnDLsvZblvXK9RxrjLHlRQ259dfm\nsXv37ixdujTHPkePHiU6OprY2FhmzpxJz549AfDz82PRokWOd/ldiaenJ8uWLSM2NpbZs2c7ng
Np\n3Lgxv/76K8uWLSMwMJCZM2decYxZs2YRERHB//73P4YPH87w4cNzfY0vvvgiy5Yt4+effwbsrxYA\neOmll5g5cyaxs
            "text/plain": [
              "<Figure size 1152x720 with 1 Axes>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
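    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "One way to read the Top-1 vs Rate plot is to ask which models are not beaten on both axes at once. The sketch below finds that Pareto front. `pareto_front` is a hypothetical helper, and the sample values are copied from the rate listing above. In this notebook it could be applied to every model with `pareto_front({m: (results[m]['rate'], results[m]['top1']) for m in results})`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code"
      },
      "source": [
        "def pareto_front(points):\n",
        "    # points maps name -> (rate, top1); higher is better for both.\n",
        "    # Sweep from fastest to slowest and keep a model only if it is more\n",
        "    # accurate than every faster model seen so far (assumes distinct rates).\n",
        "    front, best_top1 = [], float('-inf')\n",
        "    for name, (rate, top1) in sorted(points.items(), key=lambda kv: kv[1][0], reverse=True):\n",
        "        if top1 > best_top1:\n",
        "            front.append(name)\n",
        "            best_top1 = top1\n",
        "    return front\n",
        "\n",
        "sample = {\n",
        "    'efficientnet_b0-224': (165.73, 64.58),\n",
        "    'resnet50-224': (159.51, 66.81),\n",
        "    'gluon_seresnext50_32x4d-224': (150.43, 68.67),\n",
        "    'efficientnet_b2-260': (144.20, 67.80),\n",
        "    'ig_resnext101_32x8d-224': (83.35, 73.83),\n",
        "    'tf_efficientnet_b4-380': (69.10, 71.34),\n",
        "}\n",
        "print(pareto_front(sample))\n",
        "# -> ['efficientnet_b0-224', 'resnet50-224', 'gluon_seresnext50_32x4d-224', 'ig_resnext101_32x8d-224']"
      ],
      "execution_count": null,
      "outputs": []
    },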
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y6-2sm9W50JB",
        "colab_type": "text"
      },
      "source": [
        "# GPU Memory Usage\n",
        "\n",
        "Measuring 'practical' GPU memory consumption is a bit of a challenge. By 'practical', I mean a relative measure of GPU memory usage that indicates the likely maximum batch size. With `cudnn.benchmark = True` set, the torch memory allocator metrics didn't prove reliable. In the end, using pynvml (same output as nvidia-smi) and taking a sample part way through the validation set proved the most consistent.\n",
        "\n",
        "I've verified the sampling by pushing batch sizes for several of the models to the point where they fail with an OOM exception. The relative memory measures match the relative batch sizes: I can roughly predict the largest usable batch size from the measurement.\n",
        "\n",
        "On a T4 colab instance I pushed:\n",
        "- efficientnet_b2-260 to a batch size of 480\n",
        "- tf_efficientnet_b2-260 to a batch size of 448 (failed at 480)\n",
        "- ig_resnext101_32x8d-224 to a batch size of 512\n",
        "\n",
        "Overall, the EfficientNets are not particularly memory efficient. The monster ResNeXt101-32x8d with 88M params is more memory efficient at 224x224 than EfficientNet-B2 at 260x260 with 9.1M. This is especially true for the 'tf' variants with the 'SAME' padding hack enabled: there is up to a 20% penalty in memory churn, which does impact the max usable batch size."
      ]
    },
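    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "A minimal sketch of the pynvml sampling described above, assuming the `pynvml` package (nvidia-ml-py) is installed. `gpu_mem_used_mib` is a hypothetical helper, not necessarily the code that produced the `gpu_used` numbers below. NVML reports device-wide usage in bytes, converted here to MiB so the value lines up with nvidia-smi. Because the figure is device-wide, it includes memory held by PyTorch's caching allocator and cuDNN workspaces, which is the 'practical' number of interest. Calling it once part way through the validation loop gives a single sample per model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code"
      },
      "source": [
        "try:\n",
        "    import pynvml\n",
        "except ImportError:  # pynvml not installed\n",
        "    pynvml = None\n",
        "\n",
        "def gpu_mem_used_mib(device_index=0):\n",
        "    # Hypothetical helper: device-wide used GPU memory in MiB, as reported by NVML\n",
        "    if pynvml is None:\n",
        "        raise RuntimeError('pynvml is not installed')\n",
        "    pynvml.nvmlInit()\n",
        "    try:\n",
        "        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)\n",
        "        info = pynvml.nvmlDeviceGetMemoryInfo(handle)\n",
        "        return info.used // (1024 * 1024)  # bytes -> MiB\n",
        "    finally:\n",
        "        pynvml.nvmlShutdown()"
      ],
      "execution_count": null,
      "outputs": []
    },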
    {
      "cell_type": "code",
      "metadata": {
        "id": "Qmr4J7-EgifY",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 340
        },
        "outputId": "d8d0db4a-ccca-4ac2-85e6-011535f29c1e"
      },
      "source": [
        "print('Results by GPU memory usage:')\n",
        "results_by_mem = list(sorted(results.keys(), key=lambda x: results[x]['gpu_used'], reverse=False))\n",
        "for m in results_by_mem:\n",
        "  print('  {:32} Mem: {}, Rate: {:>6.2f}, Top-1 {:.2f}, Top-5: {:.2f}'.format(\n",
        "      m, results[m]['gpu_used'], results[m]['rate'], results[m]['top1'], results[m]['top5']))"
      ],
      "execution_count": 46,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Results by GPU memory usage:\n",
            "  resnet50-224                     Mem: 1530, Rate: 159.51, Top-1 66.81, Top-5: 87.00\n",
            "  gluon_seresnext50_32x4d-224      Mem: 1670, Rate: 150.43, Top-1 68.67, Top-5: 88.32\n",
            "  gluon_seresnext101_32x4d-224     Mem: 1814, Rate: 131.57, Top-1 70.01, Top-5: 88.91\n",
            "  resnet50-240-ttp                 Mem: 2084, Rate: 154.35, Top-1 67.02, Top-5: 87.04\n",
            "  gluon_seresnext101_32x4d-260-ttp Mem: 2452, Rate:  95.84, Top-1 71.14, Top-5: 89.47\n",
            "  resnet50-260-ttp                 Mem: 2532, Rate: 135.92, Top-1 67.63, Top-5: 87.63\n",
            "  gluon_seresnext50_32x4d-260-ttp  Mem: 2586, Rate: 126.52, Top-1 69.67, Top-5: 88.62\n",
            "  dpn68b-224                       Mem: 2898, Rate: 155.15, Top-1 65.60, Top-5: 85.94\n",
            "  efficientnet_b0-224              Mem: 2930, Rate: 165.73, Top-1 64.58, Top-5: 85.89\n",
            "  gluon_seresnext101_32x4d-300-ttp Mem: 3252, Rate:  74.87, Top-1 71.99, Top-5: 90.10\n",
            "  gluon_seresnext50_32x4d-300-ttp  Mem: 3300, Rate: 104.69, Top-1 70.47, Top-5: 89.18\n",
            "  efficientnet_b1-240              Mem: 3370, Rate: 151.63, Top-1 67.55, Top-5: 87.29\n",
            "  ig_resnext101_32x8d-224          Mem: 3382, Rate:  83.35, Top-1 73.83, Top-5: 92.28\n",
            "  efficientnet_b2-260              Mem: 3992, Rate: 144.20, Top-1 67.80, Top-5: 88.20\n",
            "  ig_resnext101_32x8d-300-ttp      Mem: 4658, Rate:  43.62, Top-1 75.17, Top-5: 92.66\n",
            "  tf_efficientnet_b2-260           Mem: 4690, Rate: 142.73, Top-1 67.40, Top-5: 87.58\n",
            "  tf_efficientnet_b3-300           Mem: 8638, Rate: 119.13, Top-1 68.52, Top-5: 88.70\n",
            "  tf_efficientnet_b4-380           Mem: 11754, Rate:  69.10, Top-1 71.34, Top-5: 90.11\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dLlD9SUufV4A",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 621
        },
        "outputId": "ab03124d-b28e-4615-d4d9-b3012a774328"
      },
      "source": [
        "mem_effnet = np.array([results[m]['gpu_used'] for m in names_effnet])\n",
        "mem_effnet_tf = np.array([results[m]['gpu_used'] for m in names_effnet_tf])\n",
        "mem_resnet = np.array([results[m]['gpu_used'] for m in names_resnet])\n",
        "mem_resnet_ttp = np.array([results[m]['gpu_used'] for m in names_resnet_ttp])\n",
        "\n",
        "fig = plt.figure()\n",
        "ax1 = fig.add_subplot(111)\n",
        "ax1.scatter(mem_effnet, acc_effnet, s=10, c='r', marker=\"s\", label='EfficientNet')\n",
        "ax1.plot(mem_effnet, acc_effnet, c='r')\n",
        "annotate(ax1, mem_effnet, acc_effnet, names_effnet, xo=-.3, align='right')\n",
        "\n",
        "ax1.scatter(mem_effnet_tf, acc_effnet_tf, s=10, c='#8C001A', marker=\"v\", label='TF-EfficientNet')\n",
        "ax1.plot(mem_effnet_tf, acc_effnet_tf, c='#8C001A')\n",
        "annotate(ax1, mem_effnet_tf, acc_effnet_tf, names_effnet_tf, xo=-.3, align='right')\n",
        "\n",
        "ax1.scatter(mem_resnet, acc_resnet, s=10, c='b', marker=\"o\", label='ResNet')\n",
        "ax1.plot(mem_resnet, acc_resnet, c='b')\n",
        "annotate(ax1, mem_resnet, acc_resnet, names_resnet, xo=.5, align='left')\n",
        "\n",
        "# ResNet TTP series omitted: the plot gets too busy with it\n",
        "#ax1.scatter(mem_resnet_ttp, acc_resnet_ttp, s=10, c='#43C6DB', marker=\"o\", label='ResNet TTP')\n",
        "#ax1.plot(mem_resnet_ttp, acc_resnet_ttp, c='#43C6DB')\n",
        "#annotate(ax1, mem_resnet_ttp, acc_resnet_ttp, names_resnet_ttp, xo=.5, align='left')\n",
        "\n",
        "ax1.set_title('Top-1 vs GPU Memory')\n",
        "ax1.set_ylabel('Top-1 Accuracy (%)')\n",
        "ax1.set_xlabel('GPU Memory (MB)')\n",
        "ax1.legend()\n",
        "plt.show()"
      ],
      "execution_count": 47,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7AAAAJcCAYAAADATEiPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3Xd0VVXexvHvJoQaRKQMTSkqSosB\nIi0IAQKoNAFFVBgQYaQLSMBIVXpHugUEhEEERRhERUqoIjUjUiaiBgmI1EiHJOz3jxvum1BjSHJy\nk+ezVpY5Ze/znBtnzC97n32MtRYRERERERGRtC6T0wFEREREREREEkMFrIiIiIiIiHgEFbAiIiIi\nIiLiEVTAioiIiIiIiEdQASsiIiIiIiIeQQWsiIiIiIiIeAQVsCIiIiIiIuIRVMCKiEiaYYw5H+/r\nmjHmUrztV5L5WjmNMV8YYw4ZY6wxpmpy9n+H6xY1xsw2xvwRd1+/GGNmGWMejTv+eFye6/f9qzHm\nzXjHYm7R56fGmAG3ud6ouP5ev2F/v7j9b6XEfYqIiKQEFbAiIpJmWGt9rn8BvwON4+1bkNyXA0KB\nl4Azydz3LRlj/gFsxfXf3+pALsA/bl/deKfGxvsc2gEjjDGB93DpcOCfN+z7Z9x+RxljMjudQURE\nPIcKWBER8RjGmOzGmGlxo5eRxpixxhjvuGNPG2MOGmPeMcacNsb8Zox54XZ9WWsvWmsnW2u3ANfu\nct22xphNN+wLMcZ8Fvd9U2PMAWPMOWPMYWNMj9t0FQwctda2s9b+Zl3OWGs/tNbOvE3ODbgKzXJ3\nyngXm4BCxpiH4/L6A1eBPTfcUzNjzI/GmChjzEZjTJl4x44ZY3obY/bGjQzPMMYUMsZ8Z4w5a4z5\nxhhzX7zzWxhj9sX1tfr6CHO8vvoYY/YCZ40xA40xC27I8oExZvQ93LOIiKRDKmBFRMSTvAP4AuWB\nSkAg0Dfe8eJAFqAg0BGYa4wpkQzXXQpUNMY8FG/fy8C/476fDfzTWpsL8AM23qafIOCLxF7UuAQC\npYCwvxs6HgvM5/9HYf8JzLvhWlWB6cCrQF7gE+DLG0ZImwG1gDJAK2AZ0Bv4B+ADdI7rqzwwB+gC\nFADWA8tu6OtFoF68azUxxuSMa58VeOHGjCIiIipgRUTEk7wCDLbWnrTW/gkMA9rEOx4DvGOtvWqt\nXQ2sBp6/14taa88CK3EVbdcLtKJx+wBigbLGmFzW2lPW2t236SofcOz6hjGmZdwI5TljzPJ453kZ\nY6KA08A04A1r7SbuzTygtTEmC67P5N83HH8dmGqt3WmtjbXWfgBkxfWHgusmxX32vwNbgM3W2j3W\n2ku4itkKcee1ApZaa0OttVeBEUB+XNOlr5torT1qrb1krY0AdgDN4441Bn6z1u69x3sWEZF0RgWs\niIh4BGOMwTWyeije7kNAkXjbJ6y1l284XtgYUyreokgnkxjh37ielwXX6OuSuOIMoCnQAvjdGLPW\nGPPkbfo4BRS6vmGt/cxaez8Qgmvk+LpYa+391to81tqy8aYXxwCZjDE3/vfbG4i+U3hr7UHgT2A4\nsDvuDwDxFQPejiuoo+IK6Pwk/Hzjt7l0i22fuO8LE+/nZK2NBY7c0NfhG64/F2gd931rXKOyIiIi\nCaiAFRERj2CttbhGL4vF2/0QrsLounzGmGw3HD9qrQ2PtxhUviRGWAmUMMaUxjXC6B7BtNZ+b61t\nhGsq7SpuHt28bg3QLK4YT4rIuH8Wu2F/CRIW9rczD3iTW0/NPQwMiiucr3/lsNYmespzPEfjZzTG\neOEqXuP/rOwNbZYAVY0xZYH63P4zFBGRDEwFrIiIeJKFwGBjTF5jTAGgP65nO6/zBgYaY7IYY+rg\nesby89t1ZozJGq/gzXJD8ZtA
3MjuUmBy3HXWx/WR0xjTKm4Bo2jgHLdfFGoMrqnHHxtjSsQ945ob\n13O9dxWXYRkw0hiTxxjjbYxph6tY/C4RXXyCqzj88hbHPgC6G2P843L5GGOaGGNyJCbbDRbhKtRr\nxi2y9Rau0ecdt2tgrT0PLMf1Mw69xQixiIiIClgREfEog4B9wF5cixptxlUUXheBa5rtMVwLK71q\nrf31Dv0dwjX1NS+ugvSSMabgHc7/N66FmBZZa+MXqe3j+voL1wJJN76yBgBr7TGgCmCA73EVuzsB\nL+B2KxffqCNwGddn8GfctZ+x1p66W0Nr7QVr7Wpr7ZVbHNscl+F9IArXyscvc/NI6V1Za38EXovr\n6wSuVwQ1tdbe9A7bG8zFtUCXpg+LiMgtGdeMLBEREc9mjHka1yJEjzidRZLGGFMK1yjtP+IWhhIR\nEUlAI7AiIiLiuLjnZHsD81W8iojI7WS++ykiIiIiKccY8wDwO/Ar0MDhOCIikoZpCrGIiIiIiIh4\nBE0hFhEREREREY/gEVOI8+XLZ4sXL+50DBEREREREUkBO3fuPGmtzX+38zyigC1evDg7dtz21XEi\nIiIiIiLiwYwxhxJznqYQi4iIiIiIiEdQASsiIiIiIiIeQQWsiIiIiIiIeASPeAZWRERERETkRtHR\n0URGRnL58mWno0giZcuWjaJFi+Lt7Z2k9ipgRURERETEI0VGRpIrVy6KFy+OMcbpOHIX1lpOnTpF\nZGQkJUqUSFIfmkIsIiIiIiIe6fLly+TNm1fFq4cwxpA3b957GjFXASsiIiIiIh5LxatnudeflwpY\nERERERER8QgqYEVERERERJLIy8sLPz8/99eoUaMA2LhxI2XLlsXPz49Lly4RHBxM2bJlCQ4OZubM\nmcybN++2fR49epTnn38+yZkmTZrExYsX3dvFixenRYsW7u0lS5bQrl27O/YRFhbGypUrk5whpWgR\nJxERERERkSTKnj07YWFhN+1fsGABISEhtG7dGoAPPviA06dP4+Xlddc+CxcuzJIlS5KcadKkSbRu\n3ZocOXK49+3cuZN9+/ZRpkyZRPURFhbGjh07ePbZZ5OcIyVoBFZERERERCQZffTRR3z22WcMHDiQ\nV155hSZNmnD+/HkqVarEokWLGDJkCOPGjQPg4MGDBAUF8cQTT1CxYkV++eUXIiIiKFeuHACxsbEE\nBwfz5JNP4uvry/vvvw9AaGgogYGBPP/88zz++OO88sorWGuZPHkyR48epXbt2tSuXdud6c0332T4\n8OE3Zb1w4QLt27encuXKVKhQgWXLlnH16lUGDRrEokWL8PPzY9GiRanwqSWORmBFRERERCRjuO8+\nOHfu/7dz5YKzZ++py0uXLuHn5+feDgkJoUOHDmzatIlGjRq5pwL7+Pi4R2qHDBniPv+VV17hrbfe\nolmzZly+fJlr165x/Phx9/FZs2aRO3dutm/fzpUrVwgICKB+/foA7N69m71791K4cGECAgLYvHkz\nPXr0YMKECaxbt458+fK5+2nZsiXTp0/n4MGDCfIPHz6cOnXqMHv2bKKioqhcuTJBQUG8++677Nix\ng6lTp97T55PcUqyANcY8BsQv1UsCg6y1k+KOvwmMA/Jba0+mVA4REREREREgYfF6q+0kuN0U4sTF\nOceRI0do1qwZANmyZbvpnFWrVvHjjz+6pxT/9ddf/Pzzz2TJkoXKlStTtGhRAPz8/IiIiKBGjRq3\nvJaXlxfBwcGMHDmSZ555JkH/y5cvd48IX758md9//z1J95MaUqyAtdb+D/ADMMZ4AUeApXHbDwL1\ngbT7yYiIiIiIiDjMWsuUKVNo0KBBgv2hoaFkzZrVve3l5UVMTMwd+2rTpg0jR450T0++3v/nn3/O\nY489luDcH374IRnSJ7/Uega2LvCLtfZQ3PZEoC9gU+n6IiIiIiIiaUquXLkoWrQoX375JQBXrl
xJ\nsHowQIMGDZgxYwbR0dEAhIeHc+HChbv2e+4Wo8ve3t706tWLiRMnJuh/ypQpWOsqzXbv3n3HPpyW\nWgVsK2AhgDGmK
            "text/plain": [
              "<Figure size 1152x720 with 1 Axes>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vU-2SHss55jw",
        "colab_type": "text"
      },
      "source": [
        "# 1 on 1 Comparisons\n",
        "A few model-to-model comparisons, pairing models more fairly than the original paper does when you consider accuracy, rate, and memory efficiency together."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SKA-MF-yShDW",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 187
        },
        "outputId": "83f55196-040a-4a2a-a49e-e6629c38ce83"
      },
      "source": [
        "def compare_results(results, namea, nameb):\n",
        "    resa, resb = results[namea], results[nameb]\n",
        "    top1r = 100. * (resa['top1'] - resb['top1']) / resb['top1']\n",
        "    top5r = 100. * (resa['top5'] - resb['top5']) / resb['top5']\n",
        "    rater = 100. * (resa['rate'] - resb['rate']) / resb['rate']\n",
        "    memr = 100. * (resa['gpu_used'] - resb['gpu_used']) / resb['gpu_used']\n",
        "    print('{:22} vs {:28} top1: {:+4.2f}%, top5: {:+4.2f}%, rate: {:+4.2f}%, mem: {:+.2f}%'.format(\n",
        "        namea, nameb, top1r, top5r, rater, memr))\n",
        "    \n",
        "#compare_results(results, 'efficientnet_b0-224', 'seresnext26_32x4d-224')\n",
        "compare_results(results, 'efficientnet_b0-224', 'dpn68b-224')\n",
        "compare_results(results, 'efficientnet_b1-240', 'resnet50-224')\n",
        "compare_results(results, 'efficientnet_b1-240', 'resnet50-240-ttp')\n",
        "compare_results(results, 'efficientnet_b2-260', 'gluon_seresnext50_32x4d-224')\n",
        "compare_results(results, 'tf_efficientnet_b3-300', 'gluon_seresnext50_32x4d-224')\n",
        "compare_results(results, 'tf_efficientnet_b3-300', 'gluon_seresnext101_32x4d-224')\n",
        "compare_results(results, 'tf_efficientnet_b4-380', 'ig_resnext101_32x8d-224')\n",
        "\n",
        "print('\\nNote the cost of running with the SAME padding hack:')\n",
        "compare_results(results, 'tf_efficientnet_b2-260', 'efficientnet_b2-260')"
      ],
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "efficientnet_b0-224    vs dpn68b-224                   top1: -1.55%, top5: -0.06%, rate: +6.82%, mem: +1.10%\n",
            "efficientnet_b1-240    vs resnet50-224                 top1: +1.11%, top5: +0.33%, rate: -4.94%, mem: +120.26%\n",
            "efficientnet_b1-240    vs resnet50-240-ttp             top1: +0.79%, top5: +0.29%, rate: -1.76%, mem: +61.71%\n",
            "efficientnet_b2-260    vs gluon_seresnext50_32x4d-224  top1: -1.27%, top5: -0.14%, rate: -4.14%, mem: +139.04%\n",
            "tf_efficientnet_b3-300 vs gluon_seresnext50_32x4d-224  top1: -0.22%, top5: +0.43%, rate: -20.81%, mem: +417.25%\n",
            "tf_efficientnet_b3-300 vs gluon_seresnext101_32x4d-224 top1: -2.13%, top5: -0.24%, rate: -9.45%, mem: +376.19%\n",
            "tf_efficientnet_b4-380 vs ig_resnext101_32x8d-224      top1: -3.37%, top5: -2.35%, rate: -17.10%, mem: +247.55%\n",
            "\n",
            "Note the cost of running with the SAME padding hack:\n",
            "tf_efficientnet_b2-260 vs efficientnet_b2-260          top1: -0.59%, top5: -0.70%, rate: -1.02%, mem: +17.48%\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aSibvBwp5-CX",
        "colab_type": "text"
      },
      "source": [
        "# How are we generalizing to ImageNet-V2?\n",
        "\n",
        "This is often an interesting comparison. The results for the IG ResNeXt are impressive: it has the lowest gap between ImageNet-1k and ImageNet-V2 validation scores that I've seen (http://people.csail.mit.edu/ludwigs/papers/imagenet.pdf)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "aahwcXGnSOab",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 442
        },
        "outputId": "7a33b7ad-619e-4479-e585-ee9068a3bc13"
      },
      "source": [
        "print('Results by absolute accuracy gap between ImageNet-V2 Matched-Frequency and original ImageNet top-1:')\n",
        "no_ttp_keys = [k for k in results.keys() if 'ttp' not in k]\n",
        "gaps = {x: (results[x]['top1'] - orig_top1[results[x]['model_name']]) for x in no_ttp_keys}\n",
        "sorted_keys = list(sorted(no_ttp_keys, key=lambda x: gaps[x], reverse=True))\n",
        "for m in sorted_keys:\n",
        "  print('  Model: {:34} {:4.2f}%'.format(m, gaps[m]))\n",
        "print()\n",
        "\n",
        "print('Results by relative accuracy gap between ImageNet-V2 Matched-Frequency and original ImageNet top-1:')\n",
        "gaps = {x: 100 * (results[x]['top1'] - orig_top1[results[x]['model_name']]) / orig_top1[results[x]['model_name']] for x in no_ttp_keys}\n",
        "sorted_keys = list(sorted(no_ttp_keys, key=lambda x: gaps[x], reverse=True))\n",
        "for m in sorted_keys:\n",
        "  print('  Model: {:34} {:4.2f}%'.format(m, gaps[m]))"
      ],
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Results by absolute accuracy gap between ImageNet-V2 Matched-Frequency and original ImageNet top-1:\n",
            "  Model: ig_resnext101_32x8d-224            -8.86%\n",
            "  Model: gluon_seresnext101_32x4d-224       -10.89%\n",
            "  Model: efficientnet_b1-240                -11.14%\n",
            "  Model: gluon_seresnext50_32x4d-224        -11.24%\n",
            "  Model: tf_efficientnet_b4-380             -11.26%\n",
            "  Model: resnet50-224                       -11.68%\n",
            "  Model: dpn68b-224                         -11.91%\n",
            "  Model: efficientnet_b2-260                -11.96%\n",
            "  Model: tf_efficientnet_b2-260             -12.21%\n",
            "  Model: efficientnet_b0-224                -12.33%\n",
            "  Model: tf_efficientnet_b3-300             -12.35%\n",
            "\n",
            "Results by relative accuracy gap between ImageNet-V2 Matched-Frequency and original ImageNet top-1:\n",
            "  Model: ig_resnext101_32x8d-224            -10.71%\n",
            "  Model: gluon_seresnext101_32x4d-224       -13.46%\n",
            "  Model: tf_efficientnet_b4-380             -13.64%\n",
            "  Model: gluon_seresnext50_32x4d-224        -14.07%\n",
            "  Model: efficientnet_b1-240                -14.16%\n",
            "  Model: resnet50-224                       -14.88%\n",
            "  Model: efficientnet_b2-260                -14.99%\n",
            "  Model: tf_efficientnet_b3-300             -15.28%\n",
            "  Model: tf_efficientnet_b2-260             -15.33%\n",
            "  Model: dpn68b-224                         -15.37%\n",
            "  Model: efficientnet_b0-224                -16.03%\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}