Follow the steps in the "Usage" section to set up the environment; the main steps are summarized below.
All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically with Python 3.6.x and 3.7.x (it also works with 3.8.x, which I have tried).
Steps to set up the conda environment (if running on CPU, you don't need to install cudatoolkit=10.1) are sketched below.
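This is only a minimal sketch, not the exact recipe from the "Usage" section; the environment name `timm-env` and the pinned versions are illustrative, so adjust them to match your setup:
```
conda create -n timm-env python=3.7
conda activate timm-env
# Omit cudatoolkit=10.1 if running on CPU only
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
```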
After the conda environment is set up, install the 'timm' package:
```
pip install timm
```
To check that the 'timm' package was installed correctly, open a Python shell and run the following:
```
>>> import timm
>>> m = timm.create_model('mobilenetv3_small_100', pretrained=True)
>>> m.eval()
```
## Run training with the ResNeXt50 model
There are a few examples of running different models for training in the "Training Hyperparameters" section. The example below shows how to run training for ResNeXt50 with CUDA/GPU.
These params will also work well for SE-ResNeXt-50 and SK-ResNeXt-50, and likely the 101 variants. I used them for the SK-ResNeXt-50 32x4d that I trained with 2 GPUs, using a slightly higher LR per effective batch size (lr=0.18, b=192 per GPU). The command line below is tuned for 8-GPU training.
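For concreteness, an 8-GPU invocation following this recipe might look like the sketch below; the learning rate, epoch count, scheduler, and augmentation settings shown here are illustrative placeholders, so use the exact values from the "Training Hyperparameters" section:
```
./distributed_train.sh 8 /path/to/imagenet --model resnext50_32x4d --lr 0.6 \
    --warmup-epochs 5 --epochs 240 --weight-decay 0.0001 --sched cosine \
    --reprob 0.4 --recount 3 --remode pixel --aa original \
    -b 192 -j 6 --amp --dist-bn reduce
```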
Here is a breakdown of the arguments from the above example (detailed explanations for all possible arguments are in the argparse section of the train.py file):
`./distributed_train.sh <num of GPUs to use> <path to imagenet directory>`: the launch script takes the number of GPUs and the path to the ImageNet directory as its first two arguments, followed by the training flags:
- `--model resnext50_32x4d`: the model to train
- `--lr`: learning rate
- `--warmup-epochs`: epochs to warm up the LR, if the scheduler supports it
- `--epochs`: number of epochs to train (default: 2)
- `--weight-decay`: weight decay (default: 0.0001)
- `--sched`: learning rate scheduler
- `--reprob`: random erase probability (default: 0.)
- `--recount`: random erase count (default: 1)
- `--remode`: random erase mode (default: "const")
- `--aa`: AutoAugment policy to use, "v0" or "original" (default: None)
- `-b`: input batch size for training
- `-j`: how many training processes to use / number of workers (default: 1)
- `--amp`: flag indicating that NVIDIA amp should be used for mixed precision training
- `--dist-bn`: distribute BatchNorm stats between nodes after each epoch ("broadcast", "reduce", or "")
Some other arguments that may be helpful:
`--log-interval <how many batches to wait before logging training status>`
The default log interval is set to 50.
`--drop <dropout rate>`
The default dropout rate is 0.
`--opt <Optimizer>`
The default optimizer is SGD; other choices include Adam ("adam"), RMSProp ("rmsprop"), etc. The full set of optimizer choices is defined in the timm/optim/optim_factory.py file.
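For example, to switch the optimizer to RMSProp, add a 0.2 dropout rate, and log every 100 batches, you could append these flags to a training command like the one above (all values here are illustrative):
```
./distributed_train.sh 8 /path/to/imagenet --model resnext50_32x4d --lr 0.6 \
    --sched cosine -b 192 -j 6 --amp --dist-bn reduce \
    --opt rmsprop --drop 0.2 --log-interval 100
```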