Follow the steps in the "Usage" section to set up the environment; the main steps are summarized below.
All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically with Python 3.6.x and 3.7.x (it also works with 3.8.x, which I have tried).
Steps to set up the conda environment (if running on CPU, you don't need to install cudatoolkit=10.1) are sketched below.
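This is only a minimal sketch, not the exact recipe from the "Usage" section; the environment name `timm-env` and the pinned versions are illustrative, so adjust them to match your setup:
```
conda create -n timm-env python=3.7
conda activate timm-env
# Omit cudatoolkit=10.1 if running on CPU only
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
```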
After the conda environment is set up, install the 'timm' package:
```
pip install timm
```
To check that the 'timm' package was installed correctly, open a Python shell and run the following:
```
>>> import timm
>>> m = timm.create_model('mobilenetv3_small_100', pretrained=True)
>>> m.eval()
```
## Run training with the ResNeXt50 model
There are a few examples of running different models for training in the "Training Hyperparameters" section. The example below shows how to run training for ResNeXt50 with CUDA/GPU.
These params will also work well for SE-ResNeXt-50 and SK-ResNeXt-50, and likely the 101 variants. I used them for the SK-ResNeXt-50 32x4d that I trained with 2 GPUs, using a slightly higher LR per effective batch size (lr=0.18, b=192 per GPU). The command line below is tuned for 8-GPU training.
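For concreteness, an 8-GPU invocation following this recipe might look like the sketch below; the learning rate, epoch count, scheduler, and augmentation settings shown here are illustrative placeholders, so use the exact values from the "Training Hyperparameters" section:
```
./distributed_train.sh 8 /path/to/imagenet --model resnext50_32x4d --lr 0.6 \
    --warmup-epochs 5 --epochs 240 --weight-decay 0.0001 --sched cosine \
    --reprob 0.4 --recount 3 --remode pixel --aa original \
    -b 192 -j 6 --amp --dist-bn reduce
```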
Here is a breakdown of the arguments from the above example (detailed explanations for all possible arguments are in the argparse section of the train.py file):
`./distributed_train.sh <num of GPUs to use> <path to imagenet directory>`: the launch script takes the number of GPUs and the path to the ImageNet directory as its first two arguments, followed by the training flags:
- `--model resnext50_32x4d`: the model to train
- `--lr`: learning rate
- `--warmup-epochs`: epochs to warm up the LR, if the scheduler supports it
- `--epochs`: number of epochs to train (default: 2)
- `--weight-decay`: weight decay (default: 0.0001)
- `--sched`: learning rate scheduler
- `--reprob`: random erase probability (default: 0.)
- `--recount`: random erase count (default: 1)
- `--remode`: random erase mode (default: "const")
- `--aa`: AutoAugment policy to use, "v0" or "original" (default: None)
- `-b`: input batch size for training
- `-j`: how many training processes to use / number of workers (default: 1)
- `--amp`: flag indicating that NVIDIA amp should be used for mixed precision training
- `--dist-bn`: distribute BatchNorm stats between nodes after each epoch ("broadcast", "reduce", or "")
Some other arguments that may be helpful:
`--log-interval <how many batches to wait before logging training status>`
The default log interval is set to 50.
`--drop <dropout rate>`
The default dropout rate is 0.
`--opt <Optimizer>`
The default optimizer is SGD; other choices include Adam ("adam"), RMSProp ("rmsprop"), etc. The full set of optimizer choices is defined in the timm/optim/optim_factory.py file.
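For example, to switch the optimizer to RMSProp, add a 0.2 dropout rate, and log every 100 batches, you could append these flags to a training command like the one above (all values here are illustrative):
```
./distributed_train.sh 8 /path/to/imagenet --model resnext50_32x4d --lr 0.6 \
    --sched cosine -b 192 -j 6 --amp --dist-bn reduce \
    --opt rmsprop --drop 0.2 --log-interval 100
```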