Ross Wightman 0dbd9352ce Add bulk_runner script and updates to benchmark.py and validate.py for better error handling in bulk runs (used for benchmark and validation result runs). Improved batch size decay stepping on retry... 3 years ago
__init__.py Add bulk_runner script and updates to benchmark.py and validate.py for better error handling in bulk runs (used for benchmark and validation result runs). Improved batch size decay stepping on retry... 3 years ago
agc.py Initial AGC impl. Still testing. 4 years ago
checkpoint_saver.py Fix so that checkpoint saver works with max history of 1. Add checkpoint-hist arg to train.py. 4 years ago
clip_grad.py Initial AGC impl. Still testing. 4 years ago
cuda.py Initial AGC impl. Still testing. 4 years ago
decay_batch.py Add bulk_runner script and updates to benchmark.py and validate.py for better error handling in bulk runs (used for benchmark and validation result runs). Improved batch size decay stepping on retry... 3 years ago
distributed.py Reorg of utils into separate modules 4 years ago
jit.py disable nvfuser for jit te/legacy modes (for PT 1.12+) 3 years ago
log.py Reorg of utils into separate modules 4 years ago
metrics.py Tweak accuracy topk safety. Fix 3 years ago
misc.py Reorg of utils into separate modules 4 years ago
model.py Fix some formatting in utils/model.py 3 years ago
model_ema.py Add separate set and update method to ModelEmaV2 4 years ago
random.py Remove commented code, add more consistent seed fn 4 years ago
summary.py Make wandb optional 4 years ago