Taeksang Kim
7f29a46d44
Add gradient accumulation option to train.py
option: iters-to-accum (iterations to accumulate)
Gradient accumulation improves training throughput (samples/s).
It can reduce the frequency of parameter synchronization between nodes.
This option can be helpful when the network is the bottleneck.
Signed-off-by: Taeksang Kim <voidbag@puzzle-ai.com>
2 years ago
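A minimal sketch of the idea behind this option, assuming the model is wrapped in DistributedDataParallel; the loop and names below are illustrative, not the actual train.py implementation. Gradients are synchronized only on every iters_to_accum-th step, which is where the inter-node traffic savings come from.

```python
import contextlib

def train_one_epoch(model, loader, optimizer, loss_fn, iters_to_accum=4):
    # `model` is assumed to be a DistributedDataParallel instance; no_sync()
    # skips the gradient all-reduce on the intermediate accumulation steps.
    model.train()
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(loader):
        last_in_group = (i + 1) % iters_to_accum == 0
        ctx = contextlib.nullcontext() if last_in_group else model.no_sync()
        with ctx:
            loss = loss_fn(model(inputs), targets)
            # Scale so the accumulated gradient matches one large-batch step.
            (loss / iters_to_accum).backward()
        if last_in_group:
            optimizer.step()
            optimizer.zero_grad()
```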
Fredo Guan
81ca323751
Davit update formatting and fix grad checkpointing (#7)
Fixed head to gap->norm->fc as per ConvNeXt, along with an option for norm->gap->fc.
Tests failed due to CLIP ConvNeXt models; DaViT tests passed.
3 years ago
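The two head orderings named above differ only in whether normalization happens before or after global average pooling. A rough sketch; the module below is illustrative, not timm's actual head implementation.

```python
import torch.nn as nn

class HeadSketch(nn.Module):
    # Two classifier-head orderings over a (B, C, H, W) feature map.
    def __init__(self, dim, num_classes, pool_then_norm=True):
        super().__init__()
        self.pool_then_norm = pool_then_norm
        self.norm = nn.LayerNorm(dim)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, x):
        if self.pool_then_norm:                   # gap -> norm -> fc (ConvNeXt style)
            x = x.mean(dim=(2, 3))                # global average pool
            x = self.norm(x)
        else:                                     # norm -> gap -> fc
            x = self.norm(x.permute(0, 2, 3, 1))  # LayerNorm over channels
            x = x.mean(dim=(1, 2))                # pool after normalizing
        return self.fc(x)
```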
Ross Wightman
b1b024dfed
Scheduler update, add v2 factory method, support scheduling on updates instead of just epochs. Add LR to summary csv. Add lr_base scaling calculations to train script. Fix #1168
3 years ago
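The lr_base scaling added here follows the usual batch-size scaling rules. A sketch with illustrative names and defaults, not the exact train.py code:

```python
import math

def scale_lr(lr_base, global_batch_size, lr_base_size=256, scaling="linear"):
    # Linear rule: lr grows proportionally with global batch size;
    # a sqrt rule is often preferred for adaptive optimizers.
    ratio = global_batch_size / lr_base_size
    return lr_base * (ratio if scaling == "linear" else math.sqrt(ratio))

# e.g. lr_base=0.1 referenced at batch 256 -> 0.2 at global batch 512 (linear)
```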
Ross Wightman
87939e6fab
Refactor device handling in scripts, distributed init to be less 'cuda' centric. More device args passed through where needed.
3 years ago
Ross Wightman
0dbd9352ce
Add bulk_runner script and updates to benchmark.py and validate.py for better error handling in bulk runs (used for benchmark and validation result runs). Improved batch size decay stepping on retry...
3 years ago
Ross Wightman
324a4e58b6
disable nvfuser for jit te/legacy modes (for PT 1.12+)
3 years ago
Ross Wightman
2f2b22d8c7
Disable nvfuser fma / opt level overrides per #1244
3 years ago
jjsjann123
f88c606fcf
fixing channels_last on cond_conv2d; update nvfuser debug env variable
3 years ago
Ross Wightman
f0f9eccda8
Add --fuser arg to train/validate/benchmark scripts to select jit fuser type
4 years ago
Ross Wightman
57992509f9
Fix some formatting in utils/model.py
4 years ago
Ross Wightman
e5da481073
Small post-merge tweak for freeze/unfreeze, add to __init__ for utils
4 years ago
Alexander Soare
431e60c83f
Add acknowledgements for freeze_batch_norm inspiration
4 years ago
Alexander Soare
65c3d78b96
Freeze/unfreeze functionality finalized. Tests added
4 years ago
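A sketch of the freeze/unfreeze idea, assuming the simplest approach: disable gradients and hold BatchNorm running stats fixed. timm's actual helpers go further and swap BatchNorm layers for a frozen variant so a later model.train() cannot silently re-enable stat updates.

```python
import torch.nn as nn

def freeze(module: nn.Module):
    # Stop gradient updates for all parameters...
    for p in module.parameters():
        p.requires_grad = False
    # ...and put BatchNorm layers in eval mode so running stats stay fixed.
    for m in module.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()

def unfreeze(module: nn.Module):
    for p in module.parameters():
        p.requires_grad = True
    for m in module.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.train()
```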
Alexander Soare
0cb8ea432c
wip
4 years ago
Ross Wightman
d667351eac
Tweak accuracy topk safety. Fix #807
4 years ago
Yohann Lereclus
35c9740826
Fix accuracy when topk > num_classes
4 years ago
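The fix amounts to clamping k at the number of classes before calling topk, so e.g. top-5 accuracy on a 3-class output no longer raises. A sketch close in spirit to the patched helper:

```python
import torch

def accuracy(output, target, topk=(1,)):
    # Clamp k so it never exceeds the number of classes.
    maxk = min(max(topk), output.size(-1))
    _, pred = output.topk(maxk, dim=-1)          # (B, maxk) predicted labels
    correct = pred.t() == target.reshape(1, -1)  # (maxk, B) hit matrix
    return [correct[:min(k, maxk)].float().sum(0).mean() * 100 for k in topk]
```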
Ross Wightman
e685618f45
Merge pull request #550 from amaarora/wandb
Wandb Support
4 years ago
Ross Wightman
7c97e66f7c
Remove commented code, add more consistent seed fn
4 years ago
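One consistent seeding pattern for distributed training; the commit body isn't shown here, so this sketch is an assumption about what such a seed fn looks like:

```python
import random
import numpy as np
import torch

def random_seed(seed=42, rank=0):
    # Offset by process rank so each worker draws a different but
    # reproducible stream of random numbers.
    torch.manual_seed(seed + rank)
    np.random.seed(seed + rank)
    random.seed(seed + rank)
```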
Aman Arora
5772c55c57
Make wandb optional
4 years ago
Aman Arora
f54897cc0b
Make wandb optional rather than required, as with huggingface_hub
4 years ago
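The optional-dependency pattern these commits describe, sketched below; log_metrics is a hypothetical helper, not the repo's API:

```python
# Import at module load, record availability, and only touch the
# library when the user actually asked for it.
try:
    import wandb
    has_wandb = True
except ImportError:
    has_wandb = False

def log_metrics(metrics, log_wandb=False):
    if log_wandb:
        if not has_wandb:
            raise ImportError("wandb is not installed, run `pip install wandb`")
        wandb.log(metrics)
```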
Aman Arora
3f028ebc0f
import wandb in summary.py
4 years ago
Aman Arora
624c9b6949
Log to wandb only if using wandb
4 years ago
Aman Arora
6b18061773
Add GIST to docstring for quick access
4 years ago
Aman Arora
92b1db9a79
update docstrings and add check on and
4 years ago
Aman Arora
b85be24054
update to work with fnmatch
4 years ago
Aman Arora
20626e8387
Add to extract stats for SPP
4 years ago
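SPP here refers to signal propagation plots (from the NFNets paper), built from per-layer activation statistics. A hypothetical sketch in the spirit of these commits, collecting stats via forward hooks on modules matched by fnmatch patterns; names and the choice of statistic are illustrative:

```python
import fnmatch
import torch

def collect_stats(model, x, patterns=("*.attn",)):
    # Hook every module whose qualified name matches an fnmatch pattern,
    # recording a simple activation statistic (mean of the output here).
    stats, hooks = {}, []
    for name, module in model.named_modules():
        if any(fnmatch.fnmatch(name, p) for p in patterns):
            def hook(mod, inp, out, name=name):
                stats.setdefault(name, []).append(out.detach().mean().item())
            hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return stats
```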
Ross Wightman
4f49b94311
Initial AGC impl. Still testing.
4 years ago
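AGC is adaptive gradient clipping from the NFNets paper (Brock et al., 2021): each gradient is rescaled so its unit-wise norm stays below a small fraction of the corresponding parameter norm. A compact sketch of the rule, not timm's implementation:

```python
import torch

def unitwise_norm(x):
    # Per-output-unit L2 norm: whole tensor for biases/scalars, per
    # output row for linear/conv weights, keeping dims for broadcasting.
    if x.ndim <= 1:
        return x.norm()
    return x.norm(dim=tuple(range(1, x.ndim)), keepdim=True)

def adaptive_clip_grad(parameters, clip_factor=0.01, eps=1e-3):
    # Clip g where ||g|| / ||w|| exceeds clip_factor, unit by unit.
    for p in parameters:
        if p.grad is None:
            continue
        p_norm = unitwise_norm(p.detach()).clamp_(min=eps)
        g_norm = unitwise_norm(p.grad.detach())
        max_norm = p_norm * clip_factor
        clipped = p.grad * (max_norm / g_norm.clamp(min=1e-6))
        p.grad.detach().copy_(torch.where(g_norm > max_norm, clipped, p.grad))
```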
Ross Wightman
4203efa36d
Fix #387 so that checkpoint saver works with max history of 1. Add checkpoint-hist arg to train.py.
4 years ago
Ross Wightman
4ca52d73d8
Add separate set and update method to ModelEmaV2
5 years ago
Ross Wightman
27bbc70d71
Add back old ModelEma and rename new one to ModelEmaV2 to avoid compat breaks in dependent code. Shuffle train script, add a few comments, remove DataParallel support, support experimental torchscript training.
5 years ago
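The EMA entries here maintain a shadow copy of the weights, updated as ema = decay * ema + (1 - decay) * model. A minimal sketch with the separate set/update methods mentioned above; this is illustrative, not the ModelEmaV2 source:

```python
from copy import deepcopy
import torch

class EmaSketch:
    def __init__(self, model, decay=0.9999):
        self.module = deepcopy(model).eval()  # EMA copy, never trained directly
        self.decay = decay

    @torch.no_grad()
    def update(self, model):
        # ema <- decay * ema + (1 - decay) * current weights
        for e, m in zip(self.module.state_dict().values(),
                        model.state_dict().values()):
            if e.dtype.is_floating_point:
                e.mul_(self.decay).add_(m, alpha=1.0 - self.decay)
            else:
                e.copy_(m)  # integer buffers, e.g. num_batches_tracked

    @torch.no_grad()
    def set(self, model):
        # Hard copy of all weights, e.g. when resuming from a checkpoint.
        for e, m in zip(self.module.state_dict().values(),
                        model.state_dict().values()):
            e.copy_(m)
```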
Ross Wightman
9214ca0716
Simplifying EMA...
5 years ago
Ross Wightman
4a3df7842a
Fix topn metric view regression on PyTorch 1.7
5 years ago
Ross Wightman
80078c47bb
Add Adafactor and Adahessian optimizers, cleanup optimizer arg passing, add gradient clipping support.
5 years ago
Ross Wightman
fcb6258877
Add missing leaky_relu layer factory defn, update Apex/Native loss scaler interfaces to support unscaled grad clipping. Bump ver to 0.2.2 for pending release.
5 years ago
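Unscaled grad clipping under native AMP means unscaling the gradients before clipping, so the threshold applies to true gradient magnitudes rather than loss-scaled ones. A sketch of that pattern using torch.cuda.amp.GradScaler; the function name and arguments are illustrative:

```python
import torch

def amp_step(loss, model, optimizer, scaler, clip_value=1.0):
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)     # gradients now in real (unscaled) units
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
    scaler.step(optimizer)         # skips the step if grads are inf/nan
    scaler.update()
    optimizer.zero_grad()
```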
Ross Wightman
532e3b417d
Reorg of utils into separate modules
5 years ago