Ross Wightman
87939e6fab
Refactor device handling in scripts, distributed init to be less 'cuda' centric. More device args passed through where needed.
2 years ago
Ross Wightman
ff6a919cf5
Add --fast-norm arg to benchmark.py, train.py, validate.py
2 years ago
Ross Wightman
0dbd9352ce
Add bulk_runner script and updates to benchmark.py and validate.py for better error handling in bulk runs (used for benchmark and validation result runs). Improved batch size decay stepping on retry...
2 years ago
Ross Wightman
4670d375c6
Reorg benchmark.py import
2 years ago
Ross Wightman
28e0152043
Add --no-retry flag to benchmark.py to skip batch_size decay and retry on error. Fix #1226. Update deepspeed profile usage for latest DS releases. Fix #1333
2 years ago
Ross Wightman
34f382f8f6
move dataconfig before script, scripting killing metadata now (PyTorch 1.12? just nvfuser?)
2 years ago
Ross Wightman
2d7ab06503
Move aot-autograd opt after model metadata used to setup data config in benchmark.py
2 years ago
Xiao Wang
ca991c1fa5
add --aot-autograd
2 years ago
Ross Wightman
372ad5fa0d
Significant model refactor and additions:
* All models updated with revised forward_features / forward_head interface
* Vision transformer and MLP based models consistently output sequence from forward_features (pooling or token selection considered part of 'head')
* WIP param grouping interface to allow consistent grouping of parameters for layer-wise decay across all model types
* Add gradient checkpointing support to a significant % of models, especially popular architectures
* Formatting and interface consistency improvements across models
* layer-wise LR decay impl part of optimizer factory w/ scale support in scheduler
* Poolformer and Volo architectures added
3 years ago
Ross Wightman
95cfc9b3e8
Merge remote-tracking branch 'origin/master' into norm_norm_norm
3 years ago
Ross Wightman
cf4334391e
Update benchmark and validate scripts to output results in JSON with a fixed delimiter for use in multi-process launcher
3 years ago
kozistr
56a6b38f76
refactor: remove if-condition
3 years ago
Ross Wightman
f0f9eccda8
Add --fuser arg to train/validate/benchmark scripts to select jit fuser type
3 years ago
Ross Wightman
683fba7686
Add drop args to benchmark.py
3 years ago
Ross Wightman
aaff2d82d0
Add new 50ts attn models to benchmark/meta csv files
3 years ago
Ross Wightman
1e17863b7b
Fixed botne*t26 model results, add some 50ts self-attn variants
3 years ago
Ross Wightman
71f00bfe9e
Don't run profile if model is torchscripted
3 years ago
Ross Wightman
5882e62ada
Add activation count to fvcore based profiling in benchmark.py
3 years ago
Ross Wightman
f7325c7b71
Support either deepspeed or fvcore for flop profiling
3 years ago
Ross Wightman
66253790d4
Add `--bench profile` mode for benchmark.py to just run deepspeed detailed profile on model
3 years ago
Ross Wightman
13a8bf7972
Add train size override and deepspeed GMACs counter (if deepspeed installed) to benchmark.py
3 years ago
Ross Wightman
ac469b50da
Optimizer improvements, additions, cleanup
* Add MADGRAD code
* Fix Lamb (non-fused variant) to work w/ PyTorch XLA
* Tweak optimizer factory args (lr/learning_rate and opt/optimizer_name), may break compat
* Use newer fn signatures for all add, addcdiv, addcmul in optimizers
* Use upcoming PyTorch native Nadam if it's available
* Cleanup lookahead opt
* Add optimizer tests
* Remove novograd.py impl as it was messy, keep nvnovograd
* Make AdamP/SGDP work in channels_last layout
* Add rectified adabelief mode (radabelief)
* Support a few more PyTorch optim, adamax, adagrad
3 years ago
Ross Wightman
137a374930
Merge pull request #555 from MichaelMonashev/patch-1
benchmark.py argument description fixed
4 years ago
Ross Wightman
e15e68d881
Fix #566, summary.csv writing to pwd on local_rank != 0. Tweak benchmark mem handling to see if it reduces likelihood of 'bad' exceptions on OOM.
4 years ago
Michael Monashev
0be1fa4793
Argument description fixed
4 years ago
Ross Wightman
37c71a5609
Some further create_optimizer_v2 tweaks, remove some redundant code, add back safe model str. Benchmark step times per batch.
4 years ago
Ross Wightman
288682796f
Update benchmark script to add precision arg. Fix some downstream (DeiT) compat issues with latest changes. Bump version to 0.4.7
4 years ago
Ross Wightman
4445eaa470
Add img_size to benchmark output
4 years ago
Ross Wightman
0706d05d52
Benchmark models listed in txt file. Add more hybrid vit variants for testing
4 years ago
Ross Wightman
0e16d4e9fb
Add benchmark.py script, and update optimizer factory to be more friendly to use outside of argparse interface.
4 years ago