Commit Graph

23 Commits (b4ea69c9ce9f855f691e3afbe7c494f5b0cc7dbd)

Author SHA1 Message Date
Ross Wightman 1186fc9c73 Merge remote-tracking branch 'origin/master' into bits_and_tpu
2 years ago
Ross Wightman 4f338556d8 Fixes and improvements for metrics, tfds parser, loader / transform handling
3 years ago
Ross Wightman 80ca078aed Fix a few bugs and formatting/naming issues
3 years ago
Ross Wightman 59a3409182 Update README.md
3 years ago
Ross Wightman 3581affb77 Update train.py with some flags related to scheduler tweaks, fix best checkpoint bug.
3 years ago
Ross Wightman f2e14685a8 Add force-cpu flag for train/validate, fix CPU fallback for device init, remove old force cpu flag for EMA model weights
3 years ago
Ross Wightman b76b48e8e9 Update optimizer creation for master optimizer changes
3 years ago
Ross Wightman 40457e5691 Transforms, augmentation work for bits, add RandomErasing support for XLA (pushing into transforms), revamp of transform/preproc config, etc ongoing...
3 years ago
Ross Wightman 847b4af144 Update README.md
4 years ago
Ross Wightman 5c5cadfe4c Update README.md
4 years ago
Ross Wightman ee2b8f49ee Update README.md
4 years ago
Ross Wightman cc870df7b8 Update README.md
4 years ago
Ross Wightman 6b2d9c2660 Another bits/README.md update
4 years ago
Ross Wightman c3db5f5801 Worker hack for TFDS eval, add TPU env var setting.
4 years ago
Ross Wightman f411724de4 Fix checkpoint delete issue. Add README about bits and initial Pytorch XLA usage on TPU-VM. Add some FIXMEs and fold train_cfg into train_state by default.
4 years ago
Ross Wightman 91ab0b6ce5 Add proper TrainState checkpoint save/load. Some reorg/refactoring and other cleanup. More to go...
4 years ago
Ross Wightman 5b9c69e80a Add basic training resume based on legacy code
4 years ago
Ross Wightman 72ca831dd4 Back to using strings for the enum translation, forgot about import dep
4 years ago
Ross Wightman cbd4ee737f Fix model init for XLA, remove some prints.
4 years ago
Ross Wightman aa92d7b1c5 Major timm.bits update. Updater and DeviceEnv now dataclasses, after_step closure used, metrics base impl w/ distributed reduce, many tweaks/fixes.
4 years ago
Ross Wightman 938716c753 Fix import issue, use devenv for dist info in parser_tfds
4 years ago
Ross Wightman 76de984a5f Fix some bugs with XLA support, logger, add hacky xla dist launch script since torch.dist.launch doesn't work
4 years ago
Ross Wightman 12d9a6d4d2 First timm.bits commit, add initial abstractions, WIP updates to train, val... some of it working
4 years ago