Commit Graph

19 Commits (a45186a6e84fed05a39065bf4cbc2258d33a73e9)

Author SHA1 Message Date
Ross Wightman 3581affb77 Update train.py with some flags related to scheduler tweaks, fix best checkpoint bug.
3 years ago
Ross Wightman f2e14685a8 Add force-cpu flag for train/validate, fix CPU fallback for device init, remove old force cpu flag for EMA model weights
3 years ago
Ross Wightman b76b48e8e9 Update optimizer creation for master optimizer changes
3 years ago
Ross Wightman 40457e5691 Transforms, augmentation work for bits, add RandomErasing support for XLA (pushing into transforms), revamp of transform/preproc config, etc ongoing...
3 years ago
Ross Wightman 847b4af144
Update README.md
3 years ago
Ross Wightman 5c5cadfe4c
Update README.md
3 years ago
Ross Wightman ee2b8f49ee
Update README.md
3 years ago
Ross Wightman cc870df7b8
Update README.md
3 years ago
Ross Wightman 6b2d9c2660 Another bits/README.md update
3 years ago
Ross Wightman c3db5f5801 Worker hack for TFDS eval, add TPU env var setting.
3 years ago
Ross Wightman f411724de4 Fix checkpoint delete issue. Add README about bits and initial Pytorch XLA usage on TPU-VM. Add some FIXMEs and fold train_cfg into train_state by default.
3 years ago
Ross Wightman 91ab0b6ce5 Add proper TrainState checkpoint save/load. Some reorg/refactoring and other cleanup. More to go...
4 years ago
Ross Wightman 5b9c69e80a Add basic training resume based on legacy code
4 years ago
Ross Wightman 72ca831dd4 Back to using strings for the enum translation, forgot about import dep
4 years ago
Ross Wightman cbd4ee737f Fix model init for XLA, remove some prints.
4 years ago
Ross Wightman aa92d7b1c5 Major timm.bits update. Updater and DeviceEnv now dataclasses, after_step closure used, metrics base impl w/ distributed reduce, many tweaks/fixes.
4 years ago
Ross Wightman 938716c753 Fix import issue, use devenv for dist info in parser_tfds
4 years ago
Ross Wightman 76de984a5f Fix some bugs with XLA support, logger, add hacky xla dist launch script since torch.dist.launch doesn't work
4 years ago
Ross Wightman 12d9a6d4d2 First timm.bits commit, add initial abstractions, WIP updates to train, val... some of it working
4 years ago