Commit Graph

151 Commits (ef57561d5124f831051e5996d8346e95ded69c14)

Author SHA1 Message Date
Ross Wightman da2796ae82 Add webdataset (WDS) support, update TFDS to make some naming in parsers more similar. Fix workers=0 compatibility. Add ImageNet22k/12k synset defs.
3 years ago
Ross Wightman a16ea1e355 Merge remote-tracking branch 'origin/norm_norm_norm' into bits_and_tpu
3 years ago
Ross Wightman 372ad5fa0d Significant model refactor and additions:
3 years ago
Ross Wightman fafece230b Allow changing base lr batch size from 256 via arg
3 years ago
Ross Wightman 7148039f9f Tweak base lr log
3 years ago
Ross Wightman f82fb6b608 Add base lr w/ linear and sqrt scaling to train script
3 years ago
Ross Wightman 95cfc9b3e8 Merge remote-tracking branch 'origin/master' into norm_norm_norm
3 years ago
Ross Wightman abc9ba2544 Transitioning default_cfg -> pretrained_cfg. Improving handling of pretrained_cfg source (HF-Hub, files, timm config, etc). Checkpoint handling tweaks.
3 years ago
Ross Wightman f0f9eccda8 Add --fuser arg to train/validate/benchmark scripts to select jit fuser type
3 years ago
Ross Wightman 5ccf682a8f Remove deprecated bn-tf train arg and create_model handler. Add evos/evob models back into fx test filter until norm_norm_norm branch merged.
3 years ago
Ross Wightman 4c8bb295ab Remove bn-tf arg
3 years ago
Ross Wightman 4f338556d8 Fixes and improvements for metrics, tfds parser, loader / transform handling
3 years ago
Ross Wightman 80ca078aed Fix a few bugs and formatting/naming issues
3 years ago
Ross Wightman 406c486ba2 Merge remote-tracking branch 'origin/more_datasets' into bits_and_tpu
3 years ago
Ross Wightman ba65dfe2c6 Dataset work
3 years ago
Ross Wightman cd638d50a5 Merge pull request #880 from rwightman/fixes_bce_regnet
3 years ago
Ross Wightman d9abfa48df Make broadcast_buffers disable its own flag for now (needs more testing on interaction with dist_bn)
3 years ago
Ross Wightman 80075b0b8a Add worker_seeding arg to allow selecting old vs updated data loader worker seed for (old) experiment repeatability
3 years ago
Shoufa Chen 908563d060 fix `use_amp`
3 years ago
Ross Wightman 25d52ea71d Merge remote-tracking branch 'origin/fixes_bce_regnet' into bits_and_tpu
3 years ago
Ross Wightman 0387e6057e Update binary cross ent impl to use thresholding as an option (convert soft targets from mixup/cutmix to 0, 1)
3 years ago
Ross Wightman 3581affb77 Update train.py with some flags related to scheduler tweaks, fix best checkpoint bug.
3 years ago
Ross Wightman 0639d9a591 Fix updated validation_batch_size fallback
3 years ago
Ross Wightman 5db057dca0 Fix misnamed arg, tweak other train script args for better defaults.
3 years ago
Ross Wightman fb94350896 Update training script and loader factory to allow use of scheduler updates, repeat augment, and bce loss
3 years ago
Ross Wightman f2e14685a8 Add force-cpu flag for train/validate, fix CPU fallback for device init, remove old force cpu flag for EMA model weights
3 years ago
Ross Wightman 0d82876132 Add comment for reference re PyTorch XLA 'race' issue
3 years ago
Ross Wightman 40457e5691 Transforms, augmentation work for bits, add RandomErasing support for XLA (pushing into transforms), revamp of transform/preproc config, etc ongoing...
3 years ago
SamuelGabriel 7c19c35d9f Global instead of local rank.
3 years ago
Ross Wightman c3db5f5801 Worker hack for TFDS eval, add TPU env var setting.
3 years ago
Ross Wightman f411724de4 Fix checkpoint delete issue. Add README about bits and initial Pytorch XLA usage on TPU-VM. Add some FIXMEs and fold train_cfg into train_state by default.
3 years ago
Ross Wightman 91ab0b6ce5 Add proper TrainState checkpoint save/load. Some reorg/refactoring and other cleanup. More to go...
4 years ago
Ross Wightman 5b9c69e80a Add basic training resume based on legacy code
4 years ago
Ross Wightman cbd4ee737f Fix model init for XLA, remove some prints.
4 years ago
Ross Wightman 6d90fcf282 Fix distribute_bn and model_ema
4 years ago
Ross Wightman aa92d7b1c5 Major timm.bits update. Updater and DeviceEnv now dataclasses, after_step closure used, metrics base impl w/ distributed reduce, many tweaks/fixes.
4 years ago
Ross Wightman 76de984a5f Fix some bugs with XLA support, logger, add hacky xla dist launch script since torch.dist.launch doesn't work
4 years ago
Ross Wightman 12d9a6d4d2 First timm.bits commit, add initial abstractions, WIP updates to train, val... some of it working
4 years ago
Ross Wightman e15e68d881 Fix #566, summary.csv writing to pwd on local_rank != 0. Tweak benchmark mem handling to see if it reduces likelihood of 'bad' exceptions on OOM.
4 years ago
Ross Wightman e685618f45 Merge pull request #550 from amaarora/wandb
4 years ago
Ross Wightman 7c97e66f7c Remove commented code, add more consistent seed fn
4 years ago
Aman Arora 5772c55c57 Make wandb optional
4 years ago
Aman Arora f54897cc0b make wandb not required but rather optional as huggingface_hub
4 years ago
Aman Arora f13f7508a9 Keep changes to minimal and use args.experiment as wandb project name if it exists
4 years ago
Aman Arora f8bb13f640 Default project name to None
4 years ago
Aman Arora 3f028ebc0f import wandb in summary.py
4 years ago
Aman Arora a9e5d9e5ad log loss as before
4 years ago
Aman Arora 624c9b6949 log to wandb only if using using wandb
4 years ago
Aman Arora 00c8e0b8bd Make use of wandb configurable
4 years ago
Aman Arora 8e6fb861e4 Add wandb support
4 years ago