Ross Wightman
dff33730b3
Merge remote-tracking branch 'origin/master' into bits_and_tpu
3 years ago
Ross Wightman
fd360ac951
Merge pull request #1266 from kaczmarj/enh/no-star-imports
ENH: replace star imports with imported names in train.py
3 years ago
Jakub Kaczmarzyk
ce5578bc3a
replace star imports with imported names
3 years ago
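For context, the change amounts to replacing a wildcard import with explicit names; the names below are a plausible subset, not the commit's actual diff:

```python
# Before: a star import makes it unclear which module defines each name
# from timm.utils import *

# After: explicit imports document every dependency and let linters
# flag unused or undefined names
from timm.utils import AverageMeter, CheckpointSaver, setup_default_logging
```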
Jakub Kaczmarzyk
dcad288fd6
use argparse groups to group arguments
3 years ago
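The argparse grouping pattern referenced above looks roughly like this; the specific arguments are illustrative, not the exact set in train.py:

```python
import argparse

parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')

# Groups render as named sections in --help output, making a long
# argument list far easier to scan.
group = parser.add_argument_group('Optimizer parameters')
group.add_argument('--lr', type=float, default=0.05, help='learning rate')
group.add_argument('--weight-decay', type=float, default=2e-5, help='weight decay')

group = parser.add_argument_group('Dataset parameters')
group.add_argument('--data-dir', metavar='DIR', help='path to dataset')

args = parser.parse_args()
```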
Jakub Kaczmarzyk
e1e4c9bbae
rm whitespace
3 years ago
han
a16171335b
fix: change milestones to decay-milestones
- change argparser option `milestones` to `decay-milestones`
3 years ago
han
57a988df30
fix: multistep lr decay epoch bugs
- add `milestones` argument
- change `decay_epochs` to `milestones` variable
3 years ago
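The milestone semantics the two commits above converge on match torch's built-in scheduler, shown here for reference (timm's own multistep implementation differs in detail):

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# LR is multiplied by gamma at each listed epoch, rather than at a
# fixed decay_epochs interval.
scheduler = MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(100):
    # train_one_epoch(...)
    scheduler.step()
```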
Ross Wightman
749856cf25
Merge remote-tracking branch 'origin/norm_norm_norm' into bits_and_tpu
3 years ago
Ross Wightman
b049a5c5c6
Merge remote-tracking branch 'origin/master' into norm_norm_norm
3 years ago
Ross Wightman
04db5833eb
Merge pull request #986 from hankyul2/master
fix: typo in argument parser desc in train.py
3 years ago
Ross Wightman
da2796ae82
Add webdataset (WDS) support, update TFDS to make parser naming more consistent. Fix workers=0 compatibility. Add ImageNet22k/12k synset defs.
3 years ago
Ross Wightman
0557c8257d
Fix bug introduced in non-layer_decay weight_decay application. Remove debug print, fix arg desc.
3 years ago
Ross Wightman
a16ea1e355
Merge remote-tracking branch 'origin/norm_norm_norm' into bits_and_tpu
3 years ago
Ross Wightman
372ad5fa0d
Significant model refactor and additions:
* All models updated with revised forward_features / forward_head interface
* Vision transformer and MLP based models consistently output sequence from forward_features (pooling or token selection considered part of 'head')
* WIP param grouping interface to allow consistent grouping of parameters for layer-wise decay across all model types
* Add gradient checkpointing support to a significant % of models, especially popular architectures
* Formatting and interface consistency improvements across models
* layer-wise LR decay impl part of optimizer factory w/ scale support in scheduler
* Poolformer and Volo architectures added
3 years ago
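A minimal sketch of the layer-wise LR decay grouping described in the entry above, assuming a naive name-based depth heuristic; timm's actual optimizer factory walks model-specific structure:

```python
import torch

def param_groups_layer_decay(model, base_lr=1e-3, weight_decay=0.05,
                             layer_decay=0.75, num_layers=12):
    # Hypothetical helper: scale each parameter's LR by depth so early
    # layers train slowly and the head trains at full rate.
    groups = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        layer_id = 0  # naive heuristic for illustration only
        for i in range(num_layers):
            if f'blocks.{i}.' in name:
                layer_id = i + 1
        scale = layer_decay ** (num_layers - layer_id)
        groups.append({
            'params': [param],
            'lr': base_lr * scale,
            'weight_decay': weight_decay,
        })
    return groups

# optimizer = torch.optim.AdamW(param_groups_layer_decay(model))
```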
Ross Wightman
fafece230b
Allow changing the base-LR reference batch size from 256 via arg
3 years ago
Ross Wightman
7148039f9f
Tweak base lr log
3 years ago
Ross Wightman
f82fb6b608
Add base lr w/ linear and sqrt scaling to train script
3 years ago
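The scaling rules behind the three base-LR commits above are the standard ones; this sketch shows the arithmetic, with the function name an assumption rather than the script's actual code:

```python
import math

def resolve_lr(lr_base, global_batch_size, base_size=256, scale='linear'):
    # linear: lr = lr_base * (batch / base_size)
    # sqrt:   lr = lr_base * sqrt(batch / base_size), a gentler rule
    #         often preferred for adaptive optimizers like AdamW
    ratio = global_batch_size / base_size
    if scale == 'sqrt':
        ratio = math.sqrt(ratio)
    return lr_base * ratio

print(resolve_lr(0.1, 1024))                 # linear: 0.4
print(resolve_lr(0.1, 1024, scale='sqrt'))   # sqrt:   0.2
```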
Ross Wightman
95cfc9b3e8
Merge remote-tracking branch 'origin/master' into norm_norm_norm
3 years ago
Ross Wightman
abc9ba2544
Transitioning default_cfg -> pretrained_cfg. Improving handling of pretrained_cfg source (HF-Hub, files, timm config, etc). Checkpoint handling tweaks.
3 years ago
Ross Wightman
f0f9eccda8
Add --fuser arg to train/validate/benchmark scripts to select jit fuser type
3 years ago
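The --fuser arg selects among PyTorch's JIT fuser backends; torch's own context-manager form of that selection looks like this:

```python
import torch

@torch.jit.script
def scaled_tanh(x):
    return torch.tanh(x) * 2.0

# torch.jit.fuser selects the backend used to fuse scripted ops:
# "fuser0" = legacy, "fuser1" = NNC (works on CPU), "fuser2" = nvFuser (CUDA)
with torch.jit.fuser("fuser1"):
    y = scaled_tanh(torch.randn(4, 4))
```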
Ross Wightman
5ccf682a8f
Remove deprecated bn-tf train arg and create_model handler. Add evos/evob models back into fx test filter until norm_norm_norm branch merged.
3 years ago
Ross Wightman
4c8bb295ab
Remove bn-tf arg
3 years ago
han
ab5ae32f75
fix: typo in argument parser desc in train.py
- Remove duplicated `of`
3 years ago
Ross Wightman
4f338556d8
Fixes and improvements for metrics, tfds parser, loader / transform handling
* add back ability to create transform with loader
* change 'samples' -> 'examples' for tfds wrapper to match tfds naming
* add support for specifying feature names for input and target in tfds wrapper
* add class_to_idx for image classification datasets in tfds wrapper
* add accumulate_type to avg meters and metrics to allow float32 or float64 accumulation control with lower prec data
* minor cleanup, log output rate prev and avg
3 years ago
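A sketch of the accumulate_type idea from the meter bullet above; the class here is a hypothetical minimal version, not timm's implementation:

```python
import torch

class AvgMeter:
    # Low-precision values (e.g. float16 losses under AMP) are upcast
    # before summation so long runs don't accumulate rounding drift.
    def __init__(self, accumulate_dtype=torch.float64):
        self.sum = torch.zeros((), dtype=accumulate_dtype)
        self.count = 0

    def update(self, value, n=1):
        self.sum += torch.as_tensor(value).to(self.sum.dtype) * n
        self.count += n

    @property
    def avg(self):
        return (self.sum / max(self.count, 1)).item()
```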
Ross Wightman
80ca078aed
Fix a few bugs and formatting/naming issues
* Pass optimizer resume flag through to checkpoint / updater restore. Related to #961 but not clear how it relates to the crash.
* Rename monitor step args, clean up handling of step_end_idx vs num_steps for consistent log output in either case
* Resume from proper epoch (i.e. next epoch relative to checkpoint)
3 years ago
Ross Wightman
406c486ba2
Merge remote-tracking branch 'origin/more_datasets' into bits_and_tpu
3 years ago
Ross Wightman
ba65dfe2c6
Dataset work
* support some torchvision datasets
* improvements to TFDS wrapper for subsplit handling (fix #942), shuffle seed
* add class-map support to train (fix #957)
3 years ago
Ross Wightman
cd638d50a5
Merge pull request #880 from rwightman/fixes_bce_regnet
A collection of fixes, model experiments, etc
3 years ago
Ross Wightman
d9abfa48df
Make broadcast_buffers disable its own flag for now (needs more testing on interaction with dist_bn)
3 years ago
Ross Wightman
80075b0b8a
Add worker_seeding arg to allow selecting old vs updated data loader worker seed for (old) experiment repeatability
3 years ago
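The old vs. updated behavior comes down to how each DataLoader worker derives its seed; a sketch of the two policies, with helper and flag names assumed, not copied from the loader:

```python
import random
from functools import partial

import numpy as np
import torch
from torch.utils.data import DataLoader

def _worker_init(worker_id, worker_seeding='all'):
    # PyTorch already seeds each worker with base_seed + worker_id;
    # the updated policy derives numpy/random seeds from that too.
    seed = torch.initial_seed() % 2**32
    if worker_seeding == 'all':
        np.random.seed(seed)
        random.seed(seed)
    # old policy: leave numpy/random at their defaults, matching
    # experiments run before the change

# loader = DataLoader(dataset, num_workers=4,
#                     worker_init_fn=partial(_worker_init, worker_seeding='all'))
```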
Shoufa Chen
908563d060
fix `use_amp`
Fix https://github.com/rwightman/pytorch-image-models/issues/881
3 years ago
Ross Wightman
25d52ea71d
Merge remote-tracking branch 'origin/fixes_bce_regnet' into bits_and_tpu
3 years ago
Ross Wightman
0387e6057e
Update binary cross-entropy impl to use thresholding as an option (convert soft targets from mixup/cutmix to 0, 1)
3 years ago
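The thresholding option converts soft mixup/cutmix targets back to hard 0/1 labels before the BCE; a minimal sketch of the idea (timm's BinaryCrossEntropy class wraps more than this):

```python
import torch
import torch.nn.functional as F

def bce_with_threshold(logits, soft_target, target_threshold=0.2):
    # Soft targets such as 0.3/0.7 from mixup become hard labels:
    # anything above the threshold counts as a positive class.
    if target_threshold is not None:
        soft_target = soft_target.gt(target_threshold).to(logits.dtype)
    return F.binary_cross_entropy_with_logits(logits, soft_target)
```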
Ross Wightman
3581affb77
Update train.py with some flags related to scheduler tweaks, fix best checkpoint bug.
3 years ago
Ross Wightman
0639d9a591
Fix updated validation_batch_size fallback
3 years ago
Ross Wightman
5db057dca0
Fix misnamed arg, tweak other train script args for better defaults.
3 years ago
Ross Wightman
fb94350896
Update training script and loader factory to allow use of scheduler updates, repeat augment, and bce loss
3 years ago
Ross Wightman
f2e14685a8
Add force-cpu flag for train/validate, fix CPU fallback for device init, remove old force cpu flag for EMA model weights
3 years ago
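A sketch of the CPU-fallback device init the commit above describes; the function name and flag wiring are assumptions:

```python
import torch

def resolve_device(force_cpu=False):
    # Fall back to CPU when no accelerator is present instead of
    # failing at device init; --force-cpu overrides detection.
    if force_cpu or not torch.cuda.is_available():
        return torch.device('cpu')
    return torch.device('cuda')

model = torch.nn.Linear(4, 4).to(resolve_device())
```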
Ross Wightman
0d82876132
Add comment for reference re PyTorch XLA 'race' issue
3 years ago
Ross Wightman
40457e5691
Transforms, augmentation work for bits, add RandomErasing support for XLA (pushing into transforms), revamp of transform/preproc config, etc ongoing...
3 years ago
SamuelGabriel
7c19c35d9f
Global instead of local rank.
3 years ago
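Why this matters: local rank repeats across nodes, so anything derived from it (seeds, shard indices) collides in multi-node runs, while the global rank is unique per process. A sketch, assuming torch.distributed is initialized:

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get('LOCAL_RANK', 0))  # unique within one node only
global_rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0

# Offsetting a seed by local rank duplicates seeds across nodes;
# offsetting by global rank keeps every process distinct.
torch.manual_seed(42 + global_rank)
```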
Ross Wightman
c3db5f5801
Worker hack for TFDS eval, add TPU env var setting.
3 years ago
Ross Wightman
f411724de4
Fix checkpoint delete issue. Add README about bits and initial PyTorch XLA usage on TPU-VM. Add some FIXMEs and fold train_cfg into train_state by default.
3 years ago
Ross Wightman
91ab0b6ce5
Add proper TrainState checkpoint save/load. Some reorg/refactoring and other cleanup. More to go...
3 years ago
Ross Wightman
5b9c69e80a
Add basic training resume based on legacy code
4 years ago
Ross Wightman
cbd4ee737f
Fix model init for XLA, remove some prints.
4 years ago
Ross Wightman
6d90fcf282
Fix distribute_bn and model_ema
4 years ago
Ross Wightman
aa92d7b1c5
Major timm.bits update. Updater and DeviceEnv now dataclasses, after_step closure used, metrics base impl w/ distributed reduce, many tweaks/fixes.
4 years ago
Ross Wightman
76de984a5f
Fix some bugs with XLA support, logger, add hacky XLA dist launch script since torch.distributed.launch doesn't work
4 years ago
Ross Wightman
12d9a6d4d2
First timm.bits commit, add initial abstractions, WIP updates to train, val... some of it working
4 years ago