A PyTorch project template for fast deep-learning experimentation and version iteration.
```
run.py      # main entry point for running
trainer.py  # trainer: trains and tests the model
config.yaml # global config file, loaded in run.py
dataset.py  # define your own dataset, used in run.py with a DataLoader
loss.py     # define your own loss, used in trainer.py
net.py      # define your own network, used in run.py
utils.py    # other helper functions
```
```sh
# define your dataset, loss, net, trainer and config file
sh run.sh
```

- fix bool command-line arguments always being parsed as `True`
- add a `load_in_memory` option to the dataset to speed up training
- remove some of the features below (they did not work well in practice during version iteration):
  - save the project files for each version
    - the types of files saved are defined by `save_version_file_patterns` in the config file
    - if `load_epoch` in the config file is set to `false`, save files in `runs/latest_project` and `runs/version/project`
    - if `load_epoch` in the config file is set to an epoch name from a saved model file:
      - save the latest files in `runs/latest_project`
      - load files from `runs/version/project` for testing
      - restore the latest files from `runs/latest_project`
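The bool-argument fix mentioned in the changelog is commonly done with a string-to-bool converter, since `argparse` with `type=bool` calls `bool("False")`, which is `True` because any non-empty string is truthy. A minimal sketch (the flag name `--load_in_memory` is only an example):

```python
import argparse

def str2bool(v):
    """Convert a command-line string into a real bool.

    Needed because argparse's type=bool treats every non-empty
    string, including "False", as truthy.
    """
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {v!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--load_in_memory", type=str2bool, default=False)
args = parser.parse_args(["--load_in_memory", "False"])
print(args.load_in_memory)  # False, not True
```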
- save the TensorBoard file, running log, config file, and model state dict for each version
  - use `tensorboard --logdir=runs` for visualization
  - load model parameters from the saved model files
  - read the running log to check training output
- easily change training parameters
  - `random_seed`: random seed for each training run
  - `epochs`: total number of training epochs
  - `batch_size`: batch size
  - `num_workers`: number of worker processes
  - `lr`: learning rate
  - `device_ids`: GPU device ids; uses the CPU if none are given, and data parallelism if more than one is given
  - `valid_every_epochs`: validate the model every this many epochs
  - `early_stop_epochs`: number of epochs for early stopping; set a negative value to disable it
  - `start_save_model_epochs`: start saving the model after this epoch
  - `save_model_interval_epochs`: interval, in epochs, between model saves
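Put together, a `config.yaml` using these keys might look like the following (the values are illustrative, not defaults shipped with this project):

```yaml
random_seed: 42
epochs: 100
batch_size: 32
num_workers: 4
lr: 0.001
device_ids: [0]        # empty list uses the CPU; more than one id enables data parallelism
valid_every_epochs: 1
early_stop_epochs: -1  # a negative value disables early stopping
start_save_model_epochs: 10
save_model_interval_epochs: 5
load_epoch: false
load_in_memory: true
```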
- add the parameters from the config file to `args`, so they can be accessed easily as `args.xxx`
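Copying config entries onto the argparse namespace, so each one is reachable as `args.xxx`, can be sketched as below. Here the plain dict stands in for the result of parsing `config.yaml` (e.g. with `yaml.safe_load`); the helper name is hypothetical, not from this project:

```python
import argparse

def merge_config_into_args(args, config):
    """Attach every config key to the argparse namespace.

    Command-line values take priority: only keys not already
    present on args are copied over.
    """
    for key, value in config.items():
        if not hasattr(args, key):
            setattr(args, key, value)
    return args

# stand-in for yaml.safe_load(open("config.yaml"))
config = {"lr": 0.001, "epochs": 100, "batch_size": 32}

args = argparse.Namespace()
merge_config_into_args(args, config)
print(args.lr, args.epochs)  # 0.001 100
```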