Hello!
I've recently been testing the DYffusion model on the OISST dataset. I used the following command to train the interpolation model:
python run.py experiment=oisst_pacific_interpolation work_dir=./myrun/interpolation_oisst trainer.max_epochs=5 datamodule.horizon=7 datamodule.window=4 datamodule.prediction_horizon=7
Then I used the following command to train the DYffusion model:
python run.py experiment=oisst_pacific_dyffusion work_dir=./myrun/dyffusion_oisst trainer.max_epochs=5 datamodule.horizon=7 datamodule.window=4 datamodule.prediction_horizon=7 diffusion.interpolator_run_id=on6bffjf
The interpolation model trained successfully, but training the DYffusion model failed with the following error:
Error executing job with overrides: ['experiment=oisst_pacific_dyffusion', 'work_dir=./myrun/dyffusion_oisst', 'trainer.max_epochs=5', 'datamodule.horizon=7', 'datamodule.window=4', 'datamodule.prediction_horizon=7', 'diffusion.interpolator_run_id=on6bffjf']
Traceback (most recent call last):
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/run.py", line 22, in <module>
main()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/run.py", line 12, in main
return run_model(config)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/train.py", line 101, in run_model
raise e
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/train.py", line 97, in run_model
fit(ckpt_filepath=ckpt_path)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/train.py", line 93, in fit
trainer.fit(model, datamodule=datamodule, ckpt_path=ckpt_filepath)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 560, in fit
call._call_and_handle_interrupt(
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 49, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 598, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1011, in _run
results = self._run_stage()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1055, in _run_stage
self.fit_loop.run()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 216, in run
self.advance()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 458, in advance
self.epoch_loop.run(self._data_fetcher)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 152, in run
self.advance(data_fetcher)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 348, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in run
closure()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 146, in __call__
self._result = self.closure(*args, **kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 131, in closure
step_output = self._step_fn()
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 319, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 391, in training_step
return self.lightning_module.training_step(*args, **kwargs)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/experiment_types/_base_experiment.py", line 438, in training_step
loss_output = self.get_loss(batch) # either a scalar or a dict with key 'loss'
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/experiment_types/forecasting_multi_horizon.py", line 419, in get_loss
loss = self.model.get_loss(inputs=inputs, targets=x_last, **extra_kwargs)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/diffusion/_base_diffusion.py", line 116, in get_loss
results = self(inputs, targets, **kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
result = forward_call(*args, **kwargs)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/diffusion/_base_diffusion.py", line 106, in forward
return self.p_losses(targets, t=t, **kwargs)
File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/diffusion/dyffusion.py", line 526, in p_losses
x_t[t_nonzero] = x_interpolated.to(x_t.dtype)
RuntimeError: The expanded size of the tensor (14400) must match the existing size (60) at non-singleton dimension 4. Target sizes: [63, 1, 60, 60, 14400]. Tensor sizes: [63, 1, 60, 60]
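To make the shape complaint in the last line easier to read, here is a minimal pure-Python sketch of the trailing-dimension alignment rule used when one tensor is expanded to another's shape. The helper name `first_expand_mismatch` is hypothetical and is not part of DYffusion or PyTorch; it only mirrors the rule as I understand it and does not reproduce the code in `p_losses`.

```python
def first_expand_mismatch(source, target):
    """Return the index (in `target`) of the first dimension, scanning from
    the trailing end, where `source` cannot be expanded to `target`.
    Returns None if the expansion is valid.

    Hypothetical helper for illustration only.
    """
    if len(source) > len(target):
        raise ValueError("source has more dimensions than target")
    # Shapes are aligned on their trailing dimensions.
    offset = len(target) - len(source)
    for i in range(len(target) - 1, offset - 1, -1):
        s = source[i - offset]
        # A source dimension is compatible if it is 1 or equals the target.
        if s != 1 and s != target[i]:
            return i
    return None


# Shapes copied from the error message above.
target_sizes = [63, 1, 60, 60, 14400]  # shape of x_t[t_nonzero]
tensor_sizes = [63, 1, 60, 60]         # shape of x_interpolated

print(first_expand_mismatch(tensor_sizes, target_sizes))
```

With the shapes from the error, the trailing 60 of the interpolated tensor is lined up against 14400, which is why dimension 4 is reported. I also noticed that 14400 equals 4 × 60 × 60, so the 4-frame window on a 60 × 60 grid might be getting flattened into one axis somewhere, but that is only a guess from the numbers.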
What I want to test is using 4 historical images to predict 7 future images.
Why did this error occur, and how should I train the model? Thank you!