
Error When Running the OISST Experiment #11

@BlackRab


Hello!

I've recently been testing the Dyffusion model on the OISST dataset. I trained the interpolation model with the following command:
python run.py experiment=oisst_pacific_interpolation work_dir=./myrun/interpolation_oisst trainer.max_epochs=5 datamodule.horizon=7 datamodule.window=4 datamodule.prediction_horizon=7

Then I trained the Dyffusion model with the following command:
python run.py experiment=oisst_pacific_dyffusion work_dir=./myrun/dyffusion_oisst trainer.max_epochs=5 datamodule.horizon=7 datamodule.window=4 datamodule.prediction_horizon=7 diffusion.interpolator_run_id=on6bffjf
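Both commands pass Hydra overrides (the traceback shows `run.py` running under Hydra). Dotted keys such as `datamodule.horizon=7` address nested fields of the composed config, while `experiment=...` presumably selects an entry from an `experiment` config group. The sketch below only illustrates how dotted `key=value` strings map onto a nested dictionary; `parse_overrides` is a hypothetical helper and ignores most of Hydra's real override grammar (config-group selection, `+`/`~` prefixes, lists, quoting).

```python
from typing import Any, Dict, List


def parse_overrides(overrides: List[str]) -> Dict[str, Any]:
    """Map dotted `key=value` overrides onto a nested dict (hypothetical helper).

    Simplified: integers are converted, everything else stays a string, and
    Hydra features such as config-group selection, `+`/`~` prefixes, lists and
    quoting are not handled.
    """
    config: Dict[str, Any] = {}
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override {item!r} has no '='")
        try:
            value: Any = int(raw)  # e.g. "7" -> 7
        except ValueError:
            value = raw  # non-integers stay strings
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create nested levels on demand
        node[leaf] = value
    return config


overrides = [
    "experiment=oisst_pacific_dyffusion",
    "work_dir=./myrun/dyffusion_oisst",
    "trainer.max_epochs=5",
    "datamodule.horizon=7",
    "datamodule.window=4",
    "datamodule.prediction_horizon=7",
    "diffusion.interpolator_run_id=on6bffjf",
]
cfg = parse_overrides(overrides)
print(cfg["datamodule"])  # {'horizon': 7, 'window': 4, 'prediction_horizon': 7}
```

Applied to the Dyffusion command, this yields nested `trainer`, `datamodule`, and `diffusion` entries; the same `datamodule.horizon`, `datamodule.window`, and `datamodule.prediction_horizon` values were passed to both runs.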

The interpolation model trained successfully, but training the Dyffusion model failed with the following error:

Error executing job with overrides: ['experiment=oisst_pacific_dyffusion', 'work_dir=./myrun/dyffusion_oisst', 'trainer.max_epochs=5', 'datamodule.horizon=7', 'datamodule.window=4', 'datamodule.prediction_horizon=7', 'diffusion.interpolator_run_id=on6bffjf']
Traceback (most recent call last):
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/run.py", line 22, in <module>
    main()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/run.py", line 12, in main
    return run_model(config)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/train.py", line 101, in run_model
    raise e
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/train.py", line 97, in run_model
    fit(ckpt_filepath=ckpt_path)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/train.py", line 93, in fit
    trainer.fit(model, datamodule=datamodule, ckpt_path=ckpt_filepath)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 560, in fit
    call._call_and_handle_interrupt(
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 49, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 598, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1011, in _run
    results = self._run_stage()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1055, in _run_stage
    self.fit_loop.run()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 216, in run
    self.advance()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 458, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 152, in run
    self.advance(data_fetcher)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 348, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in run
    closure()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 131, in closure
    step_output = self._step_fn()
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 319, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 391, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/experiment_types/_base_experiment.py", line 438, in training_step
    loss_output = self.get_loss(batch)  # either a scalar or a dict with key 'loss'
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/experiment_types/forecasting_multi_horizon.py", line 419, in get_loss
    loss = self.model.get_loss(inputs=inputs, targets=x_last, **extra_kwargs)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/diffusion/_base_diffusion.py", line 116, in get_loss
    results = self(inputs, targets, **kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mpathc/wpeng/codes/0_python_env/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/diffusion/_base_diffusion.py", line 106, in forward
    return self.p_losses(targets, t=t, **kwargs)
  File "/mpathc/wpeng/codes/python_github_2/dyffusion-main/src/diffusion/dyffusion.py", line 526, in p_losses
    x_t[t_nonzero] = x_interpolated.to(x_t.dtype)
RuntimeError: The expanded size of the tensor (14400) must match the existing size (60) at non-singleton dimension 4.  Target sizes: [63, 1, 60, 60, 14400].  Tensor sizes: [63, 1, 60, 60]
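The final `RuntimeError` is a shape-expansion failure: the value assigned to `x_t[t_nonzero]` has shape [63, 1, 60, 60], while the indexed slot has shape [63, 1, 60, 60, 14400], i.e. one extra trailing dimension. Assuming PyTorch's usual expand rule (shapes aligned from the right, each source size equal to the target size or 1, checked from the last dimension backwards), the first mismatch is at dimension 4 (60 vs. 14400), matching the message. `first_expand_mismatch` is a hypothetical helper for illustration only; it does not show where the extra dimension of size 14400 is introduced in `dyffusion.py`.

```python
from typing import Optional, Sequence


def first_expand_mismatch(src: Sequence[int], target: Sequence[int]) -> Optional[int]:
    """Return the target dimension at which `src` cannot be expanded to `target`.

    Hypothetical helper. Shapes are aligned from the right; each source size must
    equal the target size or be 1, and `target` may have extra leading dimensions.
    Dimensions are checked from the last one backwards. Returns None if the
    expansion is valid.
    """
    if len(src) > len(target):
        raise ValueError("expansion cannot remove dimensions")
    offset = len(target) - len(src)
    for dim in range(len(target) - 1, offset - 1, -1):
        size = src[dim - offset]
        if size != target[dim] and size != 1:
            return dim
    return None


# Shapes from the traceback: value tensor vs. indexed slot of x_t.
value_shape = [63, 1, 60, 60]
slot_shape = [63, 1, 60, 60, 14400]
print(first_expand_mismatch(value_shape, slot_shape))  # 4 (60 vs. 14400)
```

A value shaped [63, 1, 60, 60, 1] would pass this check, which is one way to see that the two tensors disagree only in their number of trailing dimensions.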

My goal is to use 4 historical images to predict the next 7 images.
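The intended setup (4 past frames in, 7 future frames out) corresponds to a sliding-window split of the time series. The sketch below is a generic illustration with hypothetical names `sliding_windows`, `n_history`, and `n_future`; it is not the Dyffusion datamodule's indexing and does not assume that `datamodule.window` and `datamodule.horizon` map one-to-one onto these two numbers.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(
    frames: Sequence[T], n_history: int, n_future: int
) -> List[Tuple[List[T], List[T]]]:
    """Split a frame sequence into (history, future) pairs.

    Hypothetical helper: each sample takes `n_history` consecutive frames as
    input and the following `n_future` frames as targets.
    """
    if n_history < 1 or n_future < 1:
        raise ValueError("n_history and n_future must be positive")
    span = n_history + n_future
    samples = []
    for start in range(len(frames) - span + 1):
        history = list(frames[start : start + n_history])
        future = list(frames[start + n_history : start + span])
        samples.append((history, future))
    return samples


# Twelve frames (indices 0..11), 4 in and 7 out: two possible samples.
for history, future in sliding_windows(list(range(12)), n_history=4, n_future=7):
    print(history, "->", future)
# [0, 1, 2, 3] -> [4, 5, 6, 7, 8, 9, 10]
# [1, 2, 3, 4] -> [5, 6, 7, 8, 9, 10, 11]
```

Each sample spans 4 + 7 = 11 consecutive frames, so a sequence shorter than 11 frames yields no samples.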

Why did this error occur, and how should I train the model? Thank you!
