CNNTiling #59

Open
runwangdl wants to merge 2 commits into pulp-platform:main from runwangdl:CNNTiling


  • pulp_conv_naive_fp32.c/h: add dw_kernel_weight_grad_padded and dw_kernel_input_grad_padded for DW ConvGrad with non-zero padding or stride != 1. Use precomputed valid-range loops (ho_min/ho_max, wo_min/wo_max) instead of inner-loop conditionals to avoid a GCC -O3 -ffast-math miscompile on RISC-V.

  • pulp_conv_dw_fp32.c: dispatch to the padded kernels in both pulp_conv_dw_fp32_bw_param_grads_cl and pulp_conv_dw_fp32_bw_input_grads_cl when padding is non-zero or stride != 1; also forward the stride and padding fields, which were previously not passed on.

  • pulp_im2col_fp32.c: remove overly-strict validity check that returned early (leaving the buffer uninitialized) whenever (Hin - Hk + pad) was not divisible by stride.

  • pulp_conv2d_fp32.c: pass actual Lpad/Rpad/Upad/Dpad to im2col in pulp_conv2d_fp32_bw_param_grads_cl (previously hard-coded to 0).
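
The valid-range idea from the first bullet can be sketched as follows. This is only an illustration under stated assumptions: a single channel, symmetric padding, one stride for both axes, and row-major buffers. The names `range_t`, `valid_out_range`, and `dw_weight_grad_sketch` are hypothetical and are not the functions added in this PR. For each kernel tap, the half-open range of output rows and columns whose input index lies inside the image is computed up front, so the inner loops carry no bounds checks.

```c
#include <assert.h>

/* Half-open range [lo, hi) of output indices (hypothetical helper type). */
typedef struct { int lo; int hi; } range_t;

/* Output indices o for which o*stride + k - pad lies in [0, in_size). */
static range_t valid_out_range(int k, int pad, int stride,
                               int in_size, int out_size)
{
    assert(stride > 0);
    int lo = 0;
    if (pad > k)
        lo = (pad - k + stride - 1) / stride;   /* ceil((pad - k) / stride) */
    int hi = 0;
    int last = in_size - 1 + pad - k;           /* largest allowed o*stride */
    if (last >= 0)
        hi = last / stride + 1;
    if (hi > out_size) hi = out_size;
    if (lo > hi) lo = hi;                       /* empty range */
    return (range_t){lo, hi};
}

/* Single-channel depthwise weight gradient: dw[kh][kw] = sum dy * x. */
static void dw_weight_grad_sketch(const float *x, const float *dy, float *dw,
                                  int Hin, int Win, int Hk, int Wk,
                                  int Hout, int Wout, int stride, int pad)
{
    for (int kh = 0; kh < Hk; ++kh) {
        range_t rh = valid_out_range(kh, pad, stride, Hin, Hout);
        for (int kw = 0; kw < Wk; ++kw) {
            range_t rw = valid_out_range(kw, pad, stride, Win, Wout);
            float acc = 0.0f;
            for (int ho = rh.lo; ho < rh.hi; ++ho) {
                int ih = ho * stride + kh - pad;     /* always in bounds */
                for (int wo = rw.lo; wo < rw.hi; ++wo) {
                    int iw = wo * stride + kw - pad; /* always in bounds */
                    acc += dy[ho * Wout + wo] * x[ih * Win + iw];
                }
            }
            dw[kh * Wk + kw] = acc;
        }
    }
}
```

Output positions that would read from the padded border are skipped entirely, which is equivalent to multiplying by zero there. The range bounds depend only on the kernel tap, so they are computed outside the accumulation loops.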

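
The two im2col changes can be illustrated with a small sketch. It assumes the common floor-division output-size formula and a hypothetical buffer layout (one row of Hk*Wk values per output pixel). The names `conv_out_size` and `im2col_sketch` are made up for this example and do not reflect the library's actual signatures. It shows that a span not divisible by the stride still gives a well-defined output size, and that every buffer element is written, with zeros where the window overlaps the Upad/Dpad/Lpad/Rpad border.

```c
#include <assert.h>

/* Output size with floor division; a non-divisible span is not an error. */
static int conv_out_size(int in_size, int k, int pad_lo, int pad_hi, int stride)
{
    assert(stride > 0);
    int span = in_size + pad_lo + pad_hi - k;
    if (span < 0) return 0;
    return span / stride + 1;
}

/* Single-channel im2col; returns the number of elements written to col. */
static int im2col_sketch(const float *x, float *col,
                         int Hin, int Win, int Hk, int Wk,
                         int Upad, int Dpad, int Lpad, int Rpad, int stride)
{
    int Hout = conv_out_size(Hin, Hk, Upad, Dpad, stride);
    int Wout = conv_out_size(Win, Wk, Lpad, Rpad, stride);
    int idx = 0;
    for (int ho = 0; ho < Hout; ++ho)
        for (int wo = 0; wo < Wout; ++wo)
            for (int kh = 0; kh < Hk; ++kh)
                for (int kw = 0; kw < Wk; ++kw) {
                    int ih = ho * stride + kh - Upad;
                    int iw = wo * stride + kw - Lpad;
                    int inside = ih >= 0 && ih < Hin && iw >= 0 && iw < Win;
                    /* Padded positions are written as zero, never skipped. */
                    col[idx++] = inside ? x[ih * Win + iw] : 0.0f;
                }
    return idx;
}
```

With Hin = 4, Hk = 3, one pixel of padding on each side, and stride 2, the span is 3, which is not divisible by 2, yet the output size is still 2. The sketch uses a plain bounds test for brevity rather than the precomputed ranges used in the depthwise kernels.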