
[amd] convolution kernel didn't reuse the algorithm it found. #11203

@dzhwinter

Description


My PR fixes the issue above: https://github.com/dzhwinter/Paddle/tree/review_conv2d_1
The cuDNN op runs on the CUDA device, so its inputs and outputs must stay on the CUDA device. In ROCm#16, a CPU tensor is used to store the selected algorithm, but our framework automatically transforms it into a temporary GPU tensor before the op runs. As a result, the cuDNN op writes the selected algorithm into that temporary copy and never reaches the real persistent tensor, so the algorithm it found is not reused on the next run.
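A minimal Python model of the failure, assuming a simplified framework in which every input of a GPU op is moved to the GPU before the kernel is called. `Tensor`, `transform_to`, `run_gpu_op`, and `conv_kernel` are hypothetical names for illustration only, not Paddle or ROCm APIs, and no real device work happens.

```python
from dataclasses import dataclass, field

@dataclass
class Tensor:
    place: str                                 # hypothetical device tag: "cpu" or "gpu"
    data: list = field(default_factory=list)   # holds the cached algorithm id, if any

def transform_to(tensor, place):
    """Stand-in for the framework's automatic data transform: a tensor on the
    wrong device is replaced by a temporary copy on the target device."""
    if tensor.place == place:
        return tensor
    return Tensor(place, list(tensor.data))

def run_gpu_op(kernel, *inputs):
    """The (hypothetical) framework moves every input to the GPU before calling the kernel."""
    return kernel(*(transform_to(t, "gpu") for t in inputs))

searches = []  # records each expensive algorithm search

def conv_kernel(algo_cache):
    """Hypothetical conv kernel: reuse the cached algorithm, or search for one and store it."""
    if not algo_cache.data:
        searches.append("search")
        algo_cache.data.append(3)  # pretend algorithm 3 was selected
    return algo_cache.data[0]

cache = Tensor("cpu")  # persistent cache allocated on the CPU
for _ in range(3):
    run_gpu_op(conv_kernel, cache)

print(len(searches))  # 3: the search reruns on every call
print(cache.data)     # []: every write landed in a temporary GPU copy
```

Because the kernel only ever sees the temporary copy, the persistent CPU tensor stays empty and the search is repeated on each call.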

If we allocate the input and output on the GPU and then copy the result back to the CPU, we get the correct result.
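The same simplified model, with the cache allocated on the GPU and the result copied back to the CPU afterwards. Again, `Tensor`, `transform_to`, `run_gpu_op`, and `conv_kernel` are hypothetical names for illustration, not the actual fix in the branch; the sketch only shows why skipping the temporary copy lets the kernel reuse its result.

```python
from dataclasses import dataclass, field

@dataclass
class Tensor:
    place: str                                 # hypothetical device tag: "cpu" or "gpu"
    data: list = field(default_factory=list)   # holds the cached algorithm id, if any

def transform_to(tensor, place):
    """Stand-in for the automatic data transform: only copies on a device mismatch."""
    if tensor.place == place:
        return tensor
    return Tensor(place, list(tensor.data))

def run_gpu_op(kernel, *inputs):
    """The (hypothetical) framework moves every input to the GPU before calling the kernel."""
    return kernel(*(transform_to(t, "gpu") for t in inputs))

searches = []  # records each expensive algorithm search

def conv_kernel(algo_cache):
    """Hypothetical conv kernel: reuse the cached algorithm, or search for one and store it."""
    if not algo_cache.data:
        searches.append("search")
        algo_cache.data.append(3)  # pretend algorithm 3 was selected
    return algo_cache.data[0]

cache = Tensor("gpu")  # allocated on the GPU, so no temporary copy is made
for _ in range(3):
    run_gpu_op(conv_kernel, cache)

host_copy = Tensor("cpu", list(cache.data))  # copy the result back to the CPU to inspect it
print(len(searches))   # 1: the search runs once and is reused afterwards
print(host_copy.data)  # [3]
```

Since the tensor already lives on the op's device, the transform returns the same object and the kernel's write persists across calls.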
