feat(ROCm): Add BF16 support for conv kernels on HIP/ROCm by fchange · Pull Request #47 · ROCm/Paddle

fchange · 2026-04-13T10:12:13Z

Description

This PR adds bfloat16 (BF16) data type support for convolution kernels on AMD ROCm/HIP GPUs.

Problem

The PaddleOCR-VL model uses BF16 precision, but the native HIP/ROCm backend fails because conv kernels are not registered for BF16. This blocks running PaddleOCR-VL with the native backend on AMD GPUs.

Changes

1. paddle/phi/backends/gpu/rocm/miopen_desc.h

Added a BFLOAT16 case to ToCudnnDataType() that maps to miopenBFloat16

2. paddle/phi/kernels/gpudnn/conv_kernel.cu

Registered phi::bfloat16 for conv2d kernel
Registered phi::bfloat16 for conv3d kernel
Registered phi::bfloat16 for depthwise_conv2d kernel

3. paddle/phi/kernels/gpudnn/conv_grad_kernel.cu

Registered phi::bfloat16 for conv2d_grad kernel
Registered phi::bfloat16 for conv3d_grad kernel
Registered phi::bfloat16 for conv2d_double_grad kernel
Registered phi::bfloat16 for conv3d_double_grad kernel
Registered phi::bfloat16 for depthwise_conv2d_double_grad kernel

4. test/legacy_test/test_hip_bf16_conv_kernel.py (new)

Added unit tests for BF16 conv2d forward and grouped conv on HIP

Motivation

This is a port of the same fix from PaddlePaddle/Paddle#78587 to the ROCm fork, enabling PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs using the native backend.

Testing

Added test_hip_bf16_conv_kernel.py with BF16 conv2d forward and grouped conv tests
Tests are gated behind core.is_compiled_with_rocm() check
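The gating described above can be sketched as follows. This is a minimal illustration, not the contents of test_hip_bf16_conv_kernel.py. The class name, tensor shapes, and tolerance are assumptions chosen for the example, and it uses the public `paddle.is_compiled_with_rocm()` rather than the `core` helper named above. Each BF16 result is compared against a float32 reference with a deliberately loose bound.

```python
import unittest

try:
    import paddle
    import paddle.nn.functional as F

    HAS_ROCM = paddle.is_compiled_with_rocm()
except ImportError:
    # Paddle is not installed: every case below is reported as skipped.
    paddle = None
    HAS_ROCM = False

# Assumed, deliberately loose absolute tolerance for BF16 vs. float32.
MAX_ABS_ERR = 0.5


@unittest.skipUnless(HAS_ROCM, "requires a Paddle build compiled with ROCm")
class TestHipBF16Conv(unittest.TestCase):  # hypothetical name
    def _check(self, groups):
        paddle.seed(0)
        x = paddle.randn([2, 4, 8, 8], dtype="float32")
        # Weight layout is [out_channels, in_channels // groups, kH, kW].
        w = paddle.randn([6, 4 // groups, 3, 3], dtype="float32")

        # float32 reference, then the same convolution on BF16 inputs.
        ref = F.conv2d(x, w, padding=1, groups=groups)
        out = F.conv2d(
            x.astype("bfloat16"), w.astype("bfloat16"), padding=1, groups=groups
        )

        self.assertEqual(out.dtype, paddle.bfloat16)
        self.assertEqual(out.shape, ref.shape)
        max_err = float((out.astype("float32") - ref).abs().max())
        self.assertLess(max_err, MAX_ABS_ERR)

    def test_conv2d_forward(self):
        self._check(groups=1)

    def test_grouped_conv2d(self):
        self._check(groups=2)
```

Saved to a file, the sketch can be run with `python -m unittest <file>`. On a build without ROCm both cases are skipped rather than failed.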

cc: @PaddlePaddle/paddle-rocma

Register bfloat16 data type for conv2d, conv3d, depthwise_conv2d and their grad/double_grad kernels on HIP/ROCm platform.

Changes:
- Add BFLOAT16 case to ToCudnnDataType in miopen_desc.h
- Register phi::bfloat16 for conv2d, conv3d, depthwise_conv2d kernels
- Register phi::bfloat16 for conv2d_grad, conv3d_grad, conv2d_double_grad, conv3d_double_grad, depthwise_conv2d_double_grad kernels
- Add test_hip_bf16_conv_kernel.py for BF16 conv validation

This enables PaddleOCR-VL and other BF16 models to run on AMD ROCm GPUs using the native backend.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

Register bfloat16 for layer_norm and layer_norm_grad kernels on HIP. This is required for PaddleOCR-VL native backend which uses BF16 precision. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

fchange · 2026-04-13T11:10:00Z

Restore the BF16 registrations for the conv2d, conv3d, and depthwise conv kernels, along with the DataType::BFLOAT16 -> miopenBFloat16 mapping. Both were originally added by ROCm#47 and reverted on paddle_hackthon ahead of RDNA4 enablement.

The change is gated at compile time by the existing #ifdef PADDLE_WITH_HIP block. Deployment to archs that lack native BF16 support should be handled via PADDLE_ROCM_OFFLOAD_ARCHS. The default list on paddle_hackthon already covers the BF16-capable set: CDNA3/gfx942, CDNA4/gfx950, RDNA3/gfx1100-1102, RDNA4/gfx1200-1201. If a downstream target needs to strip BF16 from the build, it can narrow the offload-arch list accordingly. No runtime arch queries are introduced.
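As a rough illustration of narrowing the offload-arch list, the sketch below filters a requested set of targets against the BF16-capable archs named in the note above. The helper name and the semicolon separator (CMake list style) are assumptions. The actual format accepted by PADDLE_ROCM_OFFLOAD_ARCHS is not stated here and should be checked against the build scripts.

```python
# Arch set copied from the note above; it is not an independent statement
# about which hardware supports BF16.
BF16_CAPABLE_ARCHS = (
    "gfx942",                          # CDNA3
    "gfx950",                          # CDNA4
    "gfx1100", "gfx1101", "gfx1102",   # RDNA3
    "gfx1200", "gfx1201",              # RDNA4
)


def narrow_offload_archs(requested, allowed=BF16_CAPABLE_ARCHS, sep=";"):
    """Hypothetical helper: split `requested` into kept and dropped archs.

    Returns (kept, dropped): `kept` is a `sep`-joined string of the archs
    found in `allowed`, in request order with duplicates removed; `dropped`
    is the list of requested archs that are not in `allowed`.
    """
    kept, dropped = [], []
    for arch in requested:
        arch = arch.strip()
        if arch in allowed:
            if arch not in kept:
                kept.append(arch)
        elif arch not in dropped:
            dropped.append(arch)
    return sep.join(kept), dropped


kept, dropped = narrow_offload_archs(["gfx906", "gfx942", "gfx1100", "gfx942"])
print(kept)     # gfx942;gfx1100
print(dropped)  # ['gfx906']
```

The `kept` string could then be passed to the build as the value of PADDLE_ROCM_OFFLOAD_ARCHS, assuming the variable accepts that format.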

fchange and others added 2 commits April 13, 2026 10:11

feat(ROCm): Add BF16 support for layer_norm kernels on HIP/ROCm

c086e92

M4jupitercannon merged commit 29d1c6f into ROCm:develop Apr 14, 2026
1 check passed

austin1997 mentioned this pull request Apr 22, 2026

[ROCm] Enable BF16 softmax + gate cuDNN-only conv2d_add fuse passes on HIP #48

Open

M4jupitercannon merged 2 commits into ROCm:develop from fchange:hip-bf16-conv-support
