Skip to content

fix nan box issue#158

Closed
canxin121 wants to merge 2 commits intoopenhwgroup:developfrom
HardwareFuzz:develop
Closed

fix nan box issue#158
canxin121 wants to merge 2 commits intoopenhwgroup:developfrom
HardwareFuzz:develop

Conversation

@canxin121
Copy link
Copy Markdown

@canxin121 canxin121 commented Oct 24, 2025

@rgiunti
Copy link
Copy Markdown
Contributor

rgiunti commented Dec 3, 2025

I checked the PR performing a regression test with https://github.com/FondazioneChipsIT/cvfpu-uvm.git. The test runs 10000 random transactions with random operation, operands, FP format (between FP64 and FP32) and FP rounding mode (between RNE, RTZ, RDN and RUP) repeated for 10 different seeds where the results are compared with those given by the MPFR golden model. The test passed without errors, the PR can be merged in my opinion.

@canxin121
Copy link
Copy Markdown
Author

I checked the PR performing a regression test with https://github.com/FondazioneChipsIT/cvfpu-uvm.git. The test runs 10000 random transactions with random operation, operands, FP format (between FP64 and FP32) and FP rounding mode (between RNE, RTZ, RDN and RUP) repeated for 10 different seeds where the results are compared with those given by the MPFR golden model. The test passed without errors, the PR can be merged in my opinion.

I also conducted extensive testing using my own differential testing framework and random instruction generation library, and the results are currently largely consistent with Spike's calculations.

@IhsaneTahir
Copy link
Copy Markdown
Contributor

Hi @rgiunti, the current CVFPU UVM environment always NaN-boxes input operands. That’s why this specific bug hasn’t shown up, and also why it won’t be able to indicate whether this PR fixes the issue. I’m currently updating the UVM environment to make NaN boxing configurable, so we can first reproduce the bug and then verify whether the proposed fix works.

@rgiunti
Copy link
Copy Markdown
Contributor

rgiunti commented Feb 16, 2026

Hi @rgiunti, the current CVFPU UVM environment always NaN-boxes input operands. That’s why this specific bug hasn’t shown up, and also why it won’t be able to indicate whether this PR fixes the issue. I’m currently updating the UVM environment to make NaN boxing configurable, so we can first reproduce the bug and then verify whether the proposed fix works.

Hi @IhsaneTahir, yes you're right thank you for your job, please keep me updated about that.

@MikeOpenHWGroup
Copy link
Copy Markdown
Member

@canxin121, @rgiunti and @IhsaneTahir, is it possible that the original developers of fpnew/cvfpu assumed that the host core would perform the Nan-boxing? @davideschiavone, would you have any insight about that? What does the CV32E40P do?

@rgiunti
Copy link
Copy Markdown
Contributor

rgiunti commented Mar 13, 2026

Hi @MikeOpenHWGroup,
I've never worked with CV32E40P, however I had a look in its RTL and having a look at its fpu wrapper:

fpnew_top #(
      .Features      (FPU_FEATURES),
      .Implementation(FPU_IMPLEMENTATION),
      .TagType       (logic)
  ) i_fpnew_bulk (
  ...

where:

localparam fpnew_pkg::fpu_features_t FPU_FEATURES = '{
  ...
  EnableNanBox:  1'b0,
  ...
};

where you can see that the NaN Boxing check is disabled for the cvfpu instantiated in CV32E40P. As far as I've understood this is because in CV32E40P Nan Boxing is managed by the core as you hypotized:

OPCODE_LOAD_FP: begin
        if (FPU == 1 && ZFINX == 0 && fs_off_i == 1'b0) begin
          data_req            = 1'b1;
          regfile_mem_we      = 1'b1;

          ...

          // NaN boxing
          data_sign_extension_o = 2'b10;

However I did not find an analogous handling of Nan Boxing in CVA6, that's why the FPU_FEATURES in this case have always EnableNanBox=1, so that the CVFPU could check and eventually perform it.

@rgiunti
Copy link
Copy Markdown
Contributor

rgiunti commented Mar 13, 2026

Hi @IhsaneTahir,
with the updated UVM environment I reproduced the issues #2449 and #3123. I then verified that the PR solves the problem. Since FP16 is not yet supported I've not reproduced the issues #3124 and #3125 which are inherent to FP16 but looking at the changes introduced by the PR it should be working also for half-precision. If you agree I can merge the PR.

@IhsaneTahir
Copy link
Copy Markdown
Contributor

IhsaneTahir commented Mar 13, 2026

Hi @canxin121,

Thanks for proposing this fix. To summarize the root cause for the record: the divsqrt wrappers fpnew_divsqrt_th_64_multi.sv and fpnew_divsqrt_th_32.sv were unconditionally NaN-boxing input operands before forwarding them to their respective div/sqrt units (THead's c910 and e906), regardless of whether they were NaN-boxed upstream or not. This meant that non-NaN-boxed inputs, which should have been treated as canonical NaNs per the RISC-V spec, were instead being passed through with their lower n (<FLEN) bits interpreted directly.

In theory, your fix does resolve the issue, however I believe the correct fix is simpler: simply remove the unconditional NaN-boxing from the divsqrt wrappers entirely. for example the following block in fpnew_divsqrt_th_64_multi.sv

// NaN-box inputs with max WIDTH
if(WIDTH == 64) begin : gen_fmt_64_bits
always_comb begin : NaN_box_inputs
if(divsqrt_fmt_q == 4'b1000) begin // 64-bit
srcf0[63:0] = srcf0_q[63:0];
srcf1[63:0] = srcf1_q[63:0];
end else if(divsqrt_fmt_q == 4'b0100) begin // 32-bit
srcf0[63:32] = '1;
srcf1[63:32] = '1;
srcf0[31:0] = srcf0_q[31:0];
srcf1[31:0] = srcf1_q[31:0];
end else if((divsqrt_fmt_q == 4'b0010) || (divsqrt_fmt_q == 4'b0001)) begin //16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
srcf0[15:0] = srcf0_q[15:0];
srcf1[15:0] = srcf1_q[15:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
end
end
end else if (WIDTH == 32) begin : gen_fmt_32_bits
always_comb begin : NaN_box_inputs
if(divsqrt_fmt_q == 4'b0100) begin // 32-bit
srcf0[63:32] = '1;
srcf1[63:32] = '1;
srcf0[31:0] = srcf0_q[31:0];
srcf1[31:0] = srcf1_q[31:0];
end else if((divsqrt_fmt_q == 4'b0010) || (divsqrt_fmt_q == 4'b0001)) begin // 16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
srcf0[15:0] = srcf0_q[15:0];
srcf1[15:0] = srcf1_q[15:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
end
end
end else if (WIDTH == 16) begin : gen_fmt_16_bits
always_comb begin : NaN_box_inputs
if((divsqrt_fmt_q == 4'b0010) || (divsqrt_fmt_q == 4'b0001)) begin // 16-bit
srcf0[63:16] = '1;
srcf1[63:16] = '1;
srcf0[15:0] = srcf0_q[15:0];
srcf1[15:0] = srcf1_q[15:0];
end else begin // Unsupported
srcf0[63:0] = '1;
srcf1[63:0] = '1;
end
end
end else begin
$fatal(1, "DivSqrt THMULTI: Unsupported WIDTH (the supported width are 64, 32, 16)");
end

The reason is that the T-Head div/sqrt units (C910 and E906) already implement NaN-boxing detection internally. For example, in C910 (ct_vfdsu_prepare.v#L335-L340):

// cNaN
assign ex1_op0_cnan = ex1_scalar && !ex1_double && !ex1_oper0_high_all1;
// qNaN
assign ex1_op0_qnan = (ex1_expnt0_max && ex1_frac0_msb) || ex1_op0_cnan;

Since NaN-boxing is already handled downstream by the units themselves, the wrapper should simply forward the operands as-is.

    .dp_vfdsu_ex1_pipex_srcf0       ( srcf0_q                     ), // Input for operand 0
    .dp_vfdsu_ex1_pipex_srcf1       ( srcf1_q                     ), // Input for operand 1

Adding conditional logic around the NaN-boxing in the wrapper, as this PR does, fixes the symptom but duplicates responsibility that already exists further down the pipeline.

I validated this proposed bug fix for fpnew_divsqrt_th_64_multi.sv in the cvfpu-uvm environment.

@rgiunti, could you hold off on merging until the PR is updated?

@IhsaneTahir
Copy link
Copy Markdown
Contributor

Hi @canxin121,

Just checking in on this PR. I left a comment earlier suggesting an alternative approach, do you plan on updating the PR to incorporate that?

Ideally, I’d like to have this resolved by the end of the week. If that timing doesn’t work for you, no worries, just let me know, otherwise I may go ahead and open a new PR to move things forward.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants