Add Temporal half-rounding boundary tests for PlainDate, PlainDateTime, PlainYearMonth, and ZonedDateTime #4996
…and dayOfWeek

The existing rounding mode tests for PlainDate.prototype.since() and PlainDate.prototype.until() use dates that produce ~31.97 months of difference, well above the 0.5 boundary. All half-* rounding modes produce identical results, making it impossible to distinguish halfExpand from halfTrunc, or halfEven from halfCeil.

These new tests use dates that produce exactly 0.5 fractional progress (183/366 days in a leap year), causing all nine rounding modes to produce distinct result patterns. The 2.5-year case specifically distinguishes halfEven (rounds to nearest even integer 2) from halfExpand (rounds away from zero to 3).

Also adds:
- inLeapYear century-year tests (1700, 1800, 1900, 2100, 2200) exercising the 100/400 rule that the basic test does not cover
- dayOfWeek tests across all 12 months of a year, since the basic test only checks 7 consecutive days within a single month

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unrelated dayOfWeek and inLeapYear tests. Add RoundRelativeDuration spec references to info blocks. Extend half-boundary coverage to PlainDateTime, PlainYearMonth, and ZonedDateTime (until + since). PlainYearMonth uses June-starting dates because RoundRelativeDuration converts month remainders to days: Jun-Nov = 183 days in a 366-day year span (Jun 2019 - Jun 2020 crossing Feb 29), giving exactly 183/366 = 0.5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
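The 183/366 figure in the commit message above can be checked with plain day arithmetic. The following is a minimal sketch using `Date.UTC` rather than Temporal; the helper name `daysBetween` is hypothetical, and the first-of-month dates are an assumption chosen to match the "Jun 2019 - Jun 2020" span described above, not necessarily the dates used in the test files.

```javascript
// Milliseconds in one UTC day (UTC has no DST, so every day is exactly this long).
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days between two ISO calendar dates (1-based months).
function daysBetween(y1, m1, d1, y2, m2, d2) {
  return (Date.UTC(y2, m2 - 1, d2) - Date.UTC(y1, m1 - 1, d1)) / MS_PER_DAY;
}

// Six-month remainder: June through November 2019.
const remainderDays = daysBetween(2019, 6, 1, 2019, 12, 1);

// One-year span that crosses Feb 29, 2020.
const yearSpanDays = daysBetween(2019, 6, 1, 2020, 6, 1);

// Fractional progress through the year span.
const fraction = remainderDays / yearSpanDays;

console.log(remainderDays, yearSpanDays, fraction); // 183 366 0.5
```

Both operands are exact integers and 183/366 is exactly representable, so the comparison against 0.5 involves no floating-point slack.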
ptomato
left a comment
Thanks! This looks good at first glance.
A couple questions and comments:
- Would you mind sharing approximately the process you used for prompting Claude Code and turning the output into this PR? As the technology is new we are still finding our way around. Thank you for disclosing it up front, by the way.
- Did the mutation tool only find a lack of test coverage when rounding to years, or is the coverage for units such as months also lacking? It might make sense to find similar pairs of objects for other units. (If you do that, it probably makes sense to loop over rounding modes that have the same outcome in each case, to prevent the tests from getting overly long.)
- To make sure we are testing what we expect to be testing, it might be helpful to add assertions at the beginning such as
assert.sameValue(earlier1.since(later).total({ unit: "years", relativeTo: earlier1 }), -1.5, "duration is on a 0.5 boundary");
@@ -0,0 +1,121 @@
// Copyright (C) 2026 Rudi Theunissen. All rights reserved.
Well that's an interesting hallucination. Thanks, fixed.
Hah! Yeah not me, sorry. I'm flattered though thanks Claude.
TemporalHelpers.assertDuration(
  earlier1.since(later, { smallestUnit: "years", roundingMode: "halfExpand" }),
  -2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  "-1.5 years, halfExpand rounds 0.5 away from zero"
What does the "0.5" refer to here? Please use consistent phrasing
Replaced with "breaks ties."
I told it to write a minimal transpiler for the syntax used in the Temporal tests and it came up with this. It grew as it implemented more classes and methods that required different syntax to be converted. Then I told it to start implementing classes and methods one by one against the test suite. At one point it was pretty much done and all that was left were concepts that are untranslatable, like JS
I was wondering whether there was any dead or untested code in there (which there shouldn't be if test262 is exhaustive). I fired up my go-to technique for that, mutation testing, using Infection. To my surprise it actually flagged some mutants, specifically pointing out that the rounding modes were pretty much uncovered. I then told it in a different session (directly in the test262 repo) to double- and triple-check, and it was positive that this is an actual gap in the test suite. I told it to add the tests, it did, it verified them against V8, and I let it open the PR, making sure to include the fact that this was written by an agent.
Yeah of course, that was important to me. We're all still figuring out how to handle this stuff and I thought it would be important to be upfront with that.
Claude Code did not flag other units by itself specifically, and I didn't ask. Will check tomorrow.
Great idea, I will add that. @ptomato Thanks for taking the time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarifies the half-* rounding mode assertion messages by using standard tie-breaking terminology for consistency with non-half mode phrasing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The copyright hallucination makes me very skeptical of this contribution.
I checked the code, not the boilerplate-looking legal header.
Absolutely fair. I'm skeptical of AI-written code that I haven't generated myself either. But given the radical shift in the last couple of months I think these kinds of policies will need to be updated. What a weird point in time; people still see anything involving AIs instinctively as low-quality (myself included) while even the greatest skeptics have started using it all the time. Personally, I'm in favor of being transparent with the use of AI and allowing it in a limited capacity instead of banning it outright (and people using it anyway without telling).
Note
This PR was drafted with the help of Claude Code. Apologies if that's not welcome here — happy to revise anything by hand.
Context
We're building temporal-php, a PHP 8.4 port of the TC39 Temporal API. We run the test262 suite as part of our CI (transpiled to PHP) and also use Infection for mutation testing. Infection systematically modifies source code and checks whether the test suite catches each mutation. Out of 11,983 mutations, several hundred escaped — and many of those escaped because the upstream test262 data doesn't exercise certain code paths. This PR adds tests to close the most impactful gaps we found.
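To illustrate how a mutation can escape, here is a toy sketch in JavaScript (the real port is PHP and the mutations were produced by Infection; nothing below is taken from temporal-php). The functions `roundHalf`, `roundHalfMutant`, and `suitePasses` are hypothetical, and only non-negative inputs are considered.

```javascript
// Toy implementation: two half-* modes that differ only on exact ties.
// Assumes x >= 0, so "away from zero" means "up".
function roundHalf(x, mode) {
  const lower = Math.floor(x);
  const frac = x - lower;
  if (frac !== 0.5) return frac < 0.5 ? lower : lower + 1;
  return mode === "halfExpand" ? lower + 1 : lower;
}

// Hypothetical mutant: the two tie-breaking outcomes are swapped.
function roundHalfMutant(x, mode) {
  const lower = Math.floor(x);
  const frac = x - lower;
  if (frac !== 0.5) return frac < 0.5 ? lower : lower + 1;
  return mode === "halfExpand" ? lower : lower + 1;
}

// A "test suite" is one input plus the expected result per mode.
function suitePasses(impl, x, expected) {
  return impl(x, "halfExpand") === expected.halfExpand &&
         impl(x, "halfTrunc") === expected.halfTrunc;
}

// Off-boundary data (like the ~2.663-year case): the mutant still passes.
const mutantEscapes = suitePasses(roundHalfMutant, 2.663, { halfExpand: 3, halfTrunc: 3 });

// On-boundary data (2.5): the mutant fails, so it is killed.
const mutantKilled = !suitePasses(roundHalfMutant, 2.5, { halfExpand: 3, halfTrunc: 2 });

console.log(mutantEscapes, mutantKilled); // true true
```

The original implementation passes both suites; only the tie value makes the swapped branch observable.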
Summary
Eight new test files across four Temporal types, all exercising the exact 0.5 fractional boundary in RoundRelativeDuration:
- PlainDate, PlainDateTime, ZonedDateTime (since/ and until/): Test all 9 rounding modes at exact 0.5 fractional progress (183 days out of a 366-day leap year). The existing roundingmode-*.js tests use dates producing ~2.663 fractional years, where all half-* modes round identically. These new tests include both 1.5-year (odd integer) and 2.5-year (even integer) cases to distinguish halfEven from halfExpand.
- PlainYearMonth (since/ and until/): Same boundary test, but uses June-starting dates because RoundRelativeDuration converts month remainders to days: Jun–Nov = 183 days in the 366-day year span from Jun 2019 to Jun 2020 (crossing Feb 29), giving exactly 183/366 = 0.5.
The until tests cover the positive direction; the since tests cover the negative direction (where halfExpand and halfCeil diverge).
How the gaps were found
Infection rewrites code like swapping halfExpand match arms, etc. If no test fails, the mutant "escapes." We traced ~80 escaped mutants back to the fact that all half-* rounding modes produce the same output with the current test data (no value near the 0.5 boundary), so entire match arms can be deleted or swapped without detection.
All expected values were verified against V8's Temporal implementation via test262-harness with esvu-installed V8 (d8).
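The choice of 1.5 and 2.5 in both directions can be illustrated on plain numbers. The sketch below is a simplification: it rounds a bare fractional year count rather than running RoundRelativeDuration on a duration, and `roundHalfMode` and `roundAll` are hypothetical names. The mode names match the Temporal `roundingMode` option values.

```javascript
const HALF_MODES = ["halfCeil", "halfFloor", "halfExpand", "halfTrunc", "halfEven"];

// Round a plain number to an integer; the mode only matters on exact ties.
function roundHalfMode(x, mode) {
  const lower = Math.floor(x);
  const upper = Math.ceil(x);
  const frac = x - lower;
  if (frac < 0.5) return lower; // also covers integers (frac === 0)
  if (frac > 0.5) return upper;
  switch (mode) {
    case "halfCeil": return upper;                    // tie goes toward +Infinity
    case "halfFloor": return lower;                   // tie goes toward -Infinity
    case "halfExpand": return x < 0 ? lower : upper;  // tie goes away from zero
    case "halfTrunc": return x < 0 ? upper : lower;   // tie goes toward zero
    case "halfEven": return lower % 2 === 0 ? lower : upper; // tie goes to even
    default: throw new RangeError(`unknown mode: ${mode}`);
  }
}

// Results for one input, in HALF_MODES order.
function roundAll(x) {
  return HALF_MODES.map((mode) => roundHalfMode(x, mode));
}

console.log(roundAll(2.663)); // [ 3, 3, 3, 3, 3 ]      off the boundary: no mode differs
console.log(roundAll(1.5));   // [ 2, 1, 2, 1, 2 ]
console.log(roundAll(2.5));   // [ 3, 2, 3, 2, 2 ]      halfEven vs halfExpand
console.log(roundAll(-1.5));  // [ -1, -2, -2, -1, -2 ] halfCeil vs halfExpand
console.log(roundAll(-2.5));  // [ -2, -3, -3, -2, -2 ]
```

Read column by column, each of the five half-* modes produces a different sequence across the four tie inputs, whereas the 2.663 row cannot tell any of them apart.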