Add date_order locale option and flexible date separator parsing #1624

Open
hidekoji wants to merge 1 commit into tidyverse:main from hidekoji:feature/date-order-multi-format

hidekoji (Contributor) commented May 18, 2026

Summary

Adds a date_order argument to locale() and makes date / date-time auto-detection more forgiving, so year-last dates (e.g. 10/02/2024) can be parsed as dates instead of guessed as character.

  • locale(date_order =) — new optional argument accepting an explicit component order: "ymd", "mdy", "dmy", etc., optionally with a time suffix ("mdy_hms", "dmy_hm", "ymd_h"). NULL (default) keeps current behaviour. Validated in R with a clear error message.
  • DateTimeParser::parseDateOrder() — parses a value against an explicit order, including an optional T/space-separated time part.
  • DateTimeParser::parseYearLastHeuristic() — recognises unambiguous D/M/YYYY vs M/D/YYYY (part > 12 disambiguates; defaults to MDY when ambiguous). Used as an auto-detection fallback in both isDate() (guesser) and CollectorDate::setValue() (collector) so the two agree.
  • Flexible separators — date parsing accepts any non-alphanumeric separator between components (2024.10.02, 2024/10/02, …).
  • Collectors: CollectorDate / CollectorDateTime hold a LocaleInfo* and dispatch through parseDateOrder() when date_order is set with no explicit format.
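
The year-last heuristic described in the list above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the PR's code: the names `YearLastDate`, `DateOrder`, and `parseYearLastSketch` are hypothetical, the sketch assumes one- or two-digit day and month fields plus a four-digit year, and it only range-checks the day (1 to 31) instead of validating it against the calendar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical types for illustration; they do not come from readr.
enum class DateOrder { MDY, DMY };

struct YearLastDate {
  int year;
  int month;
  int day;
  DateOrder order;
};

// Parses "A<sep>B<sep>YYYY", where <sep> is any single non-alphanumeric
// character. Returns false when the input does not fit that shape or when
// neither field can be a month.
bool parseYearLastSketch(const std::string& s, YearLastDate* out) {
  std::size_t i = 0;

  // Reads up to maxDigits digits into value; returns how many were read.
  auto readInt = [&](std::size_t maxDigits, int& value) -> std::size_t {
    std::size_t n = 0;
    value = 0;
    while (i < s.size() && n < maxDigits &&
           std::isdigit(static_cast<unsigned char>(s[i]))) {
      value = value * 10 + (s[i] - '0');
      ++i;
      ++n;
    }
    return n;
  };

  // Consumes exactly one separator that is neither a letter nor a digit.
  auto readSep = [&]() -> bool {
    if (i < s.size() && !std::isalnum(static_cast<unsigned char>(s[i]))) {
      ++i;
      return true;
    }
    return false;
  };

  int a = 0, b = 0, year = 0;
  if (readInt(2, a) == 0 || !readSep()) return false;
  if (readInt(2, b) == 0 || !readSep()) return false;
  if (readInt(4, year) != 4 || i != s.size()) return false;

  // A part greater than 12 cannot be a month, so it must be the day.
  if (a > 12 && b > 12) return false;
  const DateOrder order = (a > 12) ? DateOrder::DMY : DateOrder::MDY;

  const int month = (order == DateOrder::MDY) ? a : b;
  const int day = (order == DateOrder::MDY) ? b : a;
  if (month < 1 || month > 12 || day < 1 || day > 31) return false;

  out->year = year;
  out->month = month;
  out->day = day;
  out->order = order;
  return true;
}
```

Under these assumptions, 25/12/2024 resolves to DMY because 25 cannot be a month, 10/02/2024 is ambiguous and falls back to MDY (October 2), and 13/14/2024 is rejected because neither part can be a month.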

Scope / known limitation

These changes affect readr's own C++ engine, which powers parse_date(), parse_datetime(), guess_parser(), and the first edition of read_csv().

The second edition of read_csv() (default since readr 2.0) delegates parsing to the vroom package, which does not yet know about date_order. Full end-to-end read_csv() support in the default edition therefore depends on the companion vroom change (tidyverse/vroom#623). End-to-end read_csv() tests are intentionally omitted from this PR until that lands.

Test plan

  • tests/testthat/test-locale.R: locale() accepts valid date_order, rejects invalid values, defaults to NULL
  • tests/testthat/test-parsing-datetime.R: guess_parser() detects MDY/DMY dates and datetimes with explicit date_order
  • guess_parser() auto-detects unambiguous DMY year-last dates; ambiguous year-last defaults to MDY
  • parse_date() / parse_datetime() honour locale(date_order =)
  • existing YMD / ISO-8601 behaviour unchanged (backward compatibility)

hidekoji force-pushed the feature/date-order-multi-format branch 2 times, most recently from 7f48ec7 to 472e72d (May 19, 2026 00:01)

`locale()` gains a `date_order` argument so dates and date-times can be
parsed with an explicit component order ("mdy", "dmy", "ymd_hms", etc.).
This makes year-last formats such as 10/02/2024 readable, which the
automatic type guesser would otherwise treat as character.

Date and date-time auto-detection now also accepts any non-alphanumeric
separator between components and falls back to a year-last heuristic that
disambiguates D/M/YYYY vs M/D/YYYY (defaulting to MDY when ambiguous).
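
As a rough illustration of the separator handling described above, the following sketch splits a date string into three numeric components on any single non-alphanumeric character. The function name `splitDateComponents` is hypothetical, and the sketch makes two assumptions the commit message does not state: each component has at most four digits, and the two separators need not match.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the three numeric components of a date, or an
// empty vector when the input is not "<digits><sep><digits><sep><digits>".
std::vector<int> splitDateComponents(const std::string& s) {
  std::vector<int> parts;
  std::size_t i = 0;

  while (i < s.size()) {
    // Every component must start with a digit.
    if (!std::isdigit(static_cast<unsigned char>(s[i]))) return {};

    // Read at most four digits (assumption: no component is longer).
    int value = 0;
    std::size_t digits = 0;
    while (i < s.size() && digits < 4 &&
           std::isdigit(static_cast<unsigned char>(s[i]))) {
      value = value * 10 + (s[i] - '0');
      ++i;
      ++digits;
    }
    parts.push_back(value);

    if (i == s.size()) break;

    // Exactly one separator, which may be anything except a letter or digit.
    if (std::isalnum(static_cast<unsigned char>(s[i]))) return {};
    ++i;

    // A trailing separator with nothing after it is malformed.
    if (i == s.size()) return {};
  }

  if (parts.size() != 3) return {};
  return parts;
}
```

With this sketch, 2024.10.02 and 2024/10/02 both yield the components 2024, 10, 2, while 20241002 and 2024x10x02 yield an empty result because no non-alphanumeric separator divides the components.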

When date_order is set, CollectorDate / CollectorDateTime dispatch through
DateTimeParser::parseDateOrder(); guess logic in isDate() / isDateTime()
routes date-only vs time-suffixed orders accordingly.
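
One way to picture the routing between date-only and time-suffixed orders is to split the date_order string at its underscore, as in the sketch below. The names `DateOrderSpec` and `parseDateOrderSpec` are hypothetical. The sketch assumes that any permutation of y, m, d is a valid date part, that the only time suffixes are h, hm, and hms (the ones named in the PR), and that matching is case-sensitive; the actual validation happens in R and may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical representation of a validated date_order value.
struct DateOrderSpec {
  std::string date;  // e.g. "mdy"
  std::string time;  // "", "h", "hm", or "hms"
  bool hasTime() const { return !time.empty(); }
};

// Splits a spec such as "mdy_hms" into its date and time parts and validates
// both. Returns false for anything outside the assumed grammar.
bool parseDateOrderSpec(const std::string& spec, DateOrderSpec* out) {
  const std::size_t underscore = spec.find('_');
  const bool hasSuffix = (underscore != std::string::npos);

  const std::string date = spec.substr(0, underscore);
  const std::string time = hasSuffix ? spec.substr(underscore + 1) : "";

  // The date part must use each of 'd', 'm', 'y' exactly once; sorting any
  // such permutation gives "dmy".
  std::string sorted = date;
  std::sort(sorted.begin(), sorted.end());
  if (sorted != "dmy") return false;

  // If an underscore is present, a recognised time suffix must follow it.
  if (hasSuffix && time != "h" && time != "hm" && time != "hms") return false;

  out->date = date;
  out->time = time;
  return true;
}
```

A caller could then send specs where `hasTime()` is false down the date path and the rest down the date-time path, which mirrors the split between isDate() and isDateTime() described above without claiming to reproduce it.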

Adds end-to-end read_csv() tests plus locale() and parser unit tests
covering explicit date_order, auto MDY/DMY detection, separator variants,
and YMD backward compatibility.

hidekoji force-pushed the feature/date-order-multi-format branch from 472e72d to 90403b0 (May 19, 2026 18:53)