From e5b6eeb210d9a294a3787d0b51c36c0465b67c0a Mon Sep 17 00:00:00 2001
From: Sebastian Fischer <sebf.fischer@gmail.com>
Date: Tue, 5 May 2026 11:01:41 +0200
Subject: [PATCH 1/4] docs: add gotchas vignette

---
 pkgdown/_pkgdown.yml                  |   2 +
 vignettes/differences-from-base-r.Rmd | 119 ++++++++++++++++++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 vignettes/differences-from-base-r.Rmd

diff --git a/pkgdown/_pkgdown.yml b/pkgdown/_pkgdown.yml
index 2d3f4455..72a9b34e 100644
--- a/pkgdown/_pkgdown.yml
+++ b/pkgdown/_pkgdown.yml
@@ -39,6 +39,8 @@ navbar:
         href: articles/random-numbers.html
       - text: Type Promotion
         href: articles/type-promotion.html
+      - text: Differences from base R
+        href: articles/differences-from-base-r.html
       - text: Efficiency
         href: articles/efficiency.html
       - text: FAQ
diff --git a/vignettes/differences-from-base-r.Rmd b/vignettes/differences-from-base-r.Rmd
new file mode 100644
index 00000000..b3984a23
--- /dev/null
+++ b/vignettes/differences-from-base-r.Rmd
@@ -0,0 +1,119 @@
+---
+title: "Differences from base R"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Differences from base R}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+This vignette lists major behavioral differences between {anvl} and base R that R users should be aware of when working with `AnvlArray`s.
+
+```{r}
+library(anvl)
+```
+
+## Row-major vs column-major ordering
+
+R stores matrices and arrays in *column-major* order, while {anvl} (following XLA) uses *row-major* order.
+This makes no difference when you only use shape-aware operations (subsetting, matrix multiplication, etc.) -- the indices are the same in both.
+The difference shows up when you flatten an array, because the underlying data is then traversed in a different order.
+
+Consider the 2x2 matrix below:
+
+```{r}
+m <- matrix(1:4, nrow = 2)
+m
+```
+
+In base R, `as.vector()` flattens it column-by-column, so we get `1, 2, 3, 4`:
+
+```{r}
+as.vector(m)
+```
+
+In {anvl}, reshaping to a length-4 vector traverses the data row-by-row, so we get `1, 3, 2, 4`:
+
+```{r}
+nv_reshape(m, shape = 4)
+```
+
+If you need column-major flattening in {anvl}, transpose first:
+
+```{r}
+nv_reshape(nv_transpose(m), shape = 4)
+```
+
+## No recycling
+
+Base R *recycles* the shorter operand when two vectors of different lengths are combined elementwise:
+
+```{r}
+c(1, 2, 3, 4) + c(1, 2)
+```
+
+{anvl} only auto-broadcasts *scalars* (operands with shape `integer()`).
+Adding a scalar to an array works as you would expect:
+
+```{r}
+nv_array(1:4) + 10L
+```
+
+But combining two non-scalar arrays of different shapes errors, even when one shape is a "tile" of the other:
+
+```{r, error = TRUE}
+nv_array(1:4) + nv_array(1:2)
+```
+
+When two non-scalar arrays differ only by size-1 dimensions (numpy-style broadcasting, e.g. shape `(2, 3)` and `(1, 3)`), use `nv_broadcast_arrays()` to align them explicitly first:
+
+```{r}
+a <- nv_array(matrix(1:6, nrow = 2))
+b <- nv_array(matrix(c(10, 20, 30), nrow = 1))
+xs <- nv_broadcast_arrays(a, b)
+xs[[1]] + xs[[2]]
+```
+
+Note that even `nv_broadcast_arrays()` cannot replicate R's recycling for shapes like `(4)` and `(2)` -- the shapes must be broadcast-compatible in the numpy sense.
+
+## No `NA`s
+
+R has a dedicated missing-value marker (`NA`) for every atomic type.
+{anvl} arrays do not -- there is no representation of "missing" at the XLA level.
+When you convert R values containing `NA` into an `AnvlArray`, the `NA`s are silently turned into the closest available value of the target dtype.
+For floating-point dtypes, that value is `NaN`:
+
+```{r}
+nv_array(NA_real_)
+```
+
+```{r}
+nv_array(c(1, NA, 3))
+```
+
+Round-tripping back to R produces `NaN`, not `NA`:
+
+```{r}
+as_array(nv_array(c(1, NA, 3)))
+```
+
+Integer dtypes have no `NaN`, but `NA_integer_` does *appear* to round-trip:
+
+```{r}
+nv_scalar(NA_integer_) |> as.integer()
+```
+
+This is misleading.
+R represents `NA_integer_` by reserving one specific 32-bit integer value (`-2^31 = -2147483648`) as a sentinel for missingness, leaving only `2^32 - 1` valid integers.
+{anvl} has no notion of missing values and just stores that bit pattern as a regular `i32`.
+The round-trip "works" only because R interprets the same bit pattern back as `NA` -- but inside {anvl} the value behaves like the integer `-2147483648`, and any computation on it (e.g. addition, comparison) will treat it as such rather than propagating missingness.
+The same caveat applies in reverse: if a genuine {anvl} computation produces `-2147483648`, converting back to R will silently turn it into `NA`.
+
+For logical (`bool`) dtype the situation is worst: there is no spare bit pattern at all, so a bare `NA` (which is logical) is silently turned into `TRUE`:
+
+```{r}
+nv_scalar(NA)
+as.logical(nv_scalar(NA))
+```
+
+If your data contains missing values, decide how to handle them *before* converting to an `AnvlArray`.

From c6d47855fb4c44153028ebc3946513f151a2a208 Mon Sep 17 00:00:00 2001
From: Sebastian Fischer <sebf.fischer@gmail.com>
Date: Tue, 5 May 2026 16:32:02 +0200
Subject: [PATCH 2/4] ...

---
 NEWS.md                                       |  3 +
 R/array.R                                     | 57 +++++++++++++++++--
 man/AnvlArray.Rd                              | 19 ++++++-
 man/as_array.Rd                               | 14 ++++-
 pkgdown/_pkgdown.yml                          |  4 +-
 ...ifferences-from-base-r.Rmd => gotchas.Rmd} | 34 ++++++++++-
 6 files changed, 117 insertions(+), 14 deletions(-)
 rename vignettes/{differences-from-base-r.Rmd => gotchas.Rmd} (70%)

diff --git a/NEWS.md b/NEWS.md
index c8df8a76..adb9c281 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -34,6 +34,9 @@
   * `nv_select()` to select a slice along a dimension by index.
 * `mean()` and `median()` now error when called with `na.rm = TRUE`, since
   anvl arrays do not carry `NA`s. `mean()` also rejects non-zero `trim`.
+* `nv_array()`, `nv_scalar()`, and `as_array()` gained a `scan_na` argument
+  that opts into checking for `NA` values during host -> device and
  device -> host transfers. See the "Gotchas" vignette.
 
 ## Other
 
diff --git a/R/array.R b/R/array.R
index abae4a43..18f275a5 100644
--- a/R/array.R
+++ b/R/array.R
@@ -44,6 +44,12 @@
 #'   Backend to use (`"xla"` or `"quickr"`).
 #'   Defaults to `default_backend()`.
 #'   Must not be specified inside [`jit()`].
+#' @param scan_na (`logical(1)`)\cr
+#'   If `TRUE`, error when `data` contains any `NA` values. XLA has no
+#'   representation for missing values, so they are otherwise silently
+#'   coerced to the closest available value of the target dtype (e.g. `NaN`
+#'   for floats, the bit pattern `-2147483648` for `i32`, `TRUE` for
+#'   `bool`). Defaults to `FALSE`.
 #' @return ([`AnvlArray`])
 #' @examplesIf pjrt::plugins_downloaded()
 #' # A 1-d array (vector) with shape (4). Default type for integers is `i32`
@@ -82,7 +88,23 @@ NULL
 
 #' @rdname AnvlArray
 #' @export
-nv_array <- function(data, dtype = NULL, device = NULL, shape = NULL, ambiguous = NULL, backend = NULL) {
+nv_array <- function(
+  data,
+  dtype = NULL,
+  device = NULL,
+  shape = NULL,
+  ambiguous = NULL,
+  backend = NULL,
+  scan_na = FALSE
+) {
+  assert_flag(scan_na)
+  if (scan_na && !is_anvl_array(data) && anyNA(data)) {
+    n_na <- sum(is.na(data))
+    cli_abort(c(
+      "Input {.arg data} contains {n_na} {.val NA} value{?s}, which {?has/have} no representation at the XLA level.",
+      i = "Replace or drop missing values before transferring, or set {.code scan_na = FALSE} to skip this check."
+    ))
+  }
   if (is_anvl_array(data)) {
     if (!is.null(device) && !eq_device(device(data), nv_device(device, backend))) {
       cli_abort("Cannot change device of existing AnvlArray from {.val {device(data)}} to {.val {device}}")
@@ -241,8 +263,16 @@ unwrap_if_array <- function(x) {
 
 #' @rdname AnvlArray
 #' @export
-nv_scalar <- function(data, dtype = NULL, device = NULL, ambiguous = NULL, backend = NULL) {
-  nv_array(data, dtype = dtype, device = device, shape = integer(), ambiguous = ambiguous, backend = backend)
+nv_scalar <- function(data, dtype = NULL, device = NULL, ambiguous = NULL, backend = NULL, scan_na = FALSE) {
+  nv_array(
+    data,
+    dtype = dtype,
+    device = device,
+    shape = integer(),
+    ambiguous = ambiguous,
+    backend = backend,
+    scan_na = scan_na
+  )
 }
 
 #' @rdname AnvlArray
@@ -298,9 +328,24 @@ shape.AnvlArray <- function(x, ...) {
   globals$backends[[x$backend]]$shape(x)
 }
 
-#' @export
-as_array.AnvlArray <- function(x, ...) {
-  globals$backends[[x$backend]]$as_array(x)
+#' @rdname as_array
+#' @param scan_na (`logical(1)`)\cr
+#'   If `TRUE` and the array's dtype is `i32`, error when the materialized
+#'   R integer vector contains any `NA_integer_` values. R reserves the bit
+#'   pattern `-2147483648` as the `NA_integer_` sentinel, so a genuine
+#'   device-side `i32` value of `-2147483648` is silently turned into `NA`
+#'   on transfer. No-op for other dtypes. Defaults to `FALSE`.
+#' @export
+as_array.AnvlArray <- function(x, scan_na = FALSE, ...) {
+  assert_flag(scan_na)
+  result <- globals$backends[[x$backend]]$as_array(x)
+  if (scan_na && (dtype(x) == as_dtype("i32")) && anyNA(result)) {
+    cli_abort(c(
+      "Materialized R integer vector contains {.val NA} values from device-side {.val -2147483648}.",
+      i = "This collision is irrecoverable: the device value and {.val NA} are indistinguishable in R. Set {.code scan_na = FALSE} to skip this check."
+    ))
+  }
+  result
 }
 
 #' @export
diff --git a/man/AnvlArray.Rd b/man/AnvlArray.Rd
index c458155d..ba9e6e3e 100644
--- a/man/AnvlArray.Rd
+++ b/man/AnvlArray.Rd
@@ -16,10 +16,18 @@ nv_array(
   device = NULL,
   shape = NULL,
   ambiguous = NULL,
-  backend = NULL
+  backend = NULL,
+  scan_na = FALSE
 )
 
-nv_scalar(data, dtype = NULL, device = NULL, ambiguous = NULL, backend = NULL)
+nv_scalar(
+  data,
+  dtype = NULL,
+  device = NULL,
+  ambiguous = NULL,
+  backend = NULL,
+  scan_na = FALSE
+)
 
 nv_empty(dtype, shape, device = NULL, ambiguous = FALSE)
 
@@ -80,6 +88,13 @@ Backend to use (\code{"xla"} or \code{"quickr"}).
 Defaults to \code{default_backend()}.
 Must not be specified inside \code{\link[=jit]{jit()}}.}
 
+\item{scan_na}{(\code{logical(1)})\cr
+If \code{TRUE}, error when \code{data} contains any \code{NA} values. XLA has no
+representation for missing values, so they are otherwise silently
+coerced to the closest available value of the target dtype (e.g. \code{NaN}
+for floats, the bit pattern \code{-2147483648} for \code{i32}, \code{TRUE} for
+\code{bool}). Defaults to \code{FALSE}.}
+
 \item{like}{(\code{\link{AnvlArray}})\cr
 An existing array. Any of \code{dtype}, \code{device}, \code{shape}, \code{ambiguous}, and
 \code{backend} that are \code{NULL} (the default) are taken from \code{like}.}
diff --git a/man/as_array.Rd b/man/as_array.Rd
index 0b00f6e8..b4dff65e 100644
--- a/man/as_array.Rd
+++ b/man/as_array.Rd
@@ -1,15 +1,25 @@
 % Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/reexports.R
-\name{as_array}
+% Please edit documentation in R/array.R, R/reexports.R
+\name{as_array.AnvlArray}
+\alias{as_array.AnvlArray}
 \alias{as_array}
 \title{Convert to an R array}
 \usage{
+\method{as_array}{AnvlArray}(x, scan_na = FALSE, ...)
+
 as_array(x, ...)
 }
 \arguments{
 \item{x}{(\code{\link{arrayish}})\cr
 An array-like object.}
 
+\item{scan_na}{(\code{logical(1)})\cr
+If \code{TRUE} and the array's dtype is \code{i32}, error when the materialized
+R integer vector contains any \code{NA_integer_} values. R reserves the bit
+pattern \code{-2147483648} as the \code{NA_integer_} sentinel, so a genuine
+device-side \code{i32} value of \code{-2147483648} is silently turned into \code{NA}
+on transfer. No-op for other dtypes. Defaults to \code{FALSE}.}
+
 \item{...}{Additional arguments passed to methods (unused).}
 }
 \value{
diff --git a/pkgdown/_pkgdown.yml b/pkgdown/_pkgdown.yml
index 72a9b34e..d58f0b22 100644
--- a/pkgdown/_pkgdown.yml
+++ b/pkgdown/_pkgdown.yml
@@ -39,8 +39,8 @@ navbar:
         href: articles/random-numbers.html
       - text: Type Promotion
         href: articles/type-promotion.html
-      - text: Differences from base R
-        href: articles/differences-from-base-r.html
+      - text: Gotchas
+        href: articles/gotchas.html
       - text: Efficiency
         href: articles/efficiency.html
       - text: FAQ
diff --git a/vignettes/differences-from-base-r.Rmd b/vignettes/gotchas.Rmd
similarity index 70%
rename from vignettes/differences-from-base-r.Rmd
rename to vignettes/gotchas.Rmd
index b3984a23..ffde8d67 100644
--- a/vignettes/differences-from-base-r.Rmd
+++ b/vignettes/gotchas.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "Differences from base R"
+title: "Gotchas"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{Differences from base R}
@@ -7,7 +7,7 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 ---
 
-This vignette lists major behavioral differences between {anvl} and base R that R users should be aware of when working with `AnvlArray`s.
+This vignette lists various things to be aware of, specifically in relation to base R.
 
 ```{r}
 library(anvl)
@@ -117,3 +117,33 @@ as.logical(nv_scalar(NA))
 ```
 
 If your data contains missing values, decide how to handle them *before* converting to an `AnvlArray`.
+To opt into a runtime check, pass `scan_na = TRUE` to `nv_array()` / `nv_scalar()`, which errors if the input contains any `NA`:
+
+```{r, error = TRUE}
+nv_array(c(1, NA, 3), scan_na = TRUE)
+```
+
+The same flag is available on `as_array()` for the `i32` round-trip case, where it errors if the materialized integer vector contains any `NA` (i.e. any `-2147483648`):
+
+```{r, error = TRUE}
+as_array(nv_scalar(NA_integer_), scan_na = TRUE)
+```
+
+## No unsigned integers
+
+R's `integer` type is signed 32-bit (range `-2147483648` to `2147483647`).
+{anvl} also exposes unsigned integer dtypes (`ui8`, `ui16`, `ui32`, `ui64`) backed by XLA, but R has no native counterpart.
+For values that fit into R's signed integer range, the round-trip works as expected:
+
+```{r}
+as_array(nv_array(c(0L, 200L, 255L), dtype = "ui8"))
+```
+
+For larger device-side values, however, materialization back into an R integer vector silently produces `NA`:
+
+```{r}
+big <- nv_array(2147483647L, dtype = "ui32") + 1L
+as_array(big)
+```
+
+The device-side value is `2147483648` -- a perfectly valid `ui32` -- but it falls outside R's signed integer range, so it collides with the `NA_integer_` sentinel on materialization. The same caveat applies to all values `>= 2^31` in any unsigned dtype, including the much larger range of `ui64`. If you need to consume large unsigned values in R, convert the dtype on the device side first (e.g. `nv_convert(x, "f64")`).

From 38382bb4ae7363b1a89394f552f68327ee3daa82 Mon Sep 17 00:00:00 2001
From: Sebastian Fischer <sebf.fischer@gmail.com>
Date: Mon, 18 May 2026 17:50:44 +0200
Subject: [PATCH 3/4] ...

---
 R/api.R               |  2 +-
 R/array.R             | 22 ++++++---------
 R/backend-quickr.R    |  2 +-
 R/backend-xla.R       |  2 +-
 R/backend.R           |  7 +++--
 man/AnvlBackend.Rd    |  5 +++-
 man/as_array.Rd       | 13 +++++----
 vignettes/gotchas.Rmd | 64 +++++++++++++++++++++++--------------------
 8 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/R/api.R b/R/api.R
index 441d1aa2..3238fe7c 100644
--- a/R/api.R
+++ b/R/api.R
@@ -101,7 +101,7 @@ nv_broadcast_scalars <- function(...) {
 
   target_shape <- non_scalar_shapes[[1L]]
   if (!all(vapply(non_scalar_shapes, identical, logical(1L), target_shape))) {
-    shapes <- paste0(sapply(shapes, shape2string), sep = ", ")
+    shapes <- paste0(sapply(shapes, shape2string), collapse = ", ")
     cli_abort(
       "All non-scalar arrays must have the same shape, but got {shapes}. Use {.fn nv_broadcast_arrays} for general broadcasting." # nolint
     )
diff --git a/R/array.R b/R/array.R
index 381f4cac..a9489bfe 100644
--- a/R/array.R
+++ b/R/array.R
@@ -426,23 +426,17 @@ shape.AnvlArray <- function(x, ...) {
 
 #' @rdname as_array
 #' @param check (`logical(1)`)\cr
-#'   If `TRUE` and the array's dtype is `i32`, error when the materialized
-#'   R integer vector contains any `NA_integer_` values. R reserves the bit
-#'   pattern `-2147483648` as the `NA_integer_` sentinel, so a genuine
-#'   device-side `i32` value of `-2147483648` is silently turned into `NA`
-#'   on transfer. No-op for other dtypes. Defaults to `FALSE`. See the
-#'   "Gotchas" vignette.
+#'   If `TRUE`, sanity-check the materialized R vector against losing
+#'   information across the device-to-host boundary, and abort if any
+#'   problematic value is detected. Forwarded to the backend; for the
+#'   `xla` backend the relevant cases are `i32`/`i64` values colliding
+#'   with the `NA` bit pattern and `ui64` values `>= 2^63` wrapping
+#'   through `bit64::integer64`. See [`pjrt::as_array.PJRTBuffer()`] for
+#'   the full list. Defaults to `FALSE`. See the "Gotchas" vignette.
 #' @export
 as_array.AnvlArray <- function(x, check = FALSE, ...) {
   assert_flag(check)
-  result <- globals$backends[[x$backend]]$as_array(x)
-  if (check && (dtype(x) == as_dtype("i32")) && anyNA(result)) {
-    cli_abort(c(
-      "Materialized R integer vector contains {.val NA} values from device-side {.val -2147483648}.",
-      i = "This collision is irrecoverable: the device value and {.val NA} are indistinguishable in R. Set {.code check = FALSE} to skip this check."
-    ))
-  }
-  result
+  globals$backends[[x$backend]]$as_array(x, check = check)
 }
 
 #' @export
diff --git a/R/backend-quickr.R b/R/backend-quickr.R
index 73c28631..b26348fe 100644
--- a/R/backend-quickr.R
+++ b/R/backend-quickr.R
@@ -162,7 +162,7 @@ AnvlBackendQuickr <- function() {
     dtype = function(x) x$dtype,
     shape = function(x) x$shape,
     ambiguous = function(x) x$ambiguous,
-    as_array = function(x) x$data,
+    as_array = function(x, check) x$data,
     as_raw = function(x, row_major) as.raw(x$data),
     platform = function(x) "cpu",
     device = function(x) quickr_device("cpu"),
diff --git a/R/backend-xla.R b/R/backend-xla.R
index 45b02270..b781ce74 100644
--- a/R/backend-xla.R
+++ b/R/backend-xla.R
@@ -319,7 +319,7 @@ AnvlBackendXla <- function() {
     dtype = function(x) tengen::dtype(x$data),
     shape = function(x) tengen::shape(x$data),
     ambiguous = function(x) x$ambiguous,
-    as_array = function(x) tengen::as_array(x$data),
+    as_array = function(x, check) tengen::as_array(x$data, check = check),
     as_raw = function(x, row_major) tengen::as_raw(x$data, row_major = row_major),
     platform = function(x) pjrt::platform(x$data),
     device = function(x) device(x$data),
diff --git a/R/backend.R b/R/backend.R
index 112552b5..4550ec97 100644
--- a/R/backend.R
+++ b/R/backend.R
@@ -6,7 +6,10 @@
 #' @param dtype (`function`)\cr Extracts the dtype from an AnvlArray.
 #' @param shape (`function`)\cr Extracts the shape from an AnvlArray.
 #' @param ambiguous (`function`)\cr Extracts the ambiguous flag from an AnvlArray.
-#' @param as_array (`function`)\cr Converts an AnvlArray to an R array.
+#' @param as_array (`function(x, check)`)\cr Converts an AnvlArray to an R
+#'   array. The `check` flag is forwarded from [`as_array()`]; backends may use
+#'   it to abort when materialization would lose information (e.g. ui64 values
+#'   wrapping through `bit64::integer64`). See [`pjrt::as_array.PJRTBuffer()`].
 #' @param as_raw (`function`)\cr Converts an AnvlArray to raw bytes.
 #' @param platform (`function`)\cr Returns the platform name (e.g. `"cpu"`).
 #' @param device (`function`)\cr Returns the device object for an AnvlArray.
@@ -141,7 +144,7 @@ register_backend(
     dtype = function(x) x$dtype,
     shape = function(x) x$shape,
     ambiguous = function(x) x$ambiguous,
-    as_array = function(x) x$data,
+    as_array = function(x, check) x$data,
     as_raw = function(x, row_major) cli_abort("as_raw not supported for plain backend"),
     platform = function(x) "cpu",
     device = function(x) PlainDeviceCpu(),
diff --git a/man/AnvlBackend.Rd b/man/AnvlBackend.Rd
index ec55487b..b1e68a38 100644
--- a/man/AnvlBackend.Rd
+++ b/man/AnvlBackend.Rd
@@ -30,7 +30,10 @@ underlying data (\code{PJRTBuffer} for \code{"xla"} backend, \code{array()} for
 
 \item{ambiguous}{(\code{function})\cr Extracts the ambiguous flag from an AnvlArray.}
 
-\item{as_array}{(\code{function})\cr Converts an AnvlArray to an R array.}
+\item{as_array}{(\verb{function(x, check)})\cr Converts an AnvlArray to an R
+array. The \code{check} flag is forwarded from \code{\link[=as_array]{as_array()}}; backends may use
+it to abort when materialization would lose information (e.g. ui64 values
+wrapping through \code{bit64::integer64}). See \code{\link[pjrt:as_array.PJRTBuffer]{pjrt::as_array.PJRTBuffer()}}.}
 
 \item{as_raw}{(\code{function})\cr Converts an AnvlArray to raw bytes.}
 
diff --git a/man/as_array.Rd b/man/as_array.Rd
index 33d53782..ecacd553 100644
--- a/man/as_array.Rd
+++ b/man/as_array.Rd
@@ -14,12 +14,13 @@ as_array(x, ...)
 An array-like object.}
 
 \item{check}{(\code{logical(1)})\cr
-If \code{TRUE} and the array's dtype is \code{i32}, error when the materialized
-R integer vector contains any \code{NA_integer_} values. R reserves the bit
-pattern \code{-2147483648} as the \code{NA_integer_} sentinel, so a genuine
-device-side \code{i32} value of \code{-2147483648} is silently turned into \code{NA}
-on transfer. No-op for other dtypes. Defaults to \code{FALSE}. See the
-"Gotchas" vignette.}
+If \code{TRUE}, sanity-check the materialized R vector against losing
+information across the device-to-host boundary, and abort if any
+problematic value is detected. Forwarded to the backend; for the
+\code{xla} backend the relevant cases are \code{i32}/\code{i64} values colliding
+with the \code{NA} bit pattern and \code{ui64} values \verb{>= 2^63} wrapping
+through \code{bit64::integer64}. See \code{\link[pjrt:as_array.PJRTBuffer]{pjrt::as_array.PJRTBuffer()}} for
+the full list. Defaults to \code{FALSE}. See the "Gotchas" vignette.}
 
 \item{...}{Additional arguments passed to methods (unused).}
 }
diff --git a/vignettes/gotchas.Rmd b/vignettes/gotchas.Rmd
index 1bd5d6a8..256a310d 100644
--- a/vignettes/gotchas.Rmd
+++ b/vignettes/gotchas.Rmd
@@ -9,15 +9,15 @@ vignette: >
 
 This vignette lists various things to be aware of, specifically in relation to base R.
 
-```{r}
+```{r, include = FALSE}
 library(anvl)
 ```
 
 ## Row-major vs column-major ordering
 
 R stores matrices and arrays in *column-major* order, while {anvl} (following XLA) uses *row-major* order.
-This makes no difference when you only use shape-aware operations (subsetting, matrix multiplication, etc.) -- the indices are the same in both.
-The difference shows up when you flatten an array, because the underlying data is then traversed in a different order.
+For most operations this is an internal implementation detail that does not change their semantics.
+It does matter, however, for reshaping operations such as `nv_flatten()`, which traverse the data in storage order.
 
 Consider the 2x2 matrix below:
 
@@ -35,13 +35,13 @@ as.vector(m)
 In {anvl}, reshaping to a length-4 vector traverses the data row-by-row, so we get `1, 3, 2, 4`:
 
 ```{r}
-nv_reshape(m, shape = 4)
+nv_flatten(m)
 ```
 
 If you need column-major flattening in {anvl}, transpose first:
 
 ```{r}
-nv_reshape(nv_transpose(m), shape = 4)
+nv_flatten(t(m))
 ```
 
 ## No recycling
@@ -68,9 +68,12 @@ nv_array(1:4) + nv_array(1:2)
 When two non-scalar arrays differ only by size-1 dimensions (numpy-style broadcasting, e.g. shape `(2, 3)` and `(1, 3)`), use `nv_broadcast_arrays()` to align them explicitly first:
 
 ```{r}
-a <- nv_array(matrix(1:6, nrow = 2))
-b <- nv_array(matrix(c(10, 20, 30), nrow = 1))
+a <- nv_matrix(1:6, nrow = 2)
+shape(a)
+b <- nv_matrix(c(10, 20, 30), nrow = 1)
+shape(b)
 xs <- nv_broadcast_arrays(a, b)
+lapply(xs, shape)
 xs[[1]] + xs[[2]]
 ```
 
@@ -79,9 +82,8 @@ Note that even `nv_broadcast_arrays()` cannot replicate R's recycling for shapes
 ## No `NA`s
 
 R has a dedicated missing-value marker (`NA`) for every atomic type.
-{anvl} arrays do not -- there is no representation of "missing" at the XLA level.
-When you convert R values containing `NA` into an `AnvlArray`, the `NA`s are silently turned into the closest available value of the target dtype.
-For floating-point dtypes, that value is `NaN`:
+{anvl} arrays do not: XLA has no representation of "missing", only `NaN` for floating-point numbers.
+When you convert floating-point R values containing `NA` into an `AnvlArray`, the `NA`s are silently treated as `NaN`s.
 
 ```{r}
 nv_array(NA_real_)
@@ -91,50 +93,44 @@ nv_array(NA_real_)
 nv_array(c(1, NA, 3))
 ```
 
-Round-tripping back to R produces `NaN`, not `NA`:
+Round-tripping back to R is therefore not guaranteed to produce `NA`; it may yield `NaN` instead:
 
 ```{r}
 as_array(nv_array(c(1, NA, 3)))
 ```
 
-Integer dtypes have no `NaN`, but `NA_integer_` does *appear* to round-trip:
+For other dtypes the situation is worse. Integers are particularly affected: R represents `NA_integer_` by the smallest 32-bit integer, `-2147483648`, which {anvl} stores as an ordinary `i32` value:
 
 ```{r}
-nv_scalar(NA_integer_) |> as.integer()
+nv_scalar(NA_integer_)
 ```
 
-This is misleading.
-R represents `NA_integer_` by reserving one specific 32-bit integer value (`-2^31 = -2147483648`) as a sentinel for missingness, leaving only `2^32 - 1` valid integers.
-{anvl} has no notion of missing values and just stores that bit pattern as a regular `i32`.
-The round-trip "works" only because R interprets the same bit pattern back as `NA` -- but inside {anvl} the value behaves like the integer `-2147483648`, and any computation on it (e.g. addition, comparison) will treat it as such rather than propagating missingness.
-The same caveat applies in reverse: if a genuine {anvl} computation produces `-2147483648`, converting back to R will silently turn it into `NA`.
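+Converting it back yields `NA` again, but only because R interprets the same bit pattern as missing; within {anvl} it behaves like the integer `-2147483648` in any computation: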
+
+```{r}
+as.integer(nv_scalar(NA_integer_))
+```
 
-For logical (`bool`) dtype the situation is worst: there is no spare bit pattern at all, so a bare `NA` (which is logical) is silently turned into `TRUE`:
+For logical data the situation is worst: `bool` has no spare bit pattern, so a bare `NA` (which is logical) is silently turned into `TRUE`:
 
 ```{r}
 nv_scalar(NA)
 as.logical(nv_scalar(NA))
 ```
 
-If your data contains missing values, decide how to handle them *before* converting to an `AnvlArray`.
-To opt into a runtime check, pass `check = TRUE` to `nv_array()` / `nv_scalar()`, which errors if the input contains any `NA`:
+To guard against these pitfalls, array constructors such as `nv_array()` accept a `check` argument that errors if the input contains any `NA`.
+It defaults to `FALSE` because the check has to scan the entire input.
 
 ```{r, error = TRUE}
 nv_array(c(1, NA, 3), check = TRUE)
 ```
 
-The same flag is available on `as_array()` for the `i32` round-trip case, where it errors if the materialized integer vector contains any `NA` (i.e. any `-2147483648`):
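+The same flag is available on converters such as `as_array()`, where it errors if materialization would lose information, for example when an `i32` value collides with the `NA` sentinel: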
 
 ```{r, error = TRUE}
 as_array(nv_scalar(NA_integer_), check = TRUE)
 ```
 
-It is also forwarded by the `as.integer()` / `as.double()` / `as.logical()` / `as.vector()` methods for `AnvlArray`, so the same scan is available when coercing directly to a bare R vector:
-
-```{r, error = TRUE}
-as.integer(nv_scalar(NA_integer_), check = TRUE)
-```
-
 ## No unsigned integers
 
 R's `integer` type is signed 32-bit (range `-2147483648` to `2147483647`).
@@ -145,11 +141,19 @@ For values that fit into R's signed integer range, the round-trip works as expec
 as_array(nv_array(c(0L, 200L, 255L), dtype = "ui8"))
 ```
 
-For larger device-side values, however, materialization back into an R integer vector silently produces `NA`:
+Because `ui32` values can exceed the range of R's native integer type, `ui32` arrays are converted to `bit64::integer64`:
+
 
 ```{r}
 big <- nv_array(2147483647L, dtype = "ui32") + 1L
 as_array(big)
 ```
 
-The device-side value is `2147483648` -- a perfectly valid `ui32` -- but it falls outside R's signed integer range, so it collides with the `NA_integer_` sentinel on materialization. The same caveat applies to all values `>= 2^31` in any unsigned dtype, including the much larger range of `ui64`. If you need to consume large unsigned values in R, convert the dtype on the device side first (e.g. `nv_convert(x, "f64")`).
+`ui64` arrays are also converted to `integer64`. Since `integer64` is signed, it cannot represent `ui64` values `>= 2^63`; these wrap around silently unless you pass `check = TRUE`:
+
+```{r, error = TRUE}
+big <- nv_array(0L, dtype = "ui64") - 1L
+big
+as_array(big)
+as_array(big, check = TRUE)
+```

From 8cd657c840c722a33054945ae39ba7c856d84984 Mon Sep 17 00:00:00 2001
From: Sebastian Fischer <sebf.fischer@gmail.com>
Date: Mon, 18 May 2026 19:17:57 +0200
Subject: [PATCH 4/4] remove unsupported argument from as.vector

---
 R/array.R           | 4 ++--
 man/as-AnvlArray.Rd | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/R/array.R b/R/array.R
index a9489bfe..170259e9 100644
--- a/R/array.R
+++ b/R/array.R
@@ -516,8 +516,8 @@ as.logical.AnvlArray <- function(x, check = FALSE, ...) {
 #' @rdname as-AnvlArray
 #' @method as.vector AnvlArray
 #' @export
-as.vector.AnvlArray <- function(x, mode = "any", check = FALSE) {
-  as.vector(as_array(x, check = check), mode = mode)
+as.vector.AnvlArray <- function(x, mode = "any") {
+  as.vector(as_array(x), mode = mode)
 }
 
 #' @rdname platform
diff --git a/man/as-AnvlArray.Rd b/man/as-AnvlArray.Rd
index 74f030af..2b7e4f73 100644
--- a/man/as-AnvlArray.Rd
+++ b/man/as-AnvlArray.Rd
@@ -14,7 +14,7 @@
 
 \method{as.logical}{AnvlArray}(x, check = FALSE, ...)
 
-\method{as.vector}{AnvlArray}(x, mode = "any", check = FALSE)
+\method{as.vector}{AnvlArray}(x, mode = "any")
 }
 \arguments{
 \item{x}{(\code{\link{AnvlArray}})\cr