Skip to content

Conversation

@scovich
Copy link
Contributor

@scovich scovich commented Jan 29, 2026

Which issue does this PR close?

Rationale for this change

Two things:

  1. An &ArrayRef can coerce to &dyn Array, but there is currently no way to recover the original ArrayRef instance afterward, because Array::as_any() returns the referent of the ArrayRef.
  2. An Arc<T: Array> can coerce to ArrayRef, but there is currently no way to recover the orignal Arc<T> because Array does not (and indeed cannot) satisfy the Any + 'static bound that Arc::downcast requires.

What changes are included in this PR?

Define a new Array::as_array_ref() method, with a provided implementation that returns None. <Arc<T> as Array>::as_array_ref() returns Some(self) and <&T as Array>::as_array_ref() delegates to its referent.

Also define a new Array::to_array_ref() method that attempts to recover an ArrayRef but which falls back to creating a new Array instance via Array -> ArrayData -> make_array. This provides a simple way to obtain an owned ArrayRef from any &dyn Array, optimized to a cheap Arc clone in case the trait object is already an ArrayRef.

Define a new freestanding downcast_array_ref function that attempts to recover the original Arc<T> from an owned ArrayRef -- mirroring the behavior of Arc::downcast, but without the restrictive type bounds.

Are these changes tested?

Yes, new doc tests validate the code.

Are there any user-facing changes?

New user-facing functions described above.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 29, 2026
@scovich
Copy link
Contributor Author

scovich commented Jan 29, 2026

@alamb interested in your thoughts here

@tustvold
Copy link
Contributor

We typically just use make_array(array.to_data) to achieve this. I'm not sure we should be assuming ArrayRef

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scovich -- it is really cool to see pushing on this idea

/// ```
fn as_any(&self) -> &dyn Any;

/// Attempts to recover an [`ArrayRef`] from `&dyn Array`, returning `None` if not Arc-backed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this API makes sense to me as a user of the library, for what it is worth, and would save some allocations rather than going to/from ArrayData


fn as_array_ref_opt(&self) -> Option<ArrayRef> {
// Recursively unwrap nested Arcs to find the deepest one
Some(innermost_array_ref(Cow::Borrowed(self)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably confused, but why not just Arc::clone(self)? This is impl ... for ArrayRef which is Arc<dyn Array>

So technically speaking as_array_ref_opt would be recovering the ArrayRef

Are there usecases when we have wrapped multiple levels of Arrays? I think this (code tries to handle the case of Arc<Arc<PrimitiveArary>> or something, which I don't think is common 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, yes we absolutely do have Arc<Arc<...>> and there are even unit tests verifying they behave correctly.
That was a surprise to me when I started exploring the possibility to have Array: Any.

So this code might arc-clone self... but not if self harbors a deeper ArrayRef inside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is possible, but why on earth would you do such a thing? Tbh I'm not sure why Array is implemented for Arc<dyn Array>...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO both impl Array for Arc<dyn Array> and impl<T: Array> Array for &T were questionable (design?) decisions back whenever they showed up.

I expect it was intended to be ergonomic?

  • The former allows passing &ArrayRef directly to functions that expect &dyn Array, without needing to litter the code with explicit as_ref() calls.
  • The latter allows code that expects impl AsRef<dyn Array> to work with both owned and borrowed operands.

Part of me would love to just remove both of those, but that would definitely be a massive breaking change. As in, massive number of compilation failures (even if easily fixed).

Even within arrow-rs itself, the whole ArrayAccessor and ArrayIterator abstraction would break if we took away the blanket impl for &T, because ArrayAccessor::Item is owned for some array types (primitive, boolean, etc) and borrowed for others (GenericByteArray). I spent a few minutes trying to untangle it, without success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing impl Array for Arc<dyn Array> and the biggest problem is that Scalar<ArrayRef> stops working. It's not immediately clear whether/how to fix that. In particular, there's no obvious replacement for Scalar::new(new_null_array(...)).

Other smaller messes include:

  • Quite a few call sites that call Arc::new(<expression returning ArrayRef>) as ArrayRef (there's that wrapping!)
  • A surprising number of calls to <ArrayRef as Array>::shrink_to_fit() (fake-mutable, no-op)
  • Lots of calls to <ArrayRef as Array>::into_data() (fake-mutable, equivalent to Array::to_data)
  • A lot of code that has an ArrayRef or &ArrayRef, and which lacks the necessary .as_ref() call. Mostly unit tests.
  • Missing impl Datum for ArrayRef

Just getting arrow-rs to compile without it was an impressive amount of churn:

 82 files changed, 883 insertions(+), 732 deletions(-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nothing else, we can at least clean up call sites that are needlessly creating Arc<Arc<dyn Array>>:

// - Same allocation/pointer is guaranteed by check #2
// - Arc::from_raw is only called once on this pointer

// Unwrap nested Arc<dyn Array> if present (zero clones if not nested)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is code I am not qualified to review.

I did some more googling, and it seems like we should be able to do soemthing similar using a trick like this:

https://stackoverflow.com/questions/76222743/upcasting-an-arcdyn-trait-to-an-arcdyn-any

However, I couldn't make the trait impls all line up 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the SO question is trying to go from Arc<dyn Foo> to Arc<dyn Any>. The two solutions are:

  • use trait upcasting
  • define an extension trait with a helper method that receives Arc<Self>

The former requires Foo: Any + 'static as well as the unstable trait-upcasting feature. Even if the feature stabilizes, we still have the problem that Array does not satisfy Any + 'static.

The latter I thought would make the extension trait not dyn-compatible... but the code in that accepted SO answer seems to compile... let me look into that and get back to you.

Copy link
Contributor Author

@scovich scovich Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah -- apparently the 'static constraint allows Arc<Self> methods to be dyn-compatible (TIL!).

Unfortunately, that would impose a transitive requirement of Array: 'static which is incompatible with a bunch of existing code, including the existing impl<T> Array for &T, trait ArrayAccessor: Array, and any code that does anything with a &'a dyn Array (which is ~all of them, if you account for anonymous/erased lifetimes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No way to downcast Arc<dyn Array> to Arc<T: Array>

3 participants