-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Define Array methods for working with ArrayRef #9295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@alamb interested in your thoughts here |
|
We typically just use |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @scovich -- it is really cool to see pushing on this idea
| /// ``` | ||
| fn as_any(&self) -> &dyn Any; | ||
|
|
||
| /// Attempts to recover an [`ArrayRef`] from `&dyn Array`, returning `None` if not Arc-backed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this API makes sense to me as a user of the library, for what it is worth, and would save some allocations rather than going to/from ArrayData
|
|
||
| fn as_array_ref_opt(&self) -> Option<ArrayRef> { | ||
| // Recursively unwrap nested Arcs to find the deepest one | ||
| Some(innermost_array_ref(Cow::Borrowed(self))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably confused, but why not just Arc::clone(self)? This is impl ... for ArrayRef which is Arc<dyn Array>
So technically speaking as_array_ref_opt would be recovering the ArrayRef
Are there usecases when we have wrapped multiple levels of Arrays? I think this (code tries to handle the case of Arc<Arc<PrimitiveArary>> or something, which I don't think is common 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, yes we absolutely do have Arc<Arc<...>> and there are even unit tests verifying they behave correctly.
That was a surprise to me when I started exploring the possibility to have Array: Any.
So this code might arc-clone self... but not if self harbors a deeper ArrayRef inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is possible, but why on earth would you do such a thing? Tbh I'm not sure why Array is implemented for Arc<dyn Array>...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO both impl Array for Arc<dyn Array> and impl<T: Array> Array for &T were questionable (design?) decisions back whenever they showed up.
I expect it was intended to be ergonomic?
- The former allows passing
&ArrayRefdirectly to functions that expect&dyn Array, without needing to litter the code with explicitas_ref()calls. - The latter allows code that expects
impl AsRef<dyn Array>to work with both owned and borrowed operands.
Part of me would love to just remove both of those, but that would definitely be a massive breaking change. As in, massive number of compilation failures (even if easily fixed).
Even within arrow-rs itself, the whole ArrayAccessor and ArrayIterator abstraction would break if we took away the blanket impl for &T, because ArrayAccessor::Item is owned for some array types (primitive, boolean, etc) and borrowed for others (GenericByteArray). I spent a few minutes trying to untangle it, without success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried removing impl Array for Arc<dyn Array> and the biggest problem is that Scalar<ArrayRef> stops working. It's not immediately clear whether/how to fix that. In particular, there's no obvious replacement for Scalar::new(new_null_array(...)).
Other smaller messes include:
- Quite a few call sites that call
Arc::new(<expression returning ArrayRef>) as ArrayRef(there's that wrapping!) - A surprising number of calls to
<ArrayRef as Array>::shrink_to_fit()(fake-mutable, no-op) - Lots of calls to
<ArrayRef as Array>::into_data()(fake-mutable, equivalent toArray::to_data) - A lot of code that has an
ArrayRefor&ArrayRef, and which lacks the necessary.as_ref()call. Mostly unit tests. - Missing
impl Datum for ArrayRef
Just getting arrow-rs to compile without it was an impressive amount of churn:
82 files changed, 883 insertions(+), 732 deletions(-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing else, we can at least clean up call sites that are needlessly creating Arc<Arc<dyn Array>>:
| // - Same allocation/pointer is guaranteed by check #2 | ||
| // - Arc::from_raw is only called once on this pointer | ||
|
|
||
| // Unwrap nested Arc<dyn Array> if present (zero clones if not nested) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is code I am not qualified to review.
I did some more googling, and it seems like we should be able to do soemthing similar using a trick like this:
https://stackoverflow.com/questions/76222743/upcasting-an-arcdyn-trait-to-an-arcdyn-any
However, I couldn't make the trait impls all line up 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the SO question is trying to go from Arc<dyn Foo> to Arc<dyn Any>. The two solutions are:
- use trait upcasting
- define an extension trait with a helper method that receives
Arc<Self>
The former requires Foo: Any + 'static as well as the unstable trait-upcasting feature. Even if the feature stabilizes, we still have the problem that Array does not satisfy Any + 'static.
The latter I thought would make the extension trait not dyn-compatible... but the code in that accepted SO answer seems to compile... let me look into that and get back to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah -- apparently the 'static constraint allows Arc<Self> methods to be dyn-compatible (TIL!).
Unfortunately, that would impose a transitive requirement of Array: 'static which is incompatible with a bunch of existing code, including the existing impl<T> Array for &T, trait ArrayAccessor: Array, and any code that does anything with a &'a dyn Array (which is ~all of them, if you account for anonymous/erased lifetimes).
Which issue does this PR close?
Rationale for this change
Two things:
&ArrayRefcan coerce to&dyn Array, but there is currently no way to recover the originalArrayRefinstance afterward, becauseArray::as_any()returns the referent of theArrayRef.Arc<T: Array>can coerce toArrayRef, but there is currently no way to recover the orignalArc<T>becauseArraydoes not (and indeed cannot) satisfy theAny + 'staticbound thatArc::downcastrequires.What changes are included in this PR?
Define a new
Array::as_array_ref()method, with a provided implementation that returns None.<Arc<T> as Array>::as_array_ref()returnsSome(self)and<&T as Array>::as_array_ref()delegates to its referent.Also define a new
Array::to_array_ref()method that attempts to recover anArrayRefbut which falls back to creating a new Array instance viaArray->ArrayData->make_array. This provides a simple way to obtain an ownedArrayReffrom any&dyn Array, optimized to a cheap Arc clone in case the trait object is already anArrayRef.Define a new freestanding
downcast_array_reffunction that attempts to recover the originalArc<T>from an ownedArrayRef-- mirroring the behavior ofArc::downcast, but without the restrictive type bounds.Are these changes tested?
Yes, new doc tests validate the code.
Are there any user-facing changes?
New user-facing functions described above.