Hi,
It seems the recent pandas update to version 3 brought PyCapsule support to pd.Series. Polars likewise supports __arrow_c_stream__ for pl.Series.
I have a workaround for my use case, but I think it would be great to extend this support to series, especially since it does not seem that difficult: most of the needed pieces already exist.
I was fairly successful using PyCapsule for PyArray after adding the following above this line (sorry for the quality, I just want to show it is feasible):
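// Quick hack: import the object through the Arrow C stream capsule and panic on any failure.
let x = try_capsule_array_stream(obj).unwrap().unwrap();
// x.0 holds the imported arrays and x.1 the field; only the first array is used.
return Some(Ok(FieldArray::new(x.1, (*x.0[0].as_ref()).clone())));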
The only problem is that it works for floats and ints, but not for the categorical data type. The thing is that sch_ptr is null. This line seems to be responsible for that. As a result, importing any categorical is disabled at the schema level (since it requires a dictionary). For some reason it looks different when importing the schema for a record batch: the same field_from_c_schema is used, but it simply takes the child raw schema.
Stack trace below:
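To make the dictionary requirement concrete, here is a minimal sketch of how an importer could tell a plain schema from a dictionary-encoded one. The `ArrowSchema` layout follows the Arrow C Data Interface; `SchemaKind`, `format_of`, and `classify` are hypothetical names for illustration and are not part of minarrow.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_void};

/// Field layout of `ArrowSchema` as specified by the Arrow C Data Interface.
#[repr(C)]
pub struct ArrowSchema {
    pub format: *const c_char,
    pub name: *const c_char,
    pub metadata: *const c_char,
    pub flags: i64,
    pub n_children: i64,
    pub children: *mut *mut ArrowSchema,
    pub dictionary: *mut ArrowSchema,
    pub release: Option<unsafe extern "C" fn(*mut ArrowSchema)>,
    pub private_data: *mut c_void,
}

/// Hypothetical classification of an imported schema (not a minarrow type).
#[derive(Debug, PartialEq)]
pub enum SchemaKind {
    /// No dictionary attached: `format` describes the values directly.
    Plain { format: String },
    /// Dictionary attached: the parent format is the index type,
    /// the dictionary schema's format is the value type.
    Dictionary { index_format: String, value_format: String },
}

/// Reads the format string, returning an error for null pointers
/// instead of dereferencing them.
///
/// Safety: a non-null `sch` must point to a valid `ArrowSchema` whose
/// non-null `format` is a NUL-terminated string.
unsafe fn format_of(sch: *const ArrowSchema) -> Result<String, String> {
    if sch.is_null() {
        return Err("schema pointer is null".to_string());
    }
    let fmt = unsafe { (*sch).format };
    if fmt.is_null() {
        return Err("format pointer is null".to_string());
    }
    Ok(unsafe { CStr::from_ptr(fmt) }.to_string_lossy().into_owned())
}

/// Classifies a schema by whether its `dictionary` pointer is set.
///
/// Safety: same requirements as `format_of`, applied to `sch` and to
/// `sch.dictionary` when it is non-null.
pub unsafe fn classify(sch: *const ArrowSchema) -> Result<SchemaKind, String> {
    let format = unsafe { format_of(sch) }?;
    let dict = unsafe { (*sch).dictionary };
    if dict.is_null() {
        return Ok(SchemaKind::Plain { format });
    }
    let value_format = unsafe { format_of(dict) }?;
    Ok(SchemaKind::Dictionary { index_format: format, value_format })
}
```

With a check like this, a schema whose `dictionary` pointer is null is never treated as categorical, and a null schema pointer surfaces as an error rather than a crash. Whether minarrow should take this route is a design decision for the maintainers.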
   2: minarrow::ffi::arrow_c_ffi::import_from_c
             at minarrow/src/ffi/arrow_c_ffi.rs:625:9
   3: minarrow::ffi::arrow_c_ffi::import_categorical
             at minarrow/src/ffi/arrow_c_ffi.rs:1448:25
   4: minarrow::ffi::arrow_c_ffi::import_from_c_owned
             at minarrow/src/ffi/arrow_c_ffi.rs:852:13
   5: minarrow::ffi::arrow_c_ffi::import_array_stream
             at minarrow/src/ffi/arrow_c_ffi.rs:2454:33
   6: minarrow_pyo3::ffi::to_rust::import_capsule_array_stream
             at minarrow/pyo3/src/ffi/to_rust.rs:193:27
   7: minarrow_pyo3::ffi::to_rust::try_capsule_array_stream
             at minarrow/pyo3/src/ffi/to_rust.rs:167:10
   8: minarrow_pyo3::ffi::to_rust::try_capsule_array
             at minarrow/pyo3/src/ffi/to_rust.rs:60:13
   9: minarrow_pyo3::ffi::to_rust::array_to_rust
Debug output for sch in import_categorical:
sch = ArrowSchema {
    format: 0x000000002a8397fc,
    name: 0x686d6b98ad459623,
    metadata: 0x0000000000000000,
    flags: 2,
    n_children: 0,
    children: 0x0000000000000000,
    dictionary: 0x0000000000000000,
    release: Some(
        0x00007f5406594030,
    ),
    private_data: 0x0000000000000000,
}
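
For reading the dump above: the flag bits in `ArrowSchema.flags` are defined by the Arrow C Data Interface, so `flags: 2` means only the nullable bit is set, and the null `dictionary` pointer means no dictionary schema is attached. A small sketch for decoding the flags follows; `describe_flags` is a hypothetical helper, not a minarrow function.

```rust
/// Flag bits for `ArrowSchema.flags`, as defined by the Arrow C Data Interface.
pub const ARROW_FLAG_DICTIONARY_ORDERED: i64 = 1;
pub const ARROW_FLAG_NULLABLE: i64 = 2;
pub const ARROW_FLAG_MAP_KEYS_SORTED: i64 = 4;

/// Hypothetical helper: lists the names of the flag bits set in `flags`.
pub fn describe_flags(flags: i64) -> Vec<&'static str> {
    let table = [
        (ARROW_FLAG_DICTIONARY_ORDERED, "dictionary_ordered"),
        (ARROW_FLAG_NULLABLE, "nullable"),
        (ARROW_FLAG_MAP_KEYS_SORTED, "map_keys_sorted"),
    ];
    table
        .iter()
        .copied()
        // Keep only the entries whose bit is set in `flags`.
        .filter(|&(bit, _)| (flags & bit) != 0)
        .map(|(_, name)| name)
        .collect()
}
```

Calling `describe_flags(2)` on the value from the dump yields a single entry, `"nullable"`.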