PyCapsule support for series #34

@kpiwonski

Hi,

It seems the recent pandas update to version 3 brought PyCapsule support for pd.Series. Likewise, I see that polars supports __arrow_c_stream__ for pl.Series.

I have a workaround for my use case, but I think it would be great to extend the support to series, especially since it does not seem too difficult: most of the needed pieces already exist.

I had reasonable success using PyCapsule for PyArray after adding the following above this line (sorry for the quality, I just want to show it is feasible):

let x = try_capsule_array_stream(obj).unwrap().unwrap();
return Some(Ok(FieldArray::new(x.1, (*x.0[0].as_ref()).clone())));

The only problem is that it works for floats and ints, but not for the categorical data type. The cause is that sch_ptr is null. This line seems to be responsible for that: importing any categorical is effectively disabled at the schema level, because a categorical requires a dictionary. For some reason the record batch import behaves differently: the same field_from_c_schema is used, but it simply takes the child's raw schema.

Stack trace below:

   2: minarrow::ffi::arrow_c_ffi::import_from_c
             at minarrow/src/ffi/arrow_c_ffi.rs:625:9
   3: minarrow::ffi::arrow_c_ffi::import_categorical
             at minarrow/src/ffi/arrow_c_ffi.rs:1448:25
   4: minarrow::ffi::arrow_c_ffi::import_from_c_owned
             at minarrow/src/ffi/arrow_c_ffi.rs:852:13
   5: minarrow::ffi::arrow_c_ffi::import_array_stream
             at minarrow/src/ffi/arrow_c_ffi.rs:2454:33
   6: minarrow_pyo3::ffi::to_rust::import_capsule_array_stream
             at minarrow/pyo3/src/ffi/to_rust.rs:193:27
   7: minarrow_pyo3::ffi::to_rust::try_capsule_array_stream
             at minarrow/pyo3/src/ffi/to_rust.rs:167:10
   8: minarrow_pyo3::ffi::to_rust::try_capsule_array
             at minarrow/pyo3/src/ffi/to_rust.rs:60:13
   9: minarrow_pyo3::ffi::to_rust::array_to_rust

Debug output for sch from import_categorical:

sch = ArrowSchema {
    format: 0x000000002a8397fc,
    name: 0x686d6b98ad459623,
    metadata: 0x0000000000000000,
    flags: 2,
    n_children: 0,
    children: 0x0000000000000000,
    dictionary: 0x0000000000000000,
    release: Some(
        0x00007f5406594030,
    ),
    private_data: 0x0000000000000000,
}
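
For context on the dump above: in the Arrow C data interface, a dictionary-encoded array carries the index type in `format` and a non-null `dictionary` pointer to the schema of the values, and `flags: 2` corresponds to `ARROW_FLAG_NULLABLE`. A schema with `dictionary` null therefore looks like a plain index column to the importer. The sketch below is not minarrow's actual code; it only illustrates one way an importer could report this case as an error instead of dereferencing a null pointer. The struct mirrors the C layout, while `DictError`, `dictionary_schema` and the `schema` helper are hypothetical names made up for illustration.

```rust
use std::ffi::{c_char, c_void, CStr};

/// Mirrors `struct ArrowSchema` from the Arrow C data interface.
#[repr(C)]
pub struct ArrowSchema {
    pub format: *const c_char,
    pub name: *const c_char,
    pub metadata: *const c_char,
    pub flags: i64,
    pub n_children: i64,
    pub children: *mut *mut ArrowSchema,
    pub dictionary: *mut ArrowSchema,
    pub release: Option<unsafe extern "C" fn(*mut ArrowSchema)>,
    pub private_data: *mut c_void,
}

/// Flag value defined by the Arrow C data interface.
pub const ARROW_FLAG_NULLABLE: i64 = 2;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DictError {
    /// The schema pointer itself was null.
    NullSchema,
    /// The schema is valid but has no dictionary attached.
    MissingDictionary { index_format: String },
}

/// Hypothetical helper: returns the dictionary (values) schema, or an error
/// when either the schema pointer or its `dictionary` pointer is null.
///
/// # Safety
/// Non-null pointers must point to valid `ArrowSchema` values, and a
/// non-null `format` must be a NUL-terminated string.
pub unsafe fn dictionary_schema<'a>(
    sch: *const ArrowSchema,
) -> Result<&'a ArrowSchema, DictError> {
    let sch = unsafe { sch.as_ref() }.ok_or(DictError::NullSchema)?;
    match unsafe { sch.dictionary.as_ref() } {
        Some(dict) => Ok(dict),
        None => {
            // Report the index format so the caller can see what arrived.
            let index_format = if sch.format.is_null() {
                String::new()
            } else {
                unsafe { CStr::from_ptr(sch.format) }
                    .to_string_lossy()
                    .into_owned()
            };
            Err(DictError::MissingDictionary { index_format })
        }
    }
}

/// Hypothetical constructor used only to build example schemas.
/// `format` must be a NUL-terminated byte string such as `b"c\0"`.
pub fn schema(format: &'static [u8], dictionary: *mut ArrowSchema) -> ArrowSchema {
    assert_eq!(format.last(), Some(&0), "format must be NUL-terminated");
    ArrowSchema {
        format: format.as_ptr() as *const c_char,
        name: std::ptr::null(),
        metadata: std::ptr::null(),
        flags: ARROW_FLAG_NULLABLE,
        n_children: 0,
        children: std::ptr::null_mut(),
        dictionary,
        release: None,
        private_data: std::ptr::null_mut(),
    }
}
```

With a check along these lines, a schema shaped like the one in the dump would yield `DictError::MissingDictionary` rather than a crash inside the categorical import path, which may make it easier to tell whether the producer omitted the dictionary or the stream import dropped it.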
