
Improve Pandas FFI compatibility #35

Merged: pbower merged 4 commits into main from pycap_series on Mar 15, 2026

@pbower commented on Mar 8, 2026

Issue Summary

pd.Series (pandas 3+) and pl.Series expose __arrow_c_stream__ but not __arrow_c_array__. Separately, the existing import_array_stream crashed on dictionary/categorical types because it reconstructed a schema for each chunk without the dictionary pointer.

Changes

1. src/ffi/arrow_c_ffi.rs - field_from_c_schema (line ~2469)

  • Now checks schema.dictionary and returns ArrowType::Dictionary(...) instead of the raw index type when the schema is dictionary-encoded. Previously a dict column with UInt32 indices would return ArrowType::UInt32; now it correctly returns ArrowType::Dictionary(UInt32).
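
The dictionary check described above can be sketched as follows. This is a minimal, safe model rather than the crate's actual code: `SchemaView`, `IndexType`, the trimmed-down `ArrowType`, and `arrow_type_from_schema` are hypothetical stand-ins, and only the `format` and `dictionary` members of the C `ArrowSchema` struct are modelled. The format strings (`"I"` for uint32, `"u"` for utf8, and so on) are the ones defined by the Arrow C data interface.

```rust
/// Hypothetical index-width enum; the crate's real type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

/// Trimmed-down stand-in for the crate's `ArrowType`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ArrowType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    String,
    Dictionary(IndexType),
}

/// Safe stand-in for the C `ArrowSchema` struct (assumption: only the
/// two members that matter for this check are modelled).
pub struct SchemaView<'a> {
    /// Arrow C data interface format string, e.g. "I" for uint32.
    pub format: &'a str,
    /// Present only when the column is dictionary-encoded.
    pub dictionary: Option<&'a SchemaView<'a>>,
}

pub fn arrow_type_from_schema(schema: &SchemaView) -> Result<ArrowType, String> {
    // For a dictionary-encoded column, `format` describes the indices,
    // so the dictionary member must be checked first.
    if schema.dictionary.is_some() {
        let index = match schema.format {
            "C" => IndexType::UInt8,
            "S" => IndexType::UInt16,
            "I" => IndexType::UInt32,
            "L" => IndexType::UInt64,
            other => return Err(format!("unsupported dictionary index format: {}", other)),
        };
        return Ok(ArrowType::Dictionary(index));
    }
    // No dictionary: the format string is the column's own type.
    match schema.format {
        "C" => Ok(ArrowType::UInt8),
        "S" => Ok(ArrowType::UInt16),
        "I" => Ok(ArrowType::UInt32),
        "L" => Ok(ArrowType::UInt64),
        "u" => Ok(ArrowType::String),
        other => Err(format!("unsupported format: {}", other)),
    }
}
```

With this shape, a schema whose format is `"I"` maps to `ArrowType::UInt32` when `dictionary` is absent and to `ArrowType::Dictionary(IndexType::UInt32)` when it is present.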

2. src/ffi/arrow_c_ffi.rs - import_array_stream (line ~2394)

  • Now keeps the stream schema alive and passes it directly to import_array_zero_copy. This mirrors the approach used by import_record_batch_stream_with_metadata and fixes the null-pointer crash for dictionary-encoded arrays.
  • Also adds the Utf8View -> String dtype fixup that was missing.
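
To illustrate why holding on to the stream schema matters, here is a rough sketch under simplifying assumptions. `Schema`, `Chunk`, `Imported`, `import_chunk`, and both `import_stream_*` functions are hypothetical; the real functions work on raw C pointers, and a missing dictionary there showed up as a null-pointer crash rather than the `Err` used here.

```rust
/// Hypothetical, safe stand-in for the C `ArrowSchema` struct.
#[derive(Debug, Clone)]
pub struct Schema {
    pub format: String,
    pub dictionary: Option<Box<Schema>>,
}

/// One chunk pulled from the stream (only what this sketch needs).
pub struct Chunk {
    pub len: usize,
    /// True when the chunk's array carries a dictionary of values.
    pub has_dictionary: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Imported {
    Plain { len: usize },
    Dictionary { len: usize, value_format: String },
}

/// Stand-in for the per-chunk importer: a dictionary-carrying chunk
/// needs the schema's dictionary member to interpret its values.
pub fn import_chunk(chunk: &Chunk, schema: &Schema) -> Result<Imported, String> {
    match (chunk.has_dictionary, schema.dictionary.as_deref()) {
        (true, Some(values)) => Ok(Imported::Dictionary {
            len: chunk.len,
            value_format: values.format.clone(),
        }),
        (true, None) => Err("chunk has a dictionary but the schema does not".to_string()),
        (false, _) => Ok(Imported::Plain { len: chunk.len }),
    }
}

/// Old shape (as described in the issue summary): rebuild a schema
/// per chunk from the format alone, which drops the dictionary.
pub fn import_stream_rebuilding(
    stream_schema: &Schema,
    chunks: &[Chunk],
) -> Result<Vec<Imported>, String> {
    chunks
        .iter()
        .map(|chunk| {
            let rebuilt = Schema { format: stream_schema.format.clone(), dictionary: None };
            import_chunk(chunk, &rebuilt)
        })
        .collect()
}

/// New shape: own the stream schema for the whole loop and lend it,
/// dictionary included, to every chunk import.
pub fn import_stream_shared(
    stream_schema: Schema,
    chunks: &[Chunk],
) -> Result<Vec<Imported>, String> {
    chunks.iter().map(|chunk| import_chunk(chunk, &stream_schema)).collect()
}
```

In this model the rebuilding variant fails on any dictionary-encoded stream, while the shared variant succeeds because the one schema that still holds the dictionary outlives every chunk import.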

3. pyo3/src/ffi/to_rust.rs - array_to_rust (line ~239)

  • Adds __arrow_c_stream__ as a fallback between __arrow_c_array__ and _export_to_c. This enables pd.Series and pl.Series to be converted to FieldArray. When the stream yields multiple chunks, they are concatenated into a single array.
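
The probing order and the chunk concatenation can be modelled without PyO3 as below. This is only a sketch: `Protocol`, `PROBE_ORDER`, `pick_protocol`, and `concat_chunks` are hypothetical names, and the `has_attr` closure stands in for an attribute check on the Python object.

```rust
/// Which export path the object supports (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    CArray,
    CStream,
    LegacyExport,
}

/// Probe order from the change description: array capsule first,
/// then the stream capsule, then the legacy pyarrow export method.
const PROBE_ORDER: [(&str, Protocol); 3] = [
    ("__arrow_c_array__", Protocol::CArray),
    ("__arrow_c_stream__", Protocol::CStream),
    ("_export_to_c", Protocol::LegacyExport),
];

/// Returns the first protocol whose attribute the object exposes.
/// `has_attr` is a stand-in for a Python-side attribute lookup.
pub fn pick_protocol(has_attr: impl Fn(&str) -> bool) -> Option<Protocol> {
    PROBE_ORDER
        .iter()
        .find(|entry| has_attr(entry.0))
        .map(|entry| entry.1)
}

/// Stream imports may yield several chunks; flatten them in order
/// into one contiguous buffer.
pub fn concat_chunks<T: Clone>(chunks: &[Vec<T>]) -> Vec<T> {
    chunks.concat()
}
```

An object that exposes only `__arrow_c_stream__`, as the summary says pd.Series and pl.Series do, resolves to `Protocol::CStream`, while an object exposing both capsule methods still takes the `__arrow_c_array__` path.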

@pbower merged commit d746f4f into main on Mar 15, 2026
1 check passed
@pbower deleted the pycap_series branch on March 15, 2026 22:13