🪲 StringDtype Compatibility Issue with Pandas 3.0 #441
Open
Labels: bug (Something isn't working)
Description
Describe the bug
Tests fail on Python 3.11, 3.12, and 3.13, but pass on Python 3.10. The failure occurs during pandera DataFrameModel validation when DataFrames contain pandas StringDtype columns.
TypeError: Cannot interpret '<StringDtype(na_value=nan)>' as a data type
Root Cause
- Pandas 3.0 uses StringDtype for string columns by default
- Pandera uses numpy's issubdtype() internally to check dtypes
- numpy.issubdtype() cannot handle pandas StringDtype objects, causing the error
- This affects Python 3.11+ most likely because of numpy/pandas version differences rather than stricter type checking: pandas 3.0 requires Python 3.11 or newer, so a Python 3.10 environment installs an older pandas that still infers object dtype for strings
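The numpy side of the failure can be reproduced without pandera. The snippet below is a minimal sketch, assuming only that numpy and pandas are installed; the helper name issubdtype_error is hypothetical and is not part of pandera or network_wrangler.

```python
from typing import Optional

import numpy as np
import pandas as pd


def issubdtype_error(dtype) -> Optional[str]:
    """Return the TypeError message numpy raises for `dtype`, or None on success.

    Hypothetical helper for illustration only.
    """
    try:
        # numpy tries to coerce `dtype` with np.dtype(...) before comparing.
        np.issubdtype(dtype, np.object_)
    except TypeError as exc:
        return str(exc)
    return None


# A pandas extension dtype is not something numpy can interpret.
print(issubdtype_error(pd.StringDtype(storage="python")))

# A plain numpy object dtype is handled without error.
print(issubdtype_error(np.dtype(object)))
```

The first call reports a "Cannot interpret ... as a data type" message (the exact dtype repr depends on the pandas version); the second returns None.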
Affected Code
- Any DataFrameModel validation using Series[str] fields
- The TimeStrSeriesSchema using pa.String
- Validation functions in network_wrangler/utils/models.py
Resolution
- Changed pa.String to pandas_engine.NpString() in TimeStrSeriesSchema (uses object dtype, compatible with numpy)
- Added _convert_string_dtype_to_object() function to convert StringDtype columns to object dtype before validation in validate_df_to_model()
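The conversion step could look roughly like the sketch below. This is an assumption about how such a helper might be written, not the code that was merged: the name convert_string_dtype_to_object_sketch is hypothetical, and the real _convert_string_dtype_to_object() may differ in signature and behavior.

```python
import pandas as pd


def convert_string_dtype_to_object_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with StringDtype columns cast to object dtype.

    Illustrative sketch only; other columns are left unchanged.
    """
    # Collect the columns whose dtype is a pandas StringDtype (any storage).
    string_cols = [
        col for col, dtype in df.dtypes.items() if isinstance(dtype, pd.StringDtype)
    ]
    if not string_cols:
        return df.copy()
    # astype accepts a column -> dtype mapping and returns a new DataFrame.
    return df.astype({col: object for col in string_cols})


df = pd.DataFrame(
    {
        "name": pd.array(["a", "b"], dtype=pd.StringDtype(storage="python")),
        "count": [1, 2],
    }
)
converted = convert_string_dtype_to_object_sketch(df)
print(converted.dtypes)
```

Casting to object before validation keeps numpy-based dtype checks working, at the cost of losing the dedicated string dtype on the validated frame.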
Failing tests
- pytest tests/ -v fails on Python 3.11, 3.12, 3.13; passes on 3.10