🪲 StringDtype Compatibility Issue with Pandas 3.0 #441

@e-lo

Describe the bug

Tests fail on Python 3.11, 3.12, and 3.13, but pass on Python 3.10. The failure occurs during pandera DataFrameModel validation when DataFrames contain pandas StringDtype columns.

TypeError: Cannot interpret '<StringDtype(na_value=nan)>' as a data type

Root Cause

  • Pandas 3.0 uses StringDtype for string columns by default (earlier versions default to object dtype)
  • Pandera uses numpy's issubdtype() internally to check dtypes
  • numpy.issubdtype() cannot interpret pandas StringDtype objects, which raises the TypeError above
  • Only Python 3.11+ is affected, most likely because pandas 3.0 requires Python 3.11 or newer, so Python 3.10 environments still install pandas 2.x
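
The numpy side of the failure can be reproduced without pandera. The snippet below is a minimal sketch, not pandera's actual code path: `issubdtype_error` is a hypothetical helper that returns the error message, if any, raised when a dtype is passed to `numpy.issubdtype()`.

```python
import numpy as np
import pandas as pd


def issubdtype_error(dtype):
    """Return the TypeError message raised by np.issubdtype, or None if it succeeds.

    Hypothetical helper for illustration only; it is not part of pandera
    or network_wrangler.
    """
    try:
        np.issubdtype(dtype, np.number)
    except TypeError as exc:
        return str(exc)
    return None


# A pandas extension dtype is not something numpy can turn into np.dtype.
string_message = issubdtype_error(pd.StringDtype())
print(string_message)

# A plain numpy object dtype is handled without error.
object_message = issubdtype_error(np.dtype(object))
print(object_message)
```

This is why switching the affected columns to object dtype, as described under Resolution, avoids the error.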

Affected Code

  • Any DataFrameModel validation using Series[str] fields
  • The TimeStrSeriesSchema using pa.String
  • Validation functions in network_wrangler/utils/models.py

Resolution

  • Changed pa.String to pandas_engine.NpString() in TimeStrSeriesSchema (uses object dtype, compatible with numpy)
  • Added _convert_string_dtype_to_object() function to convert StringDtype columns to object dtype before validation in validate_df_to_model()
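
For illustration, a conversion helper along these lines might look like the sketch below. This is an assumption about the approach, not the code actually added to network_wrangler: the name `convert_string_dtype_to_object` and its exact behavior are hypothetical.

```python
import pandas as pd


def convert_string_dtype_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with StringDtype columns cast to object dtype.

    Hypothetical sketch; the real _convert_string_dtype_to_object() may differ.
    """
    # Collect only the columns whose dtype is a pandas StringDtype.
    string_cols = [
        col for col, dtype in df.dtypes.items() if isinstance(dtype, pd.StringDtype)
    ]
    if not string_cols:
        return df.copy()
    # astype with a mapping leaves all other columns untouched.
    return df.astype({col: object for col in string_cols})


example = pd.DataFrame(
    {
        "name": pd.array(["a", "b"], dtype="string"),
        "count": [1, 2],
    }
)
converted = convert_string_dtype_to_object(example)
print(converted.dtypes)
```

Running the conversion just before validation keeps the rest of the pipeline free to use StringDtype, at the cost of one extra copy of the affected columns.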

Failing tests

  • pytest tests/ -v

Fails on Python 3.11, 3.12, and 3.13; passes on 3.10.

Labels

bug