Optional: Replace pandas with polars #83

@bh2smith

Overview

This issue proposes replacing pandas with polars as our DataFrame library to improve performance and memory usage.

Main Changes Required

DataFrame Operations

# Before (pandas)
from pandas import DataFrame

def save(self, data: DataFrame) -> None:
    result = self.client.upload_csv(self.table_name, data.to_csv(index=False))

# After (polars)
from polars import DataFrame

def save(self, data: DataFrame) -> None:
    # write_csv() returns the CSV text when no file is given;
    # the header row is written by default, so no flag is needed
    result = self.client.upload_csv(self.table_name, data.write_csv())

Common Conversion Points

  • DataFrame creation
  • CSV reading/writing
  • Empty checks (df.empty → df.is_empty())
  • Column operations
  • Type conversions

Key Differences to Handle

Pandas to Polars conversions

df.empty # → df.is_empty()
df.to_csv() # → df.write_csv()
df['column'] # → df.get_column('column')
df.iloc[0] # → df.row(0)
df.dtypes # → df.dtypes (similar but returns polars dtypes)

Complexity Assessment

Easy

  • Basic DataFrame operations

Medium

  • Type handling, especially with SQLAlchemy integration

Hard

  • Any pandas-specific operations or methods that don't have direct polars equivalents

Benefits

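  • Better performance: polars is multi-threaded by default, and its lazy API can optimize a query before running it
  • Lower memory usage: data is held in Arrow-style columnar memory, and lazy scans can read only the columns a query needs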
  • Better handling of larger datasets
  • More modern API

Potential Challenges

Areas that might need special attention

TypedDataFrame = tuple[DataFrame, dict] # May need redefinition
df.to_sql() # Different database integration approach needed

Tasks

  • Update dependencies in pyproject.toml
  • Replace pandas imports with polars
  • Update DataFrame operations
  • Update type hints and custom types
  • Update database integration code
  • Update tests
  • Update documentation

Questions

  • Do we need to maintain pandas compatibility for any external integrations?
  • Should we do this gradually or all at once?
  • Are there any specific performance bottlenecks we should prioritize?
