Overview
This issue proposes replacing pandas with polars as our DataFrame library to improve performance and memory usage.
Main Changes Required
DataFrame Operations
```python
# Before (pandas)
from pandas import DataFrame

def save(self, data: DataFrame) -> None:
    result = self.client.upload_csv(self.table_name, data.to_csv(index=False))
```

```python
# After (polars)
from polars import DataFrame

def save(self, data: DataFrame) -> None:
    # write_csv() includes the header by default (include_header=True;
    # older polars releases called this parameter has_header)
    result = self.client.upload_csv(self.table_name, data.write_csv())
```
Common Conversion Points
- DataFrame creation
- CSV reading/writing
- Empty checks (`df.empty` → `df.is_empty()`)
- Column operations
- Type conversions
Key Differences to Handle
Pandas to Polars conversions
```python
df.empty       # → df.is_empty()
df.to_csv()    # → df.write_csv()
df['column']   # → df.get_column('column') (square-bracket access also works)
df.iloc[0]     # → df.row(0)
df.dtypes      # → df.dtypes (same attribute name, but returns polars dtypes)
```
Complexity Assessment
Easy
- Basic DataFrame operations
Medium
- Type handling, especially with SQLAlchemy integration
Hard
- Any pandas-specific operations or methods that don't have direct polars equivalents
Benefits
- Better performance
- Lower memory usage
- Better handling of larger datasets
- More modern API
Potential Challenges
Areas that might need special attention
```python
TypedDataFrame = tuple[DataFrame, dict]  # may need redefinition
df.to_sql()  # different database integration approach needed
```
Tasks
Questions
- Do we need to maintain pandas compatibility for any external integrations?
- Should we do this gradually or all at once?
- Are there any specific performance bottlenecks we should prioritize?