
Track the maximum partition size in dataset types #145

@facundominguez

Description

Spark programs scale as long as the partition sizes of the inputs are bounded. This issue is to explore whether it would be possible to track, in the type of a dataset, the maximum size of its partitions. This way, the type of an algorithm could ensure that the algorithm doesn't grow the partitions, or doesn't grow them beyond some constant factor of the partition sizes of the input.
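One possible shape for this, sketched with GHC's type-level naturals rather than a real Spark binding; all names here (`BoundedRDD`, `mapB`, `duplicateB`) are hypothetical, and partitions are modelled as plain lists just to illustrate how the bound would move through the types:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal, type (*))

-- A dataset whose type records an upper bound @n@ on the number of
-- elements in any single partition. (Hypothetical type, not sparkle's.)
newtype BoundedRDD (n :: Nat) a = BoundedRDD [[a]]

-- A per-element map cannot grow partitions, so the bound is preserved.
mapB :: (a -> b) -> BoundedRDD n a -> BoundedRDD n b
mapB f (BoundedRDD ps) = BoundedRDD (map (map f) ps)

-- An operation that can at most double each partition has to say so in
-- its type: the bound grows by the constant factor 2.
duplicateB :: BoundedRDD n a -> BoundedRDD (2 * n) a
duplicateB (BoundedRDD ps) = BoundedRDD (map (\p -> p ++ p) ps)

-- Recover the static bound at runtime, e.g. for scheduling decisions.
bound :: forall n a. KnownNat n => BoundedRDD n a -> Integer
bound _ = natVal (Proxy :: Proxy n)

main :: IO ()
main = do
  let rdd = BoundedRDD [[1, 2], [3]] :: BoundedRDD 2 Int
  print (bound rdd)              -- 2
  print (bound (duplicateB rdd)) -- 4
```

With this encoding, a pipeline's type would make partition growth visible: composing `duplicateB` twice yields a `BoundedRDD (2 * (2 * n)) a`, so unbounded growth cannot hide. The limitation is that the bound is not checked against the actual partition contents; that is where Liquid Haskell refinements could add real verification.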

This could also be a nice application for Liquid Haskell.
