Recursive/Incremental file listing in HDFSInputFormat #302

@kygx-legend

In the current version, HDFSInputFormat reads only the top level of the given path. For example, if the path is /data, it lists the /data directory and reads its items, such as /data/a and /data/b, each of which must be a file.

To be more flexible, it could support reading a structured path recursively, where all files sit in the deepest directories. For example, if the data is stored under a time-based layout like /data/year/month/dates/FILES, it would be preferable to scan everything under '/data' rather than having to give a concrete path such as '/data/year/month/dates'. Of course, we need a configurable maximum recursion depth to avoid reading an excessive amount of data.
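
To make the proposal concrete, here is a minimal sketch of a depth-limited listing. It is only an illustration, not HDFSInputFormat's code: it walks the local filesystem with Python's `os.scandir` as a stand-in for an HDFS directory listing, and the names `list_files` and `max_depth` are hypothetical.

```python
import os


def list_files(root, max_depth):
    """Collect files under root, descending at most max_depth levels below it.

    Assumption: the local filesystem stands in for HDFS here.
    max_depth=0 matches the current behaviour (only the top level is listed).
    """
    found = []
    stack = [(root, 0)]  # (directory, depth relative to root)
    while stack:
        path, depth = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_file():
                    found.append(entry.path)
                elif entry.is_dir(follow_symlinks=False) and depth < max_depth:
                    # Descend only while the depth limit allows it.
                    stack.append((entry.path, depth + 1))
    return sorted(found)
```

With a layout like /data/year/month/dates/FILES, calling `list_files("/data", 3)` would reach the files three levels down, while `list_files("/data", 0)` would return only files placed directly in /data. Directories below the limit are skipped silently in this sketch; a real implementation might prefer to warn or fail instead.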
