Be able to split column #11

@flbulgarelli

As in #10, we should be able to detect when a column extracted by pdfplumber is actually a combination of two columns, e.g.:

| common and scientific name |
| --- |
| rat (mus musculus) |

This is tricky and should be driven by some heuristics (e.g., dash-separated, space-separated, enclosed in parentheses, etc.). We should be able to extract it (maybe controlled by a flag) as:

| common name | scientific name |
| --- | --- |
| rat | mus musculus |

Alternatively, we should resolve this problem with the tablemerge command, by comparing it against the columns generated by the models.
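
To make the heuristic option more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `SPLIT_PATTERNS`, `split_cell`, `split_column` and the `min_ratio` threshold are hypothetical and not part of this project or of pdfplumber. It only covers the parenthesized and dash-separated cases. Plain space separation is left out because a value like `rat mus musculus` gives no reliable split point on its own, and splitting the header text would need a separate rule.

```python
import re

# Hypothetical heuristics, tried in order; each pattern captures two groups.
SPLIT_PATTERNS = {
    "parenthesized": re.compile(r"^\s*(.+?)\s*\(\s*(.+?)\s*\)\s*$"),
    "dash": re.compile(r"^\s*(.+?)\s+-\s+(.+?)\s*$"),
}


def split_cell(text, pattern):
    """Return (left, right) if the cell matches the pattern, else None."""
    match = pattern.match(text)
    return (match.group(1), match.group(2)) if match else None


def split_column(cells, min_ratio=0.9):
    """Split a column in two when enough non-empty cells match one heuristic.

    Returns (heuristic_name, list_of_pairs), or None when no heuristic
    reaches min_ratio. Cells that do not match keep their stripped text
    on the left and get an empty right-hand value.
    """
    non_empty = [c for c in cells if c and c.strip()]
    if not non_empty:
        return None
    for name, pattern in SPLIT_PATTERNS.items():
        hits = sum(1 for c in non_empty if split_cell(c, pattern))
        if hits / len(non_empty) >= min_ratio:
            pairs = []
            for c in cells:
                text = c or ""  # tolerate None cells
                pairs.append(split_cell(text, pattern) or (text.strip(), ""))
            return name, pairs
    return None


if __name__ == "__main__":
    print(split_column(["rat (mus musculus)", "dog (canis lupus)"]))
```

Requiring most cells in the column to match the same pattern, rather than splitting cell by cell, is one way to avoid splitting a column where a parenthesis or dash only appears occasionally. The threshold could be tied to the flag mentioned above.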
