Like in #10, we should be able to detect when a column extracted by `pdfplumber` is actually a combination of two columns, e.g.:

| common and scientific name |
| --- |
| rat (mus musculus) |

This is tricky and should be driven by some heuristics (e.g., dash-separated, space-separated, enclosed in parentheses). We should be able to extract it (maybe controlled by a flag) as:

| common name | scientific name |
| --- | --- |
| rat | mus musculus |
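
A minimal sketch of how the parenthesis and dash heuristics could look, assuming the column arrives as a plain list of cell strings. The names `split_cell`, `split_column`, and `min_ratio` are hypothetical and not part of `pdfplumber` or this project. The space-separated case is left out on purpose, since it is ambiguous without extra knowledge about the expected columns.

```python
import re

# Hypothetical heuristics, tried in order. Each pattern captures two groups:
# the left-hand value and the right-hand value.
_SPLIT_PATTERNS = [
    ("parentheses", re.compile(r"^\s*(.+?)\s*\((.+)\)\s*$")),
    ("dash", re.compile(r"^\s*(.+?)\s+[-\u2013\u2014]\s+(.+?)\s*$")),
]


def split_cell(text):
    """Return (left, right) if any heuristic matches the cell, else None."""
    for _rule, pattern in _SPLIT_PATTERNS:
        m = pattern.match(text)
        if m:
            return m.group(1).strip(), m.group(2).strip()
    return None


def split_column(cells, min_ratio=0.8):
    """Split a column into pairs only if a single rule matches most cells.

    Returns a list of (left, right) tuples, or None when no rule matches at
    least `min_ratio` of the non-empty cells (assumed threshold).
    """
    non_empty = [c for c in cells if c and c.strip()]
    if not non_empty:
        return None
    for _rule, pattern in _SPLIT_PATTERNS:
        hits = sum(1 for c in non_empty if pattern.match(c))
        if hits / len(non_empty) >= min_ratio:
            rows = []
            for c in cells:
                m = pattern.match(c) if c else None
                if m:
                    rows.append((m.group(1).strip(), m.group(2).strip()))
                else:
                    # Keep unmatched cells in the left column.
                    rows.append((c or "", ""))
            return rows
    return None


print(split_column(["rat (mus musculus)", "dog (canis lupus)"]))
```

Requiring one rule to match most of the column, rather than splitting cell by cell, is meant to avoid splitting a column where only a stray value happens to contain parentheses or a dash.
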

Alternatively, we could resolve this in the `tablemerge` command, by comparing the extracted column against the columns generated by the models.
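
One way the comparison could work, sketched under the assumption that the models produce a list of column names: flag an extracted header as combined when it contains all the words of two or more model columns. `find_combined` is a hypothetical helper, not an existing `tablemerge` function.

```python
import re


def _tokens(name):
    """Lowercase word tokens of a column name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def find_combined(header, model_columns):
    """Return the model columns whose words all appear in the extracted header.

    An empty list means the header does not look like a combination of two
    or more model columns.
    """
    header_tokens = _tokens(header)
    covered = [
        col for col in model_columns
        if _tokens(col) and _tokens(col) <= header_tokens
    ]
    return covered if len(covered) >= 2 else []


print(find_combined(
    "common and scientific name",
    ["common name", "scientific name", "habitat"],
))
```

This only identifies which columns a header probably merges; splitting the cell values would still need heuristics on the cell contents.
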