handling string by devzer01 · Pull Request #18 · mifrandir/csv-split

devzer01 · 2022-12-25T04:56:37Z

The column count verifier was not treating an enclosed (quoted) string column as a single column, and was splitting columns on a ',' inside the string. I did a quick fix for it; hope it's good enough. There are also some compiler warnings in flags.c about pointers being compared to non-pointer types, etc.
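To illustrate the behaviour described above, here is a minimal sketch of a column counter that treats a delimiter inside double quotes as data. The function name `count_columns` is hypothetical and is not taken from csv-split or from this pull request; it also assumes one record per string, so quoted fields that span several lines are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of csv-split: counts the fields in one
 * CSV record, ignoring delimiters that appear inside double quotes. */
size_t count_columns(const char *line, char delim)
{
    assert(line != NULL);

    size_t columns = 1;     /* even an empty record has one (empty) field */
    bool in_quotes = false;

    for (const char *p = line; *p != '\0'; ++p) {
        if (*p == '"') {
            /* A doubled quote ("") inside a quoted field toggles the
             * state twice, so it leaves in_quotes unchanged. */
            in_quotes = !in_quotes;
        } else if (*p == delim && !in_quotes) {
            ++columns;
        }
    }
    return columns;
}
```

With this sketch, `count_columns("a,\"b,c\",d", ',')` yields 3, whereas counting every comma would give 4.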


mifrandir · 2022-12-26T10:16:18Z

Thank you for the contribution!

I believe that the problem is a bit more subtle, though. Until now the implementation has been naive in the sense that I didn't try to follow a standard. If we are to make this extension, then we need to choose a spec and follow it, e.g. https://csv-spec.org/.

On an implementation level, there are things that I am currently not happy with that I shall address.

devzer01 · 2022-12-26T15:47:49Z

Sure,

Should we implement a flag to ignore errors, or offer an option to skip them? I was dealing with a file with 249 million lines and was splitting it into files of 500,000 lines each. I picked your tool because it includes the headers in each split file. When the program exits on an error without any useful info and with no reasonable way to resume, recovering takes some work: some math, dd with skipped bytes, sed to append the header, and so on, then cut to take the first column and build an index, and finally figuring out which line was broken.

So maybe:

1. Write the good portion of the lines read so far to the active file before exiting on an error? At least that way it's easy to spot the broken line.
2. Add an option to skip error lines?
3. Print error lines to standard error so the user can handle the broken lines themselves?
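A rough sketch of how skipping error lines and reporting them could fit together: good rows are copied to the output, and rows with an unexpected field count go to a separate stream with their line number. The names `copy_valid_rows`, `count_fields`, and `MAX_LINE` are all hypothetical, and the sketch assumes every record fits on one line shorter than `MAX_LINE`; it is not how csv-split is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_LINE 4096   /* assumption: no record is longer than this */

/* Hypothetical helper: counts fields, ignoring delimiters inside quotes. */
static size_t count_fields(const char *line, char delim)
{
    size_t fields = 1;
    bool in_quotes = false;

    for (const char *p = line; *p != '\0'; ++p) {
        if (*p == '"')
            in_quotes = !in_quotes;
        else if (*p == delim && !in_quotes)
            ++fields;
    }
    return fields;
}

/* Copies every row with `expected` fields from `in` to `out`. Any other
 * row is skipped and written to `rejects`, prefixed with its 1-based line
 * number. Returns the number of rejected rows. */
size_t copy_valid_rows(FILE *in, FILE *out, FILE *rejects,
                       size_t expected, char delim)
{
    assert(in != NULL && out != NULL && rejects != NULL);

    char line[MAX_LINE];
    size_t line_no = 0;
    size_t rejected = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        ++line_no;
        if (count_fields(line, delim) == expected) {
            fputs(line, out);
        } else {
            fprintf(rejects, "line %zu: %s", line_no, line);
            ++rejected;
        }
    }
    return rejected;
}
```

Passing `stderr` as `rejects` would send the broken lines to standard error, so they can be redirected to a file and repaired separately while the split continues.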

I will check the spec out

devzer01 · 2022-12-26T15:57:03Z

Looks like if we want to stick with the spec, we would be better off integrating with https://github.com/rgamble/libcsv

What do you think?

mifrandir · 2023-01-17T16:44:22Z

Sorry for ghosting you.

I think this tool wants to be standalone, for learning purposes; of course you can fork and do whatever you want with it, no hard feelings.

However, options 2 and 3 seem quite sensible and possibly the easiest to implement?

fixed an issue when dealing with ',' in enclosed column, the column c…

ad53bae

…ount is miss matched

devzer01 mentioned this pull request Dec 25, 2022

Unexpected number of columns #14

Open

devzer01 wants to merge 1 commit into mifrandir:master from devzer01:master
