perf: read_ply replace pandas.read_csv engine=python with c; improve read_off header-parsing robustness #352
Open
YodaEmbedding wants to merge 2 commits into daavoo:main from
Improve robustness of header parsing a bit.

In particular, ModelNet40 has faulty headers:

```bash
$ head -n 1 ModelNet40/chair/train/chair_0856.off
OFF6586 5534 0
```

For reference, the correct format is:

```
OFF
6586 5534 0
```

Nonetheless, it is still valuable to parse the faulty header.

Also, reuse the already open file for reading instead of opening it twice.
Both this PR and #353 improved pandas performance for *.OFF files with

Future work: Once this is reviewed/accepted, I can look into improving compatibility with Wikipedia's description of the *.OFF file format. Of course, perfect compatibility is too slow, but there are still some missing features:
8 tasks
UPDATE: I have rebased this PR on top of the latest commit. The revised changes are:
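color -> has_color; and count -> n_header)

In particular, ModelNet40 has faulty headers. For example, the first line of ModelNet40/chair/train/chair_0856.off is `OFF6586 5534 0`.
For reference, the correct format puts `OFF` on its own line, followed by `6586 5534 0` on the next line.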
Nonetheless, it is still valuable to parse the faulty header.
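To illustrate the idea, here is a minimal sketch of a header parser that accepts both the standard and the fused ModelNet40-style header. It is not the code in this PR: the function name `parse_off_header` is hypothetical, and the sketch assumes a plain `OFF` keyword (no `COFF`/`NOFF` prefixes) and exactly three counts.

```python
import io


def parse_off_header(file):
    """Return (n_points, n_faces, n_edges) from an open text-mode OFF file.

    Hypothetical helper: accepts the standard two-line header as well as
    the fused form "OFF6586 5534 0" found in some ModelNet40 files.
    """
    line = file.readline().strip()
    if not line.startswith("OFF"):
        raise ValueError("not an OFF file: first line must start with 'OFF'")

    # Whatever follows "OFF" on the first line may already be the counts.
    rest = line[3:].strip()

    # Otherwise, skip blank lines and comments until the counts line.
    while not rest or rest.startswith("#"):
        next_line = file.readline()
        if not next_line:
            raise ValueError("unexpected end of file while reading OFF header")
        rest = next_line.strip()

    counts = rest.split()[:3]
    if len(counts) != 3:
        raise ValueError(f"expected three counts in OFF header, got: {rest!r}")
    n_points, n_faces, n_edges = (int(c) for c in counts)
    return n_points, n_faces, n_edges


faulty = io.StringIO("OFF6586 5534 0\n0.0 0.0 0.0\n")
standard = io.StringIO("OFF\n6586 5534 0\n0.0 0.0 0.0\n")
print(parse_off_header(faulty))
print(parse_off_header(standard))
```

Because the function reads from an already open file object, the caller can keep reading points and faces from the same handle instead of opening the file a second time.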
(Original text before #353 was merged)
Big performance improvement by removing the need to use the slow `engine="python"`: the sliced file is now read from an in-memory StringIO buffer.

Also fixes a bug where OFF files containing more lines than `num_points + num_faces` try to read potential edges as faces!

As Wikipedia says, the OFF file may contain:
Of course, this still does not encompass all possible OFF file variants described by Wikipedia, but it's an improvement.
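The slicing approach can be sketched roughly as follows. This is an illustration under assumptions, not the exact code in this PR: `read_off_body` is a hypothetical name, faces are assumed to be triangles, both counts are assumed to be non-zero, and color columns are ignored.

```python
import io
from itertools import islice

import pandas as pd


def read_off_body(file, n_points, n_faces):
    """Read exactly n_points vertex lines and n_faces face lines.

    Hypothetical helper: `file` is an open text-mode file positioned just
    after the header. Anything after the faces (e.g. edges) is never read.
    """
    # Slice the needed lines into in-memory buffers so the fast C engine
    # can parse each section on its own.
    points_buf = io.StringIO("".join(islice(file, n_points)))
    faces_buf = io.StringIO("".join(islice(file, n_faces)))

    points = pd.read_csv(
        points_buf, sep=r"\s+", header=None, engine="c", names=["x", "y", "z"]
    )
    # The first column of a face line is its vertex count (3 for triangles).
    faces = pd.read_csv(
        faces_buf,
        sep=r"\s+",
        header=None,
        engine="c",
        names=["n", "v1", "v2", "v3"],
        usecols=["v1", "v2", "v3"],
    )
    return points, faces


body = io.StringIO("0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n2 0 1\n")
points, faces = read_off_body(body, n_points=3, n_faces=1)
print(points.shape, faces.shape)
```

Since each section is limited to the count given in the header, the trailing `2 0 1` line in the example is left unread rather than being parsed as a face.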