Xeno.DOM: Heap exhausted on a 5.6M file #65

@unhammer

longlines.xml.zip
Running the attached file (↑) through xeno-dom exhausts heap memory. I just added the file to the list in SpeedBigFiles.hs as
[ benchFile ["xeno-dom"] "6MB" "longlines.xml.bz2"
and got

benchmarking 6M/xeno-dom
xeno-speed-big-files-bench: Heap exhausted;
xeno-speed-big-files-bench: Current maximum heap size is 26843545600 bytes (25600 MB).

Strangely, even minor changes to the file (e.g. sed 's/x/xx/g', which increases the file size) will let it through with about 800M maxresident (as reported by /usr/bin/time). Inserting a newline after each > also gives about 800M maxresident, so the problem doesn't seem to be related to the long lines; almost any change to the file helps.
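
For anyone trying to reproduce this, the two perturbations above can be sketched roughly as follows. This runs on a tiny stand-in file rather than the real attachment; the sample-*.xml names are placeholders, and the newline variant assumes GNU sed (which accepts \n in the replacement). It is not necessarily the exact command line used for the newline case.

```shell
# Tiny stand-in input; the real test case is longlines.xml from the attached zip.
printf '<a><x>text</x></a>\n' > sample.xml

# Variant 1: double every "x", which increases the file size.
sed 's/x/xx/g' sample.xml > sample-xx.xml

# Variant 2: insert a newline after each ">" (assumes GNU sed).
sed 's/>/>\n/g' sample.xml > sample-nl.xml

# Compare sizes of the original and the two variants.
wc -c sample.xml sample-xx.xml sample-nl.xml

# Peak memory would then be measured by prefixing the benchmark run with
# /usr/bin/time; the benchmark invocation itself is left out here.
```

Each variant would then be compressed and added to SpeedBigFiles.hs the same way as the original file.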

(Yes, I should be using Xeno.SAX, but why does e.g. https://dumps.wikimedia.org/nowiki/20230520/nowiki-20230520-pages-articles-multistream-index.txt.bz2 at 11M go through fine with <400M maxresident and this one not? Even with newlines removed, the wiki file works fine. This feels like a memory leak.)
