longlines.xml.zip
↑ Parsing this file with xeno-dom exhausts heap memory. I just added the file to the list in SpeedBigFiles.hs as
[ benchFile ["xeno-dom"] "6MB" "longlines.xml.bz2"
and got
benchmarking 6M/xeno-dom
xeno-speed-big-files-bench: Heap exhausted;
xeno-speed-big-files-bench: Current maximum heap size is 26843545600 bytes (25600 MB).
Strangely, even minor changes to the file (e.g. sed 's/x/xx/g', which increases the file size) let it through with about 800M maxresident (as reported by /usr/bin/time). Inserting a newline after each > also gives 800M maxresident, but the problem doesn't seem to be related to the long lines, as almost any change to the file helps.
(Yes, I should be using Xeno.SAX, but why does e.g. https://dumps.wikimedia.org/nowiki/20230520/nowiki-20230520-pages-articles-multistream-index.txt.bz2 at 11M go through fine with <400M maxresident while this one does not? Even with newlines removed, the wiki file works fine. This feels like a leak.)
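To rule out the benchmark harness, the same parse could be run from a small standalone program. The following is only a sketch: it assumes the file has already been decompressed, and the program name and the countChildren helper are hypothetical, not part of xeno. It uses Xeno.DOM.parse and Xeno.DOM.children and nothing else from the library.

```haskell
-- Hypothetical standalone reproducer; not part of the xeno benchmark suite.
import qualified Data.ByteString.Char8 as BS
import System.Environment (getArgs)
import qualified Xeno.DOM as Xeno

-- | Parse the whole input with Xeno.DOM and count the child elements of the
-- root node. Matching on the result forces the parse to run.
countChildren :: BS.ByteString -> Either String Int
countChildren input =
  case Xeno.parse input of
    Left err   -> Left (show err)
    Right root -> Right (length (Xeno.children root))

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- Assumption: the path points at the decompressed XML, not the archive.
      input <- BS.readFile path
      either (putStrLn . ("parse error: " ++)) print (countChildren input)
    _ -> putStrLn "usage: repro FILE.xml"
```

Running the compiled program under /usr/bin/time on the original file and on the sed-modified variant should show whether the difference in maxresident also appears outside the benchmark.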