Avoided string split() and join() in HTML Canonicalization. #41
adon-at-work wants to merge 2 commits into master
- `string.split('')` was found to be very slow on large strings, so an equivalent string-based manipulation is used instead
- tests the performance with different configurations
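
For illustration only, here is a minimal sketch of the kind of change described in the list above. It is not the code from this PR: the function names are hypothetical, and the canonicalization rule (replacing NUL with U+FFFD) is an assumption chosen just to have something to rewrite. The first function goes through `split('')` and `join('')`, the second produces the same result using only string operations.

```javascript
// Hypothetical canonicalization rule, assumed for this sketch only:
// replace every NUL character (U+0000) with U+FFFD.

// Array-based version: split('') allocates one array element per
// UTF-16 code unit, which gets expensive on large inputs.
function canonicalizeWithArray(input) {
  const chars = input.split('');
  for (let i = 0; i < chars.length; i++) {
    if (chars[i] === '\u0000') {
      chars[i] = '\uFFFD';
    }
  }
  return chars.join('');
}

// String-based version: scan with charCodeAt() and copy unchanged
// runs with slice(), so no per-character array is ever built.
function canonicalizeWithString(input) {
  let output = '';
  let start = 0; // start of the current run of unchanged characters
  for (let i = 0; i < input.length; i++) {
    if (input.charCodeAt(i) === 0) {
      output += input.slice(start, i) + '\uFFFD';
      start = i + 1;
    }
  }
  return output + input.slice(start);
}
```

Both functions should return identical output for any input, so the array-based one can serve as a reference when checking the string-based one. How much faster the second form is depends on the engine and on the input, so it is worth measuring rather than assuming.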
@yukinying / @maditya / @neraliu , this is the first round of changes to significantly boost the performance of context-parser. Please review. I find this change well justified by the performance improvement relative to the lines of code changed. We're certainly aware of other ways to further optimize the code, but those may involve interface changes. My advice is to discuss them elsewhere, and perhaps get those juices squeezed via separate PRs. Thanks.
@maditya , FYI, I also benchmarked against html-purify's fix-tag-balancing branch. With this new CP, html-purify's throughput when processing the 1m file improves by ~3x as well (from 0.75 to 2.22 MB/s).
I know that you are very tempted by the "3x" performance improvement, but I would recommend that you first benchmark the case without your canonicalization logic, and then see what percentage of the time is taken by the canonicalization logic compared with the original context parsing logic. I would expect the two to take roughly equal time (around 1:1). Using that as a goal would be more convincing to me.
`String.split('')` and `String.join('')` were found very slow when given a large string