Try to be smarter with our resolve_gep calls to not duplicate work with resolve_check_bounds#71
Try to be smarter with our resolve_gep calls to not duplicate work with resolve_check_bounds#71srosefield-riverside wants to merge 3 commits intomainfrom
Conversation
fdb4797 to
ac65e27
Compare
…th resolve_check_bounds - Insert stack tracking at more appropriate location if we find llvm lifetime markers - We only ever invalidate one address at a time, so do that instead of invalidating a range which is more expensive. - Move stack allocations to their own thread_local so we dont need to acquire a lock. - Reduce the number of times we call invalidate_stack - Hint that the resolve_gep branch is likely true. - coalesce neighboring GEP calls
81ef0e7 to
27c85bc
Compare
…ursuit of merging bounds checks and the actual access
|
This has collected a couple of different things, I'm still not sure about the gep changes, but we should merge the stack improvements separately? Can you pull out the stack instrumentation using lifetime annotations into its own pr? Maybe also the thread local stack object map? |
|
Sure let me know which things you want me to extract. Technically if we just remove this block or add a "strategy != SAFE" in https://github.com/riversideresearch/resolve/pull/71/files#diff-dab9e2f70474350daa0981a1b1db050296682bd2a4f6b1b83c5a4a768d5215d1R723 then we'll also call resolve_check_bounds after the GEPs. |
|
Also FWIW this is what the flamegraph on redis looks like with these changes. There is still a substantial amount of locking overhead on |
|
Flame graph looks even better! Still seems like a lot of synchronization overhead though. Why do thread locals have atomic operations in them again? Is it related to initialization? If you can extract from the logs what the pointers are that are untracked, we can make a plan to track them... |
I can't tell from the link which block you're referring to? |
|
Ah, because the "bounds_check.cpp" is collapsed by default it doesn't jump to the line, which is The atomics are indeed for initialization (and destruction). It needs to check if it has been created yet. |
Quickly checking some logs, there are millions on lines like and from strace it looks like they are in the address spaces returned by |
|
So maybe these are heap allocations coming from libc but not malloc? Hmm... |
Sorry I missed this part of your comment. Basically, just I want to hold off on the other performance tweaks until we have a chance to discuss overall strategy for performance |
GEP knows the size of the type the derived pointer points at, so we can just include that bounds check directly in our
resolve_gepcall. Then we can elide any call toresolve_check_boundsthat is based on aresolve_gepoutput as bounds checking was already completed.The stack allocation lifetime tracking was updated to use llvm's lifetime markers if they are available. This also helped with an error in optimized builds where stack addresses were re-used but we marked them as freed after the first lifetime ended.