Switch away from using PCRE2 alone, and use a native Rust regex engine first #320
orowith2os wants to merge 1 commit into
I lost some code in git shenanigans that actually got it about on par with what it was way back (using a lazy DFA and whatnot), where policies without lookarounds would lead to a digest in under 16 seconds and policies with lookarounds in about 24 seconds, but I'm not quite sure how to reproduce that. For now, this is the best I can do.
This is also generally a 50% or so performance improvement compared to just using PCRE2, even if it is worse than the original path (which didn't support look-arounds, and blew up on certain SELinux policies).

Signed-off-by: Dallas Strouse <dallas.strouse2007@gmail.com>
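As a rough illustration of the "native engine first, PCRE2 only where needed" idea, here is a minimal sketch of routing each pattern by whether it appears to use lookarounds. The names `Engine`, `has_lookaround` and `route` are hypothetical and not taken from this PR, and the check is a plain textual heuristic rather than a real regex parser; the actual patch may decide differently (for example, by attempting to compile with the native engine and falling back on error).

```rust
/// Which engine a pattern is routed to in this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Engine {
    /// Automata-based engine without lookaround support.
    Native,
    /// Backtracking fallback (PCRE2 in this PR).
    Fallback,
}

/// Heuristic: report whether `pattern` contains an unescaped lookaround
/// opener: `(?=`, `(?!`, `(?<=` or `(?<!`.
///
/// This is not a parser. An opener inside a character class such as `[(?=]`
/// is reported as a lookaround too; that only sends the pattern to the
/// slower fallback, so the error is on the safe side.
fn has_lookaround(pattern: &str) -> bool {
    let bytes = pattern.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Skip the escaped character so `\(?=` is not counted.
            b'\\' => i += 2,
            b'(' => {
                let rest = &bytes[i + 1..];
                if rest.starts_with(b"?=")
                    || rest.starts_with(b"?!")
                    || rest.starts_with(b"?<=")
                    || rest.starts_with(b"?<!")
                {
                    return true;
                }
                i += 1;
            }
            _ => i += 1,
        }
    }
    false
}

/// Pick the engine for one pattern: fall back only when lookarounds appear.
fn route(pattern: &str) -> Engine {
    if has_lookaround(pattern) {
        Engine::Fallback
    } else {
        Engine::Native
    }
}
```

With this heuristic, a policy that contains no lookarounds never touches the fallback engine at all, which matches the secureblue versus bazzite distinction discussed in this thread.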
Hi, thanks for working on this. Let's add some unit tests using a few representative regexps. Wasn't the real win we had previously from compiling a bunch of regexps into a single DFA? I wonder if we can just do a similar trick with PCRE. Alternatively, an idea I had was to divide up the regexps into groups (say 5%) and try
I'd also reiterate, though: my strongest preference here really would be to have an opt-in feature for composefs-boot to use the libselinux crate. The downside is that the examples here are "unclean" and compile a binary on the host (Ubuntu 24.04) and drop that into a variety of Linux distros and that only works because
It was, yeah. That's the code I had, and lost because I messed up with git. I'm trying to reproduce it, but the checksum always comes out wrong. I don't think the pcre2 crate exposes any API like that, though.
I wouldn't be against this; I just don't find the documentation for the libselinux crate to be pleasant, and I don't know just how much work it would take to switch to it.
OK I put this up: #329 - it definitely gives us the perf back, can you review?
Yes, we need to ensure we're producing exactly the same matches as before. One thing I notice in this PR is that you are not preserving the original relative indices, which #329 does.
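To make the point about relative indices concrete, here is a minimal sketch of splitting a pattern list across two engines while remembering where each entry sat in the original list. `Indexed`, `split_preserving_indices` and `matching_indices` are hypothetical names, not code from either PR, and the matcher is left generic so the sketch stays independent of any particular regex crate.

```rust
/// An entry tagged with its position in the original, unsplit list.
struct Indexed<T> {
    original_index: usize,
    matcher: T,
}

/// Split `items` into (native, fallback) groups. Each entry keeps the index
/// it had before the split, so results can be mapped back afterwards.
fn split_preserving_indices<T, F>(
    items: Vec<T>,
    goes_to_fallback: F,
) -> (Vec<Indexed<T>>, Vec<Indexed<T>>)
where
    F: Fn(&T) -> bool,
{
    let mut native = Vec::new();
    let mut fallback = Vec::new();
    for (original_index, matcher) in items.into_iter().enumerate() {
        let entry = Indexed { original_index, matcher };
        if goes_to_fallback(&entry.matcher) {
            fallback.push(entry);
        } else {
            native.push(entry);
        }
    }
    (native, fallback)
}

/// Return the original indices of every matching entry from both groups,
/// in ascending order, as if the list had never been split.
fn matching_indices<T, P>(
    native: &[Indexed<T>],
    fallback: &[Indexed<T>],
    is_match: P,
) -> Vec<usize>
where
    P: Fn(&T) -> bool,
{
    let mut hits: Vec<usize> = native
        .iter()
        .chain(fallback.iter())
        .filter(|e| is_match(&e.matcher))
        .map(|e| e.original_index)
        .collect();
    hits.sort_unstable();
    hits
}
```

Because the caller gets back indices into the original list, whatever precedence rule was applied before the split (first match, last match, and so on) can be applied unchanged afterwards. Without the stored index, a match in the fallback group could only be ranked relative to other fallback patterns.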
Having the perf back is good, but I don't like the code at all. Though, given it's already merged, I can close this if you don't want to go through the hassle of reviewing and merging yet another regex-changing PR.
What don't you like? Happy to take more changes!
I'm sure it can be improved more!
The first command uses a secureblue image (which doesn't have any policies using lookarounds); the latter uses a bazzite image (which does have a few).
This is with latest main (which uses PCRE2 on its own).
The following is with this patch: