Skip to content

Switch away from using PCRE2 alone, and use a native Rust regex engine first#320

Draft
orowith2os wants to merge 1 commit into
composefs:mainfrom
orowith2os:dual-regex-pcre2
Draft

Switch away from using PCRE2 alone, and use a native Rust regex engine first#320
orowith2os wants to merge 1 commit into
composefs:mainfrom
orowith2os:dual-regex-pcre2

Conversation

@orowith2os

Copy link
Copy Markdown
Contributor

The first command is a secureblue image (which doesn't have any policies using lookarounds), the latter is a bazzite image (which does have a few)

This is with latest main (which uses PCRE2 on its own)

oro@stitch ~> time sudo bootc/pcre2-bootc container compute-composefs-digest scb-rootfs/
08ed265ec132aefacd211f44127007e4276833d2b5d38c696c81e07fdec470382e3a6800b7302b85600d96fd40df462470306eefeb357abfe1be8595b3f73085

________________________________________________________
Executed in   77.53 secs      fish           external
   usr time    7.47 millis  967.00 micros    6.50 millis
   sys time    9.58 millis    0.00 micros    9.58 millis

oro@stitch ~> time sudo bootc/pcre2-bootc container compute-composefs-digest /ostree/boot.1.0/default/40ad2cc41a58917bee1e7defa7fd9fcdbbd92c688b51d26c64ee4ea4f6970f79/0/
53b063f0f60f3864c0babb869c2c1e7c36c7526a51d15bd1c0d52222f86f3d9667609d3db4bc527dcdd777f849633bcceb978d50c77dda4bdb8d16ab052fa9a5

________________________________________________________
Executed in  130.88 secs      fish           external
   usr time   11.96 millis  330.00 micros   11.63 millis
   sys time    6.18 millis  307.00 micros    5.88 millis

The following is with this patch:

oro@stitch ~> time sudo Projects/bootc/target/release/bootc container compute-composefs-digest scb-rootfs
08ed265ec132aefacd211f44127007e4276833d2b5d38c696c81e07fdec470382e3a6800b7302b85600d96fd40df462470306eefeb357abfe1be8595b3f73085

________________________________________________________
Executed in   35.65 secs      fish           external
   usr time    6.46 millis    0.00 micros    6.46 millis
   sys time   12.85 millis  848.00 micros   12.00 millis

oro@stitch ~> time sudo Projects/bootc/target/release/bootc container compute-composefs-digest /ostree/boot.1.0/default/40ad2cc41a58917bee1e7defa7fd9fcdbbd92c688b51d26c64ee4ea4f6970f79/0/
86dbfe101a2557d9aa3077795cd4f736169ea31a0bc8c1408579269276ad935d792c0b8929899f7a6088d518ccd8dc29794235d3adf79c18f5af62fb03cb0230

________________________________________________________
Executed in   55.85 secs      fish           external
   usr time    3.16 millis  539.00 micros    2.62 millis
   sys time   16.04 millis  446.00 micros   15.59 millis

@orowith2os

orowith2os commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

I lost some code in git shenanigans that actually got it about on par with what it was way way back (using a lazy dfa and whatnot), where no lookaround policies would lead to about a <16 second digest and lookarounds about 24 seconds, but I'm not quite sure how to reproduce that. For now, this is the best I can do.

This is also generally a 50% or so performance improvement compared to
just using PCRE2, even if it is worse than the original path (which
didn't support look-arounds, and blew up on certain SELinux policies).

Signed-off-by: Dallas Strouse <dallas.strouse2007@gmail.com>
@cgwalters

Copy link
Copy Markdown
Collaborator

Hi, thanks for working on this.

Let's add some unit tests using a few representative regexps.

Isn't the real win we had previously compiling a bunch of regexps into a single DFA? I wonder if we can just do a similar trick with PCRE. Alternatively, an idea I had was to divide up the regexps into groups (say 5%) and try build_many on them, and if that fails fall back to PCRE just for that group.

@cgwalters

Copy link
Copy Markdown
Collaborator

I'd also reiterate though, my strongest preference here really would be to have an opt-in feature for composefs-boot to use the libselinux crate. The downside is that the examples here are "unclean" and compile a binary on the host (Ubuntu 24.04) and drop that into a variety of Linux distros and that only works because cfsctl only links to glibc, and it would likely break with more libraries (esp given not all OS/distros use selinux). But that's fixable by reworking the build process to be a proper multi-stage in the container, much like how the integration tests work.

@orowith2os

orowith2os commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Isn't the real win we had previously compiling a bunch of regexps into a single DFA? I wonder if we can just do a similar trick with PCRE. Alternatively, an idea I had was to divide up the regexps into groups (say 5%) and try build_many on them, and if that fails fall back to PCRE just for that group.

It was, yeah, That's the code I had, and lost because I messed up with git. I'm trying to reproduce it, but the checksum always comes out wrong. I don't think the pcre2 crate exposes any API like that, though.

I'd also reiterate though, my strongest preference here really would be to have an opt-in feature for composefs-boot to use the libselinux crate.

I wouldn't be against this, I just don't find the documentation for the libselinux crate to be pleasant, and I don't know just how much work it would take to switch to it.

@orowith2os orowith2os marked this pull request as draft June 23, 2026 19:02
@cgwalters

Copy link
Copy Markdown
Collaborator

OK I put this up: #329 - it definitely gives us the perf back, can you review?

I'm trying to reproduce it, but the checksum always comes out wrong

Yes, we need to ensure we're producing exactly the same matches as before. One thing I notice here in this PR is you are not preserving the original relative indices, which 329 is doing.

@orowith2os

Copy link
Copy Markdown
Contributor Author
  • it definitely gives us the perf back, can you review?

having the perf back is good, but I don't like the code at all. Though, given it's already merged, I can close this if you don't want to go through the hassle of reviewing and merging yet another regex-changing PR.

@cgwalters

Copy link
Copy Markdown
Collaborator

having the perf back is good, but I don't like the code at all.

What don't you like? Happy to take more changes!

Though, given it's already merged, I can close this if you don't want to go through the hassle of reviewing and merging yet another regex-changing PR.

I'm sure it can be improved more!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants