Open
Description
Our performance issues are almost certainly caused by this query:
(SELECT id FROM tbl_sample_sources WHERE source LIKE '%$queryValue%' LIMIT 100)
UNION
(SELECT id from tbl_samples WHERE JSON_SEARCH( lower(yara->'$.yara'), 'all', '$queryValue') IS NOT NULL LIMIT 100)
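Neither part of the `UNION` is currently supported by an index: the leading `%` in the `LIKE` pattern prevents a B-tree index on `source` from being used, and the `JSON_SEARCH` call has to evaluate the JSON document of every row. This leads to two full table scans for every search request, one of `tbl_sample_sources` and one of the (very large) `tbl_samples` table.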
- I suggest we change that for the first half by removing the leading `%`. This reduces our feature set there (we would only support prefix queries), but I think the performance benefit is worth it. Down the road, we could build a full-text search on the `source` field to add that feature back in.
- The second half of the `UNION` is probably harder to fix. MySQL seems to support indexing fields in JSON by creating a virtual column and then putting an index on it. Our situation is a bit more complex, though: the `yara` field in the JSON in the `yara` [sic] column is a list. I'll research whether that is even supported. If not, I suggest we properly normalize: have a `tbl_yara` table containing all YARA rule names and a `tbl_matches` table relating entries in that table to rows in the `tbl_samples` table. For this, we need to touch the YARA scanning process and build a tool to migrate existing data.
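
To make the two proposals more concrete, here is a rough SQL sketch. It is not a tested migration. The index names, column types, lengths, and the example search values are assumptions made for illustration, and the multi-valued index variant only applies if we run MySQL 8.0.17 or later with InnoDB.

```sql
-- Part 1: prefix search on source.
-- Assumption: source is a VARCHAR column (a TEXT column would need a prefix
-- length, e.g. source(191)). The index name is hypothetical.
CREATE INDEX idx_sample_sources_source ON tbl_sample_sources (source);

-- Without the leading %, the pattern is a prefix match that can use the index.
SELECT id
FROM tbl_sample_sources
WHERE source LIKE 'example%'
LIMIT 100;

-- Part 2, option A: multi-valued index on the JSON array (MySQL 8.0.17+).
-- Assumption: rule names fit into 128 characters.
-- Note: this supports exact membership tests only, not the wildcard and
-- lower-cased matching that the current JSON_SEARCH call performs.
ALTER TABLE tbl_samples
  ADD INDEX idx_samples_yara_rules ((CAST(yara->'$.yara' AS CHAR(128) ARRAY)));

SELECT id
FROM tbl_samples
WHERE 'some_rule' MEMBER OF (yara->'$.yara')
LIMIT 100;

-- Part 2, option B: normalized tables.
-- Assumption: tbl_samples.id is an unsigned integer. Foreign keys are left
-- out because the exact type of that column is not shown in this issue.
CREATE TABLE tbl_yara (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  UNIQUE KEY uq_yara_name (name)
);

CREATE TABLE tbl_matches (
  yara_id   INT UNSIGNED NOT NULL,
  sample_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (yara_id, sample_id)
);

-- Lookup by rule name: the unique key on name finds the rule, and the
-- primary key on tbl_matches (leading with yara_id) finds its samples.
SELECT m.sample_id AS id
FROM tbl_yara AS y
JOIN tbl_matches AS m ON m.yara_id = y.id
WHERE y.name = 'some_rule'
LIMIT 100;
```

Running `EXPLAIN` on each `SELECT` against a copy of the production data should show whether the optimizer actually picks the new indexes before we commit to either option.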