Skip to content

Ongoing Performance Issues with MySQL #33

@larsborn

Description

@larsborn

Our performance issues are almost certainly caused by this query:

(SELECT id FROM tbl_sample_sources WHERE source LIKE '%$queryValue%' LIMIT 100)
UNION
(SELECT id from tbl_samples WHERE JSON_SEARCH( lower(yara->'$.yara'), 'all', '$queryValue') IS NOT NULL LIMIT 100)

Both UNION-parts of it are not supported by an index right now. This leads to two full table scans of the (very large) tbl_samples table for every search request.

  1. I suggest we change that for the first half by removing the %. This will effectively decrease our feature set there (we only support prefix queries) but I think the performance benefit is worth it. Down the road, we could build a full-text search on the source field to add that feature back in.
  2. the second half of the UNION is probably harder to fix: indexing fields in JSON seems to be supported in MySQL by creating a virtual column and then putting an index on that. Our situation here is a bit more complex though, the yara field in the JSON in the yara [sic] column is a list. I'll research if that's even supported. If not, I suggest we properly de-normalize: have a tbl_yara table containing all YARA rule names and have a tbl_matches table relating entries in that table to rows in the tbl_sample table. For this, we need to touch the YARA scanning process and build a tool to migrate existing data.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions