fix(search): buildFtsQuery does not sanitize FTS5 operators — user input alters query semantics #160

@MuBeiGe

Summary

buildFtsQuery() (L10032 in dist/index.mjs) tokenizes user input and joins tokens with OR. It does not strip FTS5 special operators or syntax from the input, allowing users to alter query semantics.

Root Cause

L10032-L10055 in dist/index.mjs:

function buildFtsQuery(raw) {
    // ... tokenize via jieba or regex ...
    return tokens.map((t) => `"${t.replaceAll('"', "")}"`).join(" OR ");
}

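The function wraps each token in double quotes, which mitigates most injection vectors. However, if jieba is not available (fallback path at L10044), the regex [\p{L}\p{N}_]+ keeps only runs of letters, digits, and underscores. That drops punctuation-based syntax such as *, ^, :, and parentheses, but keyword operators (AND, OR, NOT, NEAR) consist of letters and pass through as ordinary tokens.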
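To make the fallback behavior concrete, here is a minimal sketch of that path. The name buildFtsQueryFallback is hypothetical, the jieba branch is left out, and the tokenization step is assumed to be a plain match() on the regex quoted above; the actual code in dist/index.mjs may differ.

```javascript
// Hypothetical stand-in for the regex fallback path of buildFtsQuery().
// Assumption: tokens come from a plain match() on the regex named in the issue.
function buildFtsQueryFallback(raw) {
  // Keep only runs of letters, digits, and underscores (Unicode-aware).
  const tokens = raw.match(/[\p{L}\p{N}_]+/gu) ?? [];
  // Quote each token so FTS5 reads it as a string, then join with OR.
  return tokens.map((t) => `"${t.replaceAll('"', "")}"`).join(" OR ");
}

console.log(buildFtsQueryFallback("foo AND bar*"));
// "foo" OR "AND" OR "bar"
console.log(buildFtsQueryFallback("title:secret NOT (x)"));
// "title" OR "secret" OR "NOT" OR "x"
```

In this sketch the keywords are searched for as literal quoted strings rather than acting as operators, which matches the observation that quoting limits practical exploitation.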

More importantly, the explicit join(" OR ") means every query is an OR-query. The OR behavior is hardcoded and cannot be overridden, so the pipeline has no way to construct AND/NOT queries intentionally.

Impact

Medium severity. The double-quoting of individual tokens limits practical exploitation, but the lack of explicit sanitization of FTS5 operators (AND, OR, NOT, NEAR) before tokenization is a defense-in-depth gap.

Suggested Fix

Strip FTS5 operators from the raw input string before tokenization:

// Match whole uppercase keywords only, so words like "android" or "for" are left intact.
const FTS5_OPS = /\b(?:AND|OR|NOT|NEAR)\b/g;
const cleaned = raw.replace(FTS5_OPS, " ");

This is roughly a 3-line change.
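As a rough illustration of how the stripping step could behave, the sketch below uses word boundaries and a case-sensitive match so that ordinary words containing "and" or "or" are not damaged. The helper name stripFts5Operators is hypothetical, and the whitespace collapsing is an extra convenience, not part of the proposal above.

```javascript
// Hypothetical helper illustrating the suggested pre-tokenization cleanup.
// Assumption: only standalone uppercase keywords should be removed.
const FTS5_OPS = /\b(?:AND|OR|NOT|NEAR)\b/g;

function stripFts5Operators(raw) {
  // Replace each keyword with a space, then collapse runs of whitespace.
  return raw.replace(FTS5_OPS, " ").replace(/\s+/g, " ").trim();
}

console.log(stripFts5Operators("cats AND dogs NOT birds")); // cats dogs birds
console.log(stripFts5Operators("ANDROID or iOS"));          // ANDROID or iOS
```

The cleaned string would then be passed to the existing tokenizer in place of raw.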
