Summary
buildFtsQuery() (L10032 in dist/index.mjs) tokenizes user input and joins tokens with OR. It does not strip FTS5 special operators or syntax from the input, allowing users to alter query semantics.
Root Cause
L10032-L10055 in dist/index.mjs:
function buildFtsQuery(raw) {
  // ... tokenize via jieba or regex ...
  return tokens.map((t) => `"${t.replaceAll('"', "")}"`).join(" OR ");
}
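The function wraps each token in double quotes and removes embedded double quotes, which mitigates most injection vectors. However, if jieba is not available (fallback path at L10044), the regex [\p{L}\p{N}_]+ keeps only runs of letters, digits, and underscores. That drops punctuation-based FTS5 syntax, but the keyword operators (AND, OR, NOT, NEAR) consist of letters and pass through as ordinary tokens.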
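To make the fallback behavior concrete, here is a minimal sketch of the regex path followed by the quoting step. The helper names fallbackTokenize and quoteAndJoin are hypothetical and only approximate the elided tokenization in dist/index.mjs; the regex and the quoting expression are taken from the report.

```javascript
// Hypothetical stand-in for the regex fallback path (not the actual code in dist/index.mjs).
function fallbackTokenize(raw) {
  // Keep only runs of Unicode letters, digits, and underscores.
  return raw.match(/[\p{L}\p{N}_]+/gu) ?? [];
}

// Same quoting and joining expression as shown in buildFtsQuery above.
function quoteAndJoin(tokens) {
  return tokens.map((t) => `"${t.replaceAll('"', "")}"`).join(" OR ");
}

const query = quoteAndJoin(fallbackTokenize("cats NOT dogs* NEAR(a b)"));
console.log(query);
// prints: "cats" OR "NOT" OR "dogs" OR "NEAR" OR "a" OR "b"
```

Under these assumptions the punctuation (*, parentheses) disappears, while NOT and NEAR remain in the query as quoted search terms rather than operators.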
More importantly, the explicit join(" OR ") means every query is an OR-query. The pipeline has no way to construct AND/NOT queries intentionally, because the OR behavior is hardcoded and cannot be overridden.
Impact
Medium severity. The double-quoting of individual tokens limits practical exploitation, but the lack of explicit sanitization of FTS5 operators (AND, OR, NOT, NEAR) before tokenization is a defense-in-depth gap.
Suggested Fix
Strip FTS5 operators from the raw input string before tokenization:
const FTS5_OPS = /\b(AND|OR|NOT|NEAR)\b/gi;
const cleaned = raw.replace(FTS5_OPS, " ");
This is roughly a 3-line change.
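A possible shape for the suggested fix, as a self-contained sketch. The names stripFts5Operators and buildFtsQuerySketch are hypothetical, and the tokenizer here is only the regex fallback (the jieba path is omitted), so this is an illustration under those assumptions rather than a drop-in patch.

```javascript
// Word-boundary match on the FTS5 keyword operators, as proposed in the fix.
const FTS5_OPS = /\b(AND|OR|NOT|NEAR)\b/gi;

// Hypothetical helper: replace operator keywords with a space before tokenizing.
function stripFts5Operators(raw) {
  return raw.replace(FTS5_OPS, " ");
}

// Hypothetical variant of buildFtsQuery using only the regex fallback tokenizer.
function buildFtsQuerySketch(raw) {
  const cleaned = stripFts5Operators(raw);
  const tokens = cleaned.match(/[\p{L}\p{N}_]+/gu) ?? [];
  // Quote each token and join with OR, matching the existing behavior.
  return tokens.map((t) => `"${t.replaceAll('"', "")}"`).join(" OR ");
}

console.log(buildFtsQuerySketch("cats NOT dogs"));
// prints: "cats" OR "dogs"
```

One design point to weigh: with the i flag, lowercase words such as "not" or "near" are also removed, even though they are plausible search terms. The FTS5 documentation describes the keyword operators as case sensitive, so dropping the i flag may be enough if preserving those words matters.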