Note
This query model replaces the current functionality: `visited` is dropped (it can be reproduced with `id` + `reach` filters), and `services` is deferred to a separate MR (it will become a queryable index field).
Summary
Migrate from the current flat query parameter model to a structured expression-based query DSL with boolean combinators (and, or, not) and field operators.
Current State
Current Query Schema (SearchQueryParametersSchema)
Location: `/src/backend/core/schemas.py` (lines 109-120)
```python
class SearchQueryParametersSchema(BaseModel):
    q: str                                               # Full-text search query
    services: StringListParameter = []                   # Which services to search
    visited: StringListParameter = []                    # Document IDs user has visited
    reach: Optional[ReachEnum] = None                    # PUBLIC/AUTHENTICATED/RESTRICTED
    tags: StringListParameter = []                       # Tag filtering (OR logic only)
    path: Optional[str] = None                           # Path prefix filter
    order_by: Optional[Literal[...]] = "relevance"       # Sort field
    order_direction: Optional[Literal["asc", "desc"]] = "desc"
    nb_results: Optional[int] = 50                       # Result limit (1-100)
```
Current Limitations
| Field | Current Behavior | Limitation |
|---|---|---|
| `tags` | `{"terms": {"tags": [...]}}` | OR logic only: matches ANY tag |
| `tags` | No negation | Cannot exclude tags |
| `tags` | No AND logic | Cannot require ALL tags to be present |
| `services` | Inclusion only | Cannot exclude specific services |
| `visited` | Hardcoded in filter | Not flexible; couples access control to search |
| Logic | Flat structure | Cannot express complex boolean combinations |
Target State
Refined Query Shape
```json
{
  "query": "budget",
  "where": {
    "and": [
      { "field": "reach", "op": "eq", "value": "restricted" },
      {
        "or": [
          { "field": "tags", "op": "all", "value": ["finance", "approved"] },
          { "field": "path", "op": "prefix", "value": "/teams/legal" }
        ]
      }
    ]
  },
  "sort": [{ "field": "relevance", "direction": "desc" }],
  "limit": 50
}
```
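Clients and tests that build these payloads in Python can use a few small helpers instead of hand-writing nested dicts. This is a minimal sketch. The helper names (`cond`, `and_`, `or_`, `not_`, `search`) are hypothetical and not part of the planned API. The trailing underscores mirror the `and_`/`or_`/`not_` attribute names used on the Pydantic side.

```python
from typing import Optional


def cond(field: str, op: str, value) -> dict:
    """Leaf condition: {"field": ..., "op": ..., "value": ...}."""
    return {"field": field, "op": op, "value": value}


def and_(*clauses: dict) -> dict:
    return {"and": list(clauses)}


def or_(*clauses: dict) -> dict:
    return {"or": list(clauses)}


def not_(clause: dict) -> dict:
    return {"not": clause}


def search(query: str, where: Optional[dict] = None,
           sort: Optional[list] = None, limit: int = 50) -> dict:
    """Top-level payload; `where` and `sort` are left out when not provided."""
    payload = {"query": query}
    if where is not None:
        payload["where"] = where
    if sort is not None:
        payload["sort"] = sort
    payload["limit"] = limit
    return payload


# Rebuilds the refined query shape shown above
refined = search(
    "budget",
    where=and_(
        cond("reach", "eq", "restricted"),
        or_(
            cond("tags", "all", ["finance", "approved"]),
            cond("path", "prefix", "/teams/legal"),
        ),
    ),
    sort=[{"field": "relevance", "direction": "desc"}],
)
```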
Query DSL Structure
Top-Level Schema
```typescript
interface SearchQuery {
  query: string;        // Full-text search query (renamed from q)
  where?: WhereClause;  // Filter expression (optional)
  sort?: SortClause[];  // Sort criteria (array for multi-sort)
  limit?: number;       // Result limit (1-100, default 50)
}
```
Where Clause (Recursive Expression)
```typescript
type WhereClause =
  | { and: WhereClause[] }  // All conditions must match
  | { or: WhereClause[] }   // Any condition must match
  | { not: WhereClause }    // Negate condition
  | FieldCondition;         // Leaf condition

interface FieldCondition {
  field: string;   // Field name
  op: Operator;    // Operator
  value: unknown;  // Operand value
}
```
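Because the expression is recursive, its semantics fit in a short interpreter. The sketch below is a hypothetical in-memory reference evaluator; `matches` is not part of the planned change. It could serve as an oracle when unit-testing `build_opensearch_filter()`. It makes three assumptions:

- documents are plain dicts;
- multi-valued keyword fields such as `tags` are lists;
- dates are ISO-8601 strings in one consistent format, so that string comparison orders them correctly.

```python
RANGE_OPS = {
    "gt": lambda a, b: a > b,
    "gte": lambda a, b: a >= b,
    "lt": lambda a, b: a < b,
    "lte": lambda a, b: a <= b,
}


def _values(doc: dict, field: str) -> list:
    """Return a field's values as a list (fields may be single- or multi-valued)."""
    raw = doc.get(field)
    if raw is None:
        return []
    return raw if isinstance(raw, list) else [raw]


def matches(where: dict, doc: dict) -> bool:
    """Evaluate a WhereClause (as a JSON-decoded dict) against one document."""
    if "and" in where:
        return all(matches(c, doc) for c in where["and"])
    if "or" in where:
        return any(matches(c, doc) for c in where["or"])
    if "not" in where:
        return not matches(where["not"], doc)

    # Leaf FieldCondition
    field, op, value = where["field"], where["op"], where["value"]
    values = _values(doc, field)
    if op == "eq":
        return value in values
    if op == "in":    # OR semantics: any requested value present
        return any(v in values for v in value)
    if op == "all":   # AND semantics: every requested value present
        return all(v in values for v in value)
    if op == "prefix":
        return any(isinstance(v, str) and v.startswith(value) for v in values)
    if op in RANGE_OPS:
        return any(RANGE_OPS[op](v, value) for v in values)
    if op == "exists":  # value=True: field present; value=False: field missing
        return bool(values) == value
    raise ValueError(f"unknown operator: {op}")
```

With `doc = {"tags": ["finance", "approved"]}`, the condition `{"field": "tags", "op": "all", "value": ["finance", "Q4"]}` evaluates to `False`. Switching `op` to `"in"` makes it `True`.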
Operators
| Operator | Description | Value Type | OpenSearch Mapping |
|---|---|---|---|
| `eq` | Exact equality | string, number, or boolean | `{"term": {field: value}}` |
| `in` | Match ANY value (OR) | `string[]` | `{"terms": {field: values}}` |
| `all` | Match ALL values (AND) | `string[]` | One `{"term": {field: value}}` per value inside `bool.must` |
| `prefix` | Prefix match | string | `{"prefix": {field: value}}` |
| `gt` | Greater than | number or date | `{"range": {field: {"gt": value}}}` |
| `gte` | Greater than or equal | number or date | `{"range": {field: {"gte": value}}}` |
| `lt` | Less than | number or date | `{"range": {field: {"lt": value}}}` |
| `lte` | Less than or equal | number or date | `{"range": {field: {"lte": value}}}` |
| `exists` | Field exists (`true`) or is missing (`false`) | boolean | `{"exists": {"field": field}}`, wrapped in `bool.must_not` when `value` is `false` |
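The Value Type column can be enforced before any OpenSearch clause is built. This matters because `all` iterates over its value, and a bare string would be split into characters. The sketch below works on raw JSON-decoded conditions. `EXPECTED` and `check_condition` are hypothetical names. Dates are assumed to be ISO-8601 strings that `datetime.fromisoformat` can parse; a trailing `Z` is only accepted from Python 3.11.

```python
from datetime import datetime

SCALAR = (str, int, float, bool)


def _is_number(v) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_date(v) -> bool:
    if not isinstance(v, str):
        return False
    try:
        datetime.fromisoformat(v)
    except ValueError:
        return False
    return True


def _is_str_list(v) -> bool:
    return isinstance(v, list) and all(isinstance(x, str) for x in v)


def _is_range_value(v) -> bool:
    return _is_number(v) or _is_date(v)


# One predicate per operator, mirroring the "Value Type" column
EXPECTED = {
    "eq": lambda v: isinstance(v, SCALAR),
    "in": _is_str_list,
    "all": _is_str_list,
    "prefix": lambda v: isinstance(v, str),
    "gt": _is_range_value,
    "gte": _is_range_value,
    "lt": _is_range_value,
    "lte": _is_range_value,
    "exists": lambda v: isinstance(v, bool),
}


def check_condition(cond: dict) -> None:
    """Raise ValueError if a leaf condition's value does not fit its operator."""
    op = cond.get("op")
    if op not in EXPECTED:
        raise ValueError(f"unknown operator: {op!r}")
    if not EXPECTED[op](cond.get("value")):
        raise ValueError(f"invalid value for {op!r}: {cond.get('value')!r}")
```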
Migration from Old Parameters
| Old Pattern | New Pattern |
|---|---|
| `q=budget` | `query: "budget"` |
| `visited=doc1,doc2` | `where: { field: "id", op: "in", value: ["doc1", "doc2"] }` |
| `services=drive,wiki` | `where: { field: "services", op: "in", value: ["drive", "wiki"] }` (after the index MR) |
| `reach=public` | `where: { field: "reach", op: "eq", value: "public" }` |
| `tags=finance,legal` (OR) | `where: { field: "tags", op: "in", value: ["finance", "legal"] }` |
| `tags` with AND logic (not supported) | `where: { field: "tags", op: "all", value: ["finance", "legal"] }` |
| `path=/teams/` | `where: { field: "path", op: "prefix", value: "/teams/" }` |
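During the transition, legacy query strings can be translated mechanically into the new payload, for example in a compatibility shim or when porting existing tests. This is a hypothetical sketch; `legacy_to_dsl` is not part of the planned change. It assumes:

- the old parameters arrive as a dict with list values already split;
- `order_by`/`order_direction`/`nb_results` map onto `sort`/`limit`;
- the mapped conditions combine with `and`.

Whether that last point reproduces how the current `get_query()` combines `visited` and `reach` should be checked. `services` is skipped because it is deferred to a separate MR.

```python
def legacy_to_dsl(params: dict) -> dict:
    """Translate old flat query parameters into the structured DSL payload."""
    conditions = []
    if params.get("visited"):
        conditions.append({"field": "id", "op": "in", "value": list(params["visited"])})
    if params.get("reach"):
        conditions.append({"field": "reach", "op": "eq", "value": params["reach"]})
    if params.get("tags"):
        # Legacy tags used `terms` (OR semantics), so they map to `in`, not `all`
        conditions.append({"field": "tags", "op": "in", "value": list(params["tags"])})
    if params.get("path"):
        conditions.append({"field": "path", "op": "prefix", "value": params["path"]})
    # `services` is intentionally not mapped until it becomes an index field

    payload = {"query": params["q"]}
    if len(conditions) == 1:
        payload["where"] = conditions[0]
    elif conditions:
        payload["where"] = {"and": conditions}
    payload["sort"] = [{
        "field": params.get("order_by") or "relevance",
        "direction": params.get("order_direction") or "desc",
    }]
    payload["limit"] = params.get("nb_results") or 50
    return payload
```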
Example Queries
Simple equality filter
```json
{
  "query": "report",
  "where": { "field": "reach", "op": "eq", "value": "public" }
}
```
Tags with AND logic (require all)
```json
{
  "query": "*",
  "where": { "field": "tags", "op": "all", "value": ["finance", "Q4", "approved"] }
}
```
Exclude drafts
```json
{
  "query": "policy",
  "where": {
    "not": { "field": "tags", "op": "in", "value": ["draft", "wip"] }
  }
}
```
Access control (replaces visited)
```json
{
  "query": "*",
  "where": {
    "and": [
      { "field": "id", "op": "in", "value": ["doc-uuid-1", "doc-uuid-2"] },
      { "field": "reach", "op": "in", "value": ["public", "authenticated"] }
    ]
  }
}
```
Complex boolean combination
```json
{
  "query": "budget",
  "where": {
    "and": [
      { "field": "reach", "op": "eq", "value": "restricted" },
      {
        "or": [
          { "field": "tags", "op": "in", "value": ["finance"] },
          { "field": "path", "op": "prefix", "value": "/teams/legal" }
        ]
      },
      { "not": { "field": "tags", "op": "in", "value": ["archived"] } }
    ]
  }
}
```
Date range filter
```json
{
  "query": "*",
  "where": {
    "and": [
      { "field": "created_at", "op": "gte", "value": "2024-01-01" },
      { "field": "created_at", "op": "lt", "value": "2025-01-01" }
    ]
  }
}
```
Pydantic Schema Design
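```python
from enum import Enum
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, ConfigDict, Field, model_validator


class Operator(str, Enum):
    EQ = "eq"
    IN = "in"
    ALL = "all"
    PREFIX = "prefix"
    GT = "gt"
    GTE = "gte"
    LT = "lt"
    LTE = "lte"
    EXISTS = "exists"


LIST_OPS = {Operator.IN, Operator.ALL}


class FieldCondition(BaseModel):
    # extra="forbid": a node with unexpected keys cannot silently match the wrong union member
    model_config = ConfigDict(extra="forbid")
    field: str
    op: Operator
    value: Union[str, int, float, bool, List[str], List[int]]

    @model_validator(mode="after")
    def check_value_shape(self):
        # `in`/`all` need a list (a bare string would be iterated char by char);
        # every other operator needs a single value, and `exists` a boolean
        if (self.op in LIST_OPS) != isinstance(self.value, list):
            raise ValueError(f"unexpected value shape for operator '{self.op.value}'")
        if self.op is Operator.EXISTS and not isinstance(self.value, bool):
            raise ValueError("'exists' expects a boolean value")
        return self


class AndClause(BaseModel):
    model_config = ConfigDict(extra="forbid")
    and_: List["WhereClause"] = Field(alias="and")


class OrClause(BaseModel):
    model_config = ConfigDict(extra="forbid")
    or_: List["WhereClause"] = Field(alias="or")


class NotClause(BaseModel):
    model_config = ConfigDict(extra="forbid")
    not_: "WhereClause" = Field(alias="not")


WhereClause = Union[AndClause, OrClause, NotClause, FieldCondition]

# Resolve the "WhereClause" forward references now that the alias exists
AndClause.model_rebuild()
OrClause.model_rebuild()
NotClause.model_rebuild()


class SortClause(BaseModel):
    field: Literal["relevance", "title", "created_at", "updated_at", "size"] = "relevance"
    direction: Literal["asc", "desc"] = "desc"


class SearchQuerySchema(BaseModel):
    """Schema for the structured query DSL; replaces SearchQueryParametersSchema."""
    query: str
    where: Optional[WhereClause] = None
    sort: Optional[List[SortClause]] = None
    limit: int = Field(default=50, ge=1, le=100)
```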
OpenSearch Query Builder
```python
# Schema classes from the section above (module path assumed)
from core.schemas import AndClause, FieldCondition, NotClause, Operator, OrClause, WhereClause


def build_opensearch_filter(where: WhereClause) -> dict:
    """Recursively build an OpenSearch bool query from a WhereClause."""
    if isinstance(where, AndClause):
        return {"bool": {"must": [build_opensearch_filter(c) for c in where.and_]}}
    if isinstance(where, OrClause):
        return {
            "bool": {
                "should": [build_opensearch_filter(c) for c in where.or_],
                "minimum_should_match": 1,
            }
        }
    if isinstance(where, NotClause):
        return {"bool": {"must_not": [build_opensearch_filter(where.not_)]}}
    # Leaf: FieldCondition
    return build_field_condition(where)


def build_field_condition(cond: FieldCondition) -> dict:
    """Build an OpenSearch clause from a single field condition."""
    # Map external field names to OpenSearch field names
    field = "_id" if cond.field == "id" else cond.field

    match cond.op:
        case Operator.EQ:
            return {"term": {field: cond.value}}
        case Operator.IN:
            return {"terms": {field: cond.value}}
        case Operator.ALL:
            # Every value must match: one term query per value inside bool.must
            return {"bool": {"must": [{"term": {field: v}} for v in cond.value]}}
        case Operator.PREFIX:
            return {"prefix": {field: cond.value}}
        case Operator.GT | Operator.GTE | Operator.LT | Operator.LTE:
            # The operator name doubles as the range key ("gt", "gte", ...)
            return {"range": {field: {cond.op.value: cond.value}}}
        case Operator.EXISTS:
            clause = {"exists": {"field": field}}
            return clause if cond.value else {"bool": {"must_not": [clause]}}
        case _:
            raise ValueError(f"Unsupported operator: {cond.op}")
```
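`get_query()` still has to combine the compiled filter with the full-text part, the sort order and the limit. The sketch below shows one way to assemble that request body. It is hypothetical throughout: `build_search_body`, `build_sort`, the searched fields in `SEARCH_FIELDS`, and treating `"*"` or an empty query as `match_all` are all assumptions. `filter_clause` stands for the output of `build_opensearch_filter()`, or `None` when `where` is absent. Placing it under `bool.filter` keeps the filter out of relevance scoring.

```python
from typing import Optional

# Hypothetical full-text fields; the real list lives in the current get_query()
SEARCH_FIELDS = ["title", "content"]


def build_sort(sort: Optional[list]) -> list:
    """Translate SortClause dicts into OpenSearch sort entries ("relevance" -> "_score")."""
    clauses = sort or [{"field": "relevance", "direction": "desc"}]
    return [
        {("_score" if s["field"] == "relevance" else s["field"]): {"order": s["direction"]}}
        for s in clauses
    ]


def build_search_body(query: str, filter_clause: Optional[dict],
                      sort: Optional[list], limit: int = 50) -> dict:
    """Assemble a search request body from the validated DSL parts."""
    if query.strip() in ("", "*"):
        text_clause = {"match_all": {}}
    else:
        text_clause = {"multi_match": {"query": query, "fields": SEARCH_FIELDS}}
    bool_query = {"must": [text_clause]}
    if filter_clause is not None:
        # Filter context: the clause restricts hits but does not affect _score
        bool_query["filter"] = [filter_clause]
    return {"query": {"bool": bool_query}, "sort": build_sort(sort), "size": limit}
```

Sorting on `title` assumes the index exposes a sortable (keyword) form of that field. That should be checked against the mapping.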
Files to Modify
| File | Changes |
|---|---|
| `src/backend/core/schemas.py` | New recursive `WhereClause` schema, `SearchQuerySchema` |
| `src/backend/core/services/search.py` | New `build_opensearch_filter()` function; update `get_query()` |
| `src/backend/core/views.py` | Update `SearchDocumentView` to use the new schema |
| `src/backend/core/enums.py` | Add `Operator` enum |
Acceptance Criteria
- `WhereClause` schema validates nested boolean expressions
- `and`/`or`/`not` combinators work at any nesting depth
- `eq` operator performs an exact term match
- `in` operator matches ANY value (OR semantics)
- `all` operator requires ALL values (AND semantics)
- `prefix` operator performs a prefix match
- `gt`/`gte`/`lt`/`lte` operators work for dates and numbers
- `exists` operator checks field presence
- `sort` accepts an array for multi-field sorting
- `id` field maps to OpenSearch `_id` internally
Labels
enhancement, api, breaking-change