Is your feature request related to a problem?
OpenSearch Query Insights currently identifies slow and resource-intensive queries but provides no actionable guidance on how to fix them. Users face several critical issues:
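Silent Failures: Term queries like {"term": {"status": "Active"}} on text fields return 0 results without warnings, because the indexed text is analyzed (the default analyzer lowercases it to "active") while the term query value is not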
Performance Issues: Leading wildcards (*search) cause 100-1000x slowdowns with no indication why
Safety Risks: Sorting on text fields can trigger OOM crashes that bring down entire clusters
Knowledge Barrier: Fixing these issues requires deep OpenSearch expertise that most users lack
For example:
User sees: Query latency = 5,200ms in Top N Queries dashboard
User thinks: "Why is this slow? What should I do?"
User outcome: Spends hours searching documentation and may still not find a solution
What solution would you like?
A rule-based recommendation engine integrated into Query Insights that:
Automatically analyzes queries from Top N Queries and potentially profiler requests
Detects anti-patterns using predefined rules (Rules TBD)
Generates actionable recommendations with:
Clear problem description
Impact assessment (latency, memory, correctness)
Specific fix with code examples
Confidence scores
Surfaces recommendations through:
Top N Queries dashboard (inline badges and panels)
Query profiler page (on-demand analysis)
REST API endpoints (the sequence diagram below shows the proposed endpoints)
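As an illustration of the shape such an engine could take, here is a minimal Java sketch of a recommendation carrying the four parts listed above (problem, impact, fix, confidence), plus two rules for anti-patterns named in the problem statement. The actual rules are still TBD, so every class name, rule ID, fix text, and confidence value below is hypothetical and not part of the Query Insights plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** One recommendation: problem, impact, fix, and a confidence score in [0, 1]. */
final class Recommendation {
    final String ruleId, problem, impact, fix;
    final double confidence;

    Recommendation(String ruleId, String problem, String impact, String fix, double confidence) {
        this.ruleId = ruleId;
        this.problem = problem;
        this.impact = impact;
        this.fix = fix;
        this.confidence = confidence;
    }
}

/** Hypothetical flattened view of one query clause plus the field's mapping type. */
final class QueryClause {
    final String queryType, field, value, fieldType;

    QueryClause(String queryType, String field, String value, String fieldType) {
        this.queryType = queryType;
        this.field = field;
        this.value = value;
        this.fieldType = fieldType;
    }
}

/** A rule inspects one clause and optionally emits a recommendation. */
interface Rule {
    Optional<Recommendation> evaluate(QueryClause clause);
}

/** Flags term queries on analyzed text fields (the silent 0-results case). */
final class TermOnTextFieldRule implements Rule {
    @Override
    public Optional<Recommendation> evaluate(QueryClause c) {
        if (!"term".equals(c.queryType) || !"text".equals(c.fieldType)) {
            return Optional.empty();
        }
        return Optional.of(new Recommendation(
            "term-on-text-field",
            "term query on text field '" + c.field + "'",
            "correctness: may return 0 results without a warning",
            "use a match query, or a term query on a keyword sub-field if one is mapped",
            0.9)); // illustrative confidence, not a calibrated value
    }
}

/** Flags wildcard patterns that start with '*'. */
final class LeadingWildcardRule implements Rule {
    @Override
    public Optional<Recommendation> evaluate(QueryClause c) {
        if (!"wildcard".equals(c.queryType) || c.value == null || !c.value.startsWith("*")) {
            return Optional.empty();
        }
        return Optional.of(new Recommendation(
            "leading-wildcard",
            "wildcard pattern '" + c.value + "' starts with '*'",
            "latency: the pattern cannot use a term prefix to narrow the scan",
            "avoid the leading wildcard, for example by restructuring the query or the mapping",
            0.8)); // illustrative confidence, not a calibrated value
    }
}

/** Runs every active rule against a clause and collects the matches. */
final class RuleEngine {
    private final List<Rule> rules;

    RuleEngine(List<Rule> rules) {
        this.rules = rules;
    }

    List<Recommendation> evaluate(QueryClause clause) {
        List<Recommendation> matches = new ArrayList<>();
        for (Rule rule : rules) {
            rule.evaluate(clause).ifPresent(matches::add);
        }
        return matches;
    }
}

final class RuleEngineDemo {
    public static void main(String[] args) {
        RuleEngine engine = new RuleEngine(List.of(new TermOnTextFieldRule(), new LeadingWildcardRule()));
        QueryClause clause = new QueryClause("term", "status", "Active", "text");
        for (Recommendation r : engine.evaluate(clause)) {
            System.out.println(r.ruleId + ": " + r.problem + " -> " + r.fix);
        }
    }
}
```

Keeping each rule behind a small interface would let rules be added or disabled independently, which seems to fit the "match all active rules" step in the proposal.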
sequenceDiagram
participant User
participant Dashboard
participant API as REST API
participant Service as RecommendationService
participant Context as QueryContext
participant Rules as Rule Engine
participant Cache as Metadata Cache
User->>Dashboard: View Top N Queries
Dashboard->>API: GET /_insights/top_queries?recommendations=true
API->>Service: analyzeTopQueries(records)
loop For each query record
Service->>Context: build(record)
Context->>Cache: getFieldType(field)
Cache-->>Context: "text"
Context->>Cache: getFieldCardinality(field)
Cache-->>Context: 10000000
Service->>Rules: evaluate(context)
Rules->>Rules: match all active rules
Rules-->>Service: List<Recommendation>
Service->>Service: attach recommendations to record
Service->>Service: store.put(queryHash, recommendations)
end
Service-->>API: List<QueryRecord with recommendations>
API-->>Dashboard: top queries with recommendations embedded
Dashboard-->>User: Show recommendations inline
User->>Dashboard: Click specific query in Top N list
Dashboard->>API: GET /_insights/recommendations/{queryId}
API->>Service: getRecommendations(queryId)
Service->>Service: store.get(queryHash)
Service-->>API: List<Recommendation>
API-->>Dashboard: recommendations for specific query
Dashboard-->>User: Show detailed recommendations
User->>Dashboard: Click "Analyze Query" (Profiler)
Dashboard->>API: POST /_insights/recommendations/analyze
API->>Service: analyzeQuery(query, indices)
Service->>Context: build(query)
Service->>Rules: evaluate(context)
Rules-->>Service: List<Recommendation>
Service-->>API: recommendations
API-->>Dashboard: recommendations with code examples
Dashboard-->>User: Display recommendations + copy button
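The batch and lookup paths in the diagram can be sketched in Java roughly as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: QueryRecord, MetadataCache, and QueryContext are hypothetical stand-ins, recommendations are reduced to plain strings, rules are plain functions, and the store is assumed to be an in-memory map keyed by query hash.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical, flattened stand-in for a Top N query record. */
final class QueryRecord {
    final String queryHash, queryType, field, value;

    QueryRecord(String queryHash, String queryType, String field, String value) {
        this.queryHash = queryHash;
        this.queryType = queryType;
        this.field = field;
        this.value = value;
    }
}

/** In-memory metadata cache (field name -> mapping type), so rules never query cluster state. */
final class MetadataCache {
    private final Map<String, String> fieldTypes;

    MetadataCache(Map<String, String> fieldTypes) {
        this.fieldTypes = fieldTypes;
    }

    String getFieldType(String field) {
        return fieldTypes.getOrDefault(field, "unknown");
    }
}

/** The record plus the cached metadata that rules need. */
final class QueryContext {
    final QueryRecord record;
    final String fieldType;

    private QueryContext(QueryRecord record, String fieldType) {
        this.record = record;
        this.fieldType = fieldType;
    }

    static QueryContext build(QueryRecord record, MetadataCache cache) {
        return new QueryContext(record, cache.getFieldType(record.field));
    }
}

final class RecommendationService {
    private final MetadataCache cache;
    private final List<Function<QueryContext, Optional<String>>> rules;
    private final Map<String, List<String>> store = new ConcurrentHashMap<>();

    RecommendationService(MetadataCache cache, List<Function<QueryContext, Optional<String>>> rules) {
        this.cache = cache;
        this.rules = rules;
    }

    /** Batch path: build a context per record, run all rules, store the result by query hash. */
    Map<String, List<String>> analyzeTopQueries(List<QueryRecord> records) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (QueryRecord record : records) {
            QueryContext context = QueryContext.build(record, cache);
            List<String> matches = new ArrayList<>();
            for (Function<QueryContext, Optional<String>> rule : rules) {
                rule.apply(context).ifPresent(matches::add);
            }
            List<String> recommendations = List.copyOf(matches);
            store.put(record.queryHash, recommendations);
            result.put(record.queryHash, recommendations);
        }
        return result;
    }

    /** Lookup path: return what the batch path stored, or an empty list for unknown hashes. */
    List<String> getRecommendations(String queryHash) {
        return store.getOrDefault(queryHash, List.of());
    }
}
```

The diagram looks up by queryId on the API side and by queryHash in the store; the sketch uses the hash for both and leaves the mapping between the two open.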
Key factors:
Asynchronous Processing: Recommendation generation happens off the search path (zero query latency impact)
Cached Metadata where possible: Field types and cardinality are cached for O(1) lookups (no cluster state queries during rule evaluation)
Fail-Safe: Errors in recommendation engine never propagate to query execution
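One way the asynchronous and fail-safe properties could be combined is sketched below in Java. It assumes a dedicated executor that is separate from the search threads; a plain ExecutorService stands in here for whatever thread pool the plugin would actually use, and FailSafeRecommender is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Runs recommendation analysis off the calling thread and swallows any failure. */
final class FailSafeRecommender {
    private final ExecutorService executor; // assumed to be separate from search threads

    FailSafeRecommender(ExecutorService executor) {
        this.executor = executor;
    }

    /**
     * Submits the analysis to the executor. If the analysis throws, the future
     * completes with an empty list instead of propagating the error.
     */
    CompletableFuture<List<String>> analyzeAsync(Supplier<List<String>> analysis) {
        return CompletableFuture
            .supplyAsync(analysis, executor)
            .exceptionally(error -> List.of());
    }
}

final class FailSafeRecommenderDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            FailSafeRecommender recommender = new FailSafeRecommender(pool);
            // A buggy rule throws, but the caller only ever sees an empty result.
            List<String> result = recommender
                .analyzeAsync(() -> { throw new IllegalStateException("rule bug"); })
                .join();
            System.out.println("recommendations: " + result);
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would presumably also log or count the swallowed error so that broken rules remain visible to operators.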
What alternatives have you considered?
We could build the recommendation engine as a separate service outside the OpenSearch cluster and have Query Insights (QI) provide as much metadata as possible.
Pros:
Language flexibility (could use Python for ML)
Independent scaling
Isolation from cluster
Cons:
Data export requirements: To export top queries to external sinks, we must mask/remove sensitive information (usernames, IP addresses, PII in query values), losing critical query details needed for analysis
Loss of query context: External service cannot access cluster metadata like field types, field cardinality, index settings, and workload group configurations that are essential for rule evaluation
Rule-based recommendations become nearly impossible: almost any useful query-specific rule requires analyzing the query context, the metadata, and the exact query pattern (e.g. *search), which may be masked during export.
Network latency: Additional hop for recommendation generation
Security concerns: Exporting query data outside the cluster increases attack surface
Overhead of emitting metrics: It is not feasible to emit every metric the recommendation rules need to an external service, and emitting them adds overhead to the cluster, so this approach does not have zero impact on the cluster either.
Decision: Keep recommendation engine in-plugin. Rule-based recommendations fundamentally depend on having access to:
Exact query structure (e.g., detecting * at start of wildcard pattern)
Cluster metadata (field types, cardinality, index settings)
Real-time context (workload groups, current cluster state)
Do you have any additional context?
Add any other context or screenshots about the feature request here.