[FEATURE] Rule-Based Query Recommendation Engine

### Is your feature request related to a problem?
OpenSearch Query Insights currently identifies slow and resource-intensive queries but provides **no actionable guidance** on how to fix them. Users face several critical issues:

1. **Silent Failures**: Queries like `{"term": {"status": "Active"}}` on text fields return 0 results without warnings
2. **Performance Issues**: Leading wildcards (`*search`) cause 100-1000x slowdowns with no indication why
3. **Safety Risks**: Sorting on text fields can trigger OOM crashes that bring down entire clusters
4. **Knowledge Barrier**: Fixing these issues requires deep OpenSearch expertise that most users lack

For example:
```
User sees: Query latency = 5,200ms in Top N Queries dashboard
User thinks: "Why is this slow? What should I do?"
User outcome: Spends hours searching documentation, may not find solution
```
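The silent failure in item 1 can be reproduced without a cluster. The sketch below is a simplified stand-in, not OpenSearch code: `analyze`, `term_matches`, and `match_matches` are hypothetical helpers, and the analyzer is approximated as "lowercase and split on non-word characters". It shows why a `term` query for `Active` finds nothing on a `text` field while a `match` query does.

```python
import re
from typing import Set


def analyze(text: str) -> Set[str]:
    """Rough approximation of a default text analyzer: lowercase, then tokenize."""
    return set(re.findall(r"\w+", text.lower()))


def term_matches(indexed_tokens: Set[str], term: str) -> bool:
    """A term query compares the literal term to the indexed tokens, with no analysis."""
    return term in indexed_tokens


def match_matches(indexed_tokens: Set[str], query_text: str) -> bool:
    """A match query analyzes its input first, then looks for any shared token."""
    return bool(analyze(query_text) & indexed_tokens)


# The document value "Active" is stored as the token "active" on a text field.
indexed = analyze("Active")

print(term_matches(indexed, "Active"))   # False: "Active" != "active"
print(match_matches(indexed, "Active"))  # True: the input is lowercased first
```

A rule that knows the field type could flag the first case and suggest `match` or a `keyword` sub-field.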

### What solution would you like?
A **rule-based recommendation engine** integrated into Query Insights that:

1. **Automatically analyzes queries** from Top N Queries and potentially profiler requests
2. **Detects anti-patterns** using predefined rules (Rules TBD)
3. **Generates actionable recommendations** with:
   - Clear problem description
   - Impact assessment (latency, memory, correctness)
   - Specific fix with code examples
   - Confidence scores
4. **Surfaces recommendations** through:
   - Top N Queries dashboard (inline badges and panels)
   - Query profiler page (on-demand analysis)
   - REST API endpoints (the proposed endpoints appear in the sequence diagram below)
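To make the shape of a recommendation concrete, here is a minimal sketch of one possible rule, written in Python for brevity (the plugin itself would be Java). Everything in it is an assumption for illustration: the `Recommendation` fields, the `leading_wildcard` rule ID, the wording of the fix, and the placeholder confidence value. The actual rule set is still TBD.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass
class Recommendation:
    rule_id: str
    description: str      # clear problem description
    impact: str           # latency, memory, or correctness
    suggested_fix: str    # specific fix, ideally with a code example
    confidence: float     # 0.0 to 1.0


def iter_wildcard_patterns(node: Any) -> Iterator[str]:
    """Yield every wildcard pattern found anywhere in a query DSL tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "wildcard" and isinstance(value, dict):
                for spec in value.values():
                    # Accept both {"field": "pat"} and {"field": {"value": "pat"}}.
                    pattern = spec.get("value") if isinstance(spec, dict) else spec
                    if isinstance(pattern, str):
                        yield pattern
            else:
                yield from iter_wildcard_patterns(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_wildcard_patterns(item)


def leading_wildcard_rule(query: dict) -> List[Recommendation]:
    """Flag wildcard patterns that start with '*' or '?'."""
    recs = []
    for pattern in iter_wildcard_patterns(query):
        if pattern.startswith(("*", "?")):
            recs.append(Recommendation(
                rule_id="leading_wildcard",
                description=f"Wildcard pattern '{pattern}' starts with a wildcard character",
                impact="latency",
                suggested_fix="Avoid the leading wildcard, e.g. by querying an n-gram analyzed field",
                confidence=0.9,  # placeholder value, not a calibrated score
            ))
    return recs


query = {"bool": {"must": [{"wildcard": {"title": {"value": "*search"}}}]}}
for rec in leading_wildcard_rule(query):
    print(rec.rule_id, "-", rec.description)
```

Because the rule inspects only the query structure, its output is deterministic and easy to explain to the user.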

```mermaid
sequenceDiagram
    participant User
    participant Dashboard
    participant API as REST API
    participant Service as RecommendationService
    participant Context as QueryContext
    participant Rules as Rule Engine
    participant Cache as Metadata Cache

    User->>Dashboard: View Top N Queries
    Dashboard->>API: GET /_insights/top_queries?recommendations=true
    API->>Service: analyzeTopQueries(records)

    loop For each query record
        Service->>Context: build(record)
        Context->>Cache: getFieldType(field)
        Cache-->>Context: "text"
        Context->>Cache: getFieldCardinality(field)
        Cache-->>Context: 10000000

        Service->>Rules: evaluate(context)
        Rules->>Rules: match all active rules
        Rules-->>Service: List<Recommendation>

        Service->>Service: attach recommendations to record
        Service->>Service: store.put(queryHash, recommendations)
    end

    Service-->>API: List<QueryRecord with recommendations>
    API-->>Dashboard: top queries with recommendations embedded
    Dashboard-->>User: Show recommendations inline

    User->>Dashboard: Click specific query in Top N list
    Dashboard->>API: GET /_insights/recommendations/{queryId}
    API->>Service: getRecommendations(queryId)
    Service->>Service: store.get(queryHash)
    Service-->>API: List<Recommendation>
    API-->>Dashboard: recommendations for specific query
    Dashboard-->>User: Show detailed recommendations

    User->>Dashboard: Click "Analyze Query" (Profiler)
    Dashboard->>API: POST /_insights/recommendations/analyze
    API->>Service: analyzeQuery(query, indices)
    Service->>Context: build(query)
    Service->>Rules: evaluate(context)
    Rules-->>Service: List<Recommendation>
    Service-->>API: recommendations
    API-->>Dashboard: recommendations with code examples
    Dashboard-->>User: Display recommendations + copy button
```
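The first two flows in the diagram (batch analysis of Top N records, then lookup by query) could be sketched roughly as follows. This is an illustrative Python model, not the proposed implementation: the class and method names mirror the diagram, while `term_on_text_rule`, the dict-based context, the record layout, and the SHA-256 query hash are assumptions made for the example.

```python
import hashlib
import json
from typing import Callable, Dict, List

Rule = Callable[[dict], List[str]]


def term_on_text_rule(context: dict) -> List[str]:
    """Flag a top-level term query that targets a text field."""
    recs = []
    for field in context["query"].get("term", {}):
        if context["field_types"].get(field) == "text":
            recs.append(f"term query on text field '{field}': use match or a keyword sub-field")
    return recs


class RecommendationService:
    def __init__(self, rules: List[Rule], field_types: Dict[str, str]):
        self.rules = rules
        self.field_types = field_types          # stand-in for the metadata cache
        self.store: Dict[str, List[str]] = {}   # query hash -> recommendations

    @staticmethod
    def query_hash(query: dict) -> str:
        """Stable hash of the query body (key order does not matter)."""
        return hashlib.sha256(json.dumps(query, sort_keys=True).encode()).hexdigest()

    def build_context(self, record: dict) -> dict:
        return {"query": record["query"], "field_types": self.field_types}

    def analyze_top_queries(self, records: List[dict]) -> List[dict]:
        for record in records:
            context = self.build_context(record)
            recs = [r for rule in self.rules for r in rule(context)]
            record["recommendations"] = recs                      # attach to record
            self.store[self.query_hash(record["query"])] = recs  # store.put
        return records

    def get_recommendations(self, query_hash: str) -> List[str]:
        return self.store.get(query_hash, [])                     # store.get


service = RecommendationService([term_on_text_rule], {"status": "text"})
records = [{"query": {"term": {"status": "Active"}}, "latency_ms": 5200}]
print(service.analyze_top_queries(records)[0]["recommendations"])
```

Storing results by query hash lets the per-query lookup return previously computed recommendations without re-running the rules.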

Key factors:

1. **Asynchronous Processing**: Recommendation generation happens off the search path (zero query latency impact)
2. **Cached Metadata Where Possible**: Field types and cardinality are cached for O(1) lookups, so rule evaluation issues no cluster state queries
3. **Rule-Based (Phase 1)**: Deterministic, explainable recommendations
4. **Fail-Safe**: Errors in the recommendation engine never propagate to query execution
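One way to read the fail-safe requirement is that each rule runs in isolation, so a bug in one rule costs only that rule's output. The sketch below is a hypothetical illustration in Python; `evaluate_safely`, `broken_rule`, and `healthy_rule` are invented names, and a real implementation would live in the plugin's Java code.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("recommendations")

Rule = Callable[[dict], List[str]]


def evaluate_safely(rules: List[Rule], context: dict) -> List[str]:
    """Run every rule; log and skip any rule that raises instead of failing the batch."""
    results: List[str] = []
    for rule in rules:
        try:
            results.extend(rule(context))
        except Exception:
            # The failure stays inside the recommendation engine.
            logger.exception("rule %s failed; skipping", getattr(rule, "__name__", rule))
    return results


def broken_rule(context: dict) -> List[str]:
    raise ValueError("simulated bug in a rule")


def healthy_rule(context: dict) -> List[str]:
    return ["example recommendation"]


print(evaluate_safely([broken_rule, healthy_rule], {}))  # ['example recommendation']
```

Combined with running the whole evaluation off the search path, this keeps rule defects from affecting query execution.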

### What alternatives have you considered?
We could build the recommendation engine as a separate service outside the OpenSearch cluster, with Query Insights (QI) exporting as much metadata to it as possible.

**Pros**:
- Language flexibility (could use Python for ML)
- Independent scaling
- Isolation from cluster

**Cons**:
- **Data export requirements**: To export top queries to external sinks, we must mask/remove sensitive information (usernames, IP addresses, PII in query values), losing critical query details needed for analysis
- **Loss of query context**: External service cannot access cluster metadata like field types, field cardinality, index settings, and workload group configurations that are essential for rule evaluation
- **Rule-based recommendations become nearly impossible**: Almost every useful query-specific rule needs the query context, cluster metadata, or the exact query pattern (e.g., `*search`), any of which may be masked during export
- **Network latency**: Additional hop for recommendation generation
- **Security concerns**: Exporting query data outside the cluster increases attack surface
- **Overhead of emitting metrics**: It is not possible to emit every metric an external service would need for recommendations, and emitting them adds overhead to the cluster, so this approach does not have zero impact either

**Decision**: Keep the recommendation engine in-plugin. Rule-based recommendations fundamentally depend on having access to:
1. **Exact query structure** (e.g., detecting `*` at start of wildcard pattern)
2. **Cluster metadata** (field types, cardinality, index settings)
3. **Real-time context** (workload groups, current cluster state)

### Do you have any additional context?
_Add any other context or screenshots about the feature request here._

