[FEATURE] LLM-powered query recommendation engine (Phase 2 of #532)

### Is your feature request related to a problem?

[#532](https://github.com/opensearch-project/query-insights/issues/532) proposes a great rule-based approach for Phase 1. But I've been thinking about the cases that rules can't easily cover: situations where you need to understand *what data a field actually holds* to make a good recommendation.

For example, if a `match` query is running on a field that stores HTTP status codes (`200`, `404`, `500`), a `term` query on a `keyword` field would likely be more efficient, since it skips text analysis. But a rule engine can't know this, because today's mapping only tells you the field type (`text`, `keyword`), not what the data represents.
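To make the example concrete, here is a minimal Python sketch of the rewrite a recommender might suggest. The field name `sc`, the `SEMANTIC_TYPES` table, and the helper `suggest_term_rewrite` are all hypothetical, and the sketch assumes the field is mapped as `keyword` so that an exact-match `term` lookup is valid.

```python
from typing import Any, Optional

# Hypothetical semantic hints. Nothing like this is in the mapping today,
# which is exactly the gap described above.
SEMANTIC_TYPES = {"sc": "enum"}


def suggest_term_rewrite(query: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a `term` equivalent for a single-field `match` on an enum-like field."""
    match = query.get("match")
    if not isinstance(match, dict) or len(match) != 1:
        return None
    field, value = next(iter(match.items()))
    if SEMANTIC_TYPES.get(field) != "enum":
        return None  # without semantic knowledge, no safe suggestion
    # `match` accepts either a bare value or an object with a "query" key.
    if isinstance(value, dict):
        value = value.get("query")
    return {"term": {field: {"value": value}}}


slow_query = {"match": {"sc": "404"}}
rewritten = suggest_term_rewrite(slow_query)
print(rewritten)
```

The point is that the `SEMANTIC_TYPES` lookup is the only thing that lets the function decide; the field type alone would not.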

### What solution would you like?

I'm thinking about an LLM-based approach as a Phase 2 complement to #532's rule engine:

- Query Insights provides the slow query data (already available via `/_insights/top_queries`)
- An LLM analyzes the query structure along with enriched field context (mappings + field-level semantic descriptions)
- The LLM generates natural-language recommendations explaining why a query is slow and how to optimize it
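
As a rough illustration of the three steps above, the sketch below assembles an LLM prompt from one slow-query record, the index mapping, and field descriptions. The record shape is a simplified assumption rather than the exact `/_insights/top_queries` response schema, `build_prompt` is a hypothetical helper, and the actual model call is left out.

```python
import json
from typing import Any


def build_prompt(
    record: dict[str, Any],
    mapping: dict[str, Any],
    field_notes: dict[str, str],
) -> str:
    """Combine query, mapping, and field descriptions into one prompt string."""
    latency = record.get("measurements", {}).get("latency")
    parts = [
        "You are reviewing a slow OpenSearch query.",
        f"Indices: {', '.join(record.get('indices', []))}",
        f"Latency measurement: {json.dumps(latency)}",
        "Query source:",
        json.dumps(record.get("source", {}), indent=2, sort_keys=True),
        "Field mappings:",
        json.dumps(mapping, indent=2, sort_keys=True),
        "Field descriptions:",
    ]
    parts += [f"- {name}: {note}" for name, note in sorted(field_notes.items())]
    parts.append("Explain why the query is slow and how to optimize it.")
    return "\n".join(parts)


# Assumed, simplified record; real records carry more fields.
record = {
    "indices": ["logs-2024"],
    "source": {"query": {"match": {"sc": "404"}}},
    "measurements": {"latency": 1200},
}
mapping = {"sc": {"type": "keyword"}, "rt": {"type": "long"}}
field_notes = {"sc": "HTTP status code", "rt": "Response time in ms"}

prompt = build_prompt(record, mapping, field_notes)
print(prompt)
```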

One challenge I see is that **field-level semantic metadata doesn't exist today**: `IndexMappingTool` returns types but not what fields mean. The existing `_meta` mapping parameter could hold this in theory, but it only propagates via index templates and doesn't cover indexes created directly or via dynamic mapping.

My initial thought is a dedicated ML Commons system index (e.g., `.plugins-ml-field-metadata`) keyed by index pattern, storing field descriptions and semantic types. Something like:

```json
{
  "index_pattern": "logs-*",
  "fields": {
    "sc": { "description": "HTTP status code", "semantic_type": "enum" },
    "rt": { "description": "Response time in ms", "semantic_type": "metric" }
  }
}
```

Being pattern-based and decoupled from the index creation path, it would work regardless of how indexes are created. An LLM could also auto-generate these descriptions by sampling documents. This is just one possible approach, though, and I haven't fully fleshed it out yet.
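
To sketch how a pattern-keyed lookup might behave, the Python below resolves field metadata for a concrete index name. Everything here is an assumption for illustration: the second document (`logs-nginx-*`), the `resolve_fields` helper, and the rule that more specific patterns override broader ones are not part of the proposal yet. It also uses `fnmatchcase`, which treats `?` and `[...]` as wildcards in addition to `*`.

```python
from fnmatch import fnmatchcase
from typing import Any

# Documents as they might be stored in the proposed metadata index.
METADATA_DOCS = [
    {
        "index_pattern": "logs-*",
        "fields": {
            "sc": {"description": "HTTP status code", "semantic_type": "enum"},
            "rt": {"description": "Response time in ms", "semantic_type": "metric"},
        },
    },
    {
        # Hypothetical narrower pattern, added to show overriding.
        "index_pattern": "logs-nginx-*",
        "fields": {
            "sc": {"description": "nginx upstream status code", "semantic_type": "enum"},
        },
    },
]


def resolve_fields(index_name: str, docs: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge field metadata from every matching pattern, most specific last."""
    matches = [d for d in docs if fnmatchcase(index_name, d["index_pattern"])]
    # Assumed precedence: more literal characters means more specific.
    matches.sort(key=lambda d: len(d["index_pattern"].replace("*", "")))
    merged: dict[str, Any] = {}
    for doc in matches:
        merged.update(doc["fields"])
    return merged


fields = resolve_fields("logs-nginx-2024", METADATA_DOCS)
print(fields["sc"]["description"])
print(fields["rt"]["description"])
```

Because the lookup happens at analysis time rather than at index creation, an index created directly or via dynamic mapping still picks up descriptions as long as its name matches a stored pattern.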

### What alternatives have you considered?

- **Expanding the rule set in Phase 1**: Valuable but fundamentally limited, since you can't write rules for every semantic optimization case. I think rules and LLM analysis work best as complementary layers.
- **Storing field descriptions in `_meta`**: I considered using the existing `_meta` mapping parameter (either at index level or via index/component templates). This works for template-based indexes, but doesn't cover indexes created directly or via dynamic mapping, and updating a template's `_meta` doesn't retroactively apply to existing indexes. A separate metadata store keyed by index pattern avoids these limitations.

### Do you have any additional context?

If this direction is of interest to the project, I'd love to put together a more detailed design and contribute to the implementation. Happy to discuss further or adjust the approach based on your feedback.

cc. @ansjcy 
