[FEATURE] LLM-powered query recommendation engine (Phase 2 of #532) #591

@llilyy

Is your feature request related to a problem?

#532 proposes a great rule-based approach for Phase 1. But I've been thinking about the cases that rules can't easily cover — situations where you need to understand what data a field actually holds to make a good recommendation.

For example, if a match query is running on a field that stores HTTP status codes (200, 404, 500), a term query would be far more efficient. But a rule engine can't know this because today's mapping only tells you the field type (text, keyword), not what the data represents.
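To make the example concrete, here is a minimal sketch of the rewrite a recommendation could suggest. The function name and the `enum_fields` argument are hypothetical; they stand in for whatever source of field semantics ends up existing. It also assumes the field is mapped as `keyword` or a numeric type, since a term query against an analyzed `text` field matches tokens rather than the original value.

```python
def rewrite_match_to_term(query: dict, enum_fields: set) -> dict:
    """Return a term query when a single-field match query targets an enum-like field.

    `enum_fields` is a hypothetical set of field names known to hold a small,
    fixed set of exact values (for example HTTP status codes).
    """
    match = query.get("match")
    if not isinstance(match, dict) or len(match) != 1:
        return query  # not a simple single-field match query; leave it alone

    field, value = next(iter(match.items()))
    if field not in enum_fields:
        return query  # nothing is known about this field, so no rewrite

    # A match clause can be a bare value or an object with a "query" key.
    if isinstance(value, dict):
        value = value.get("query")

    return {"term": {field: {"value": value}}}


slow = {"match": {"sc": 404}}
print(rewrite_match_to_term(slow, {"sc"}))
```

A rule engine could apply the same rewrite mechanically; the missing piece is knowing that `sc` belongs in `enum_fields` in the first place.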

What solution would you like?

I'm thinking about an LLM-based approach as a Phase 2 complement to #532's rule engine:

  • Query Insights provides the slow query data (already available via /_insights/top_queries)
  • An LLM analyzes the query structure along with enriched field context (mappings + field-level semantic descriptions)
  • The LLM generates natural-language recommendations explaining why a query is slow and how to optimize it
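As a rough sketch of the second step, the snippet below assembles a prompt from one slow-query record plus enriched field context. Everything here is illustrative: `build_prompt` is a hypothetical helper, the record is assumed to carry the query body under a `source` key, and the prompt wording is a placeholder. The LLM call itself is left out on purpose.

```python
import json


def build_prompt(slow_query: dict, field_context: dict) -> str:
    """Combine a slow query and per-field context into a single prompt string.

    slow_query:    one record of slow query data; assumed (not confirmed) to
                   hold the query body under the "source" key.
    field_context: field name -> {"type", "description", "semantic_type"},
                   i.e. mapping type merged with semantic metadata.
    """
    lines = [
        "You are reviewing a slow OpenSearch query.",
        "Explain why it may be slow and suggest a more efficient rewrite.",
        "",
        "Query:",
        json.dumps(slow_query["source"], indent=2, sort_keys=True),
        "",
        "Fields:",
    ]
    for name, info in sorted(field_context.items()):
        lines.append(
            f"- {name}: type={info.get('type', 'unknown')}, "
            f"description={info.get('description', 'n/a')}, "
            f"semantic_type={info.get('semantic_type', 'n/a')}"
        )
    return "\n".join(lines)


record = {"source": {"query": {"match": {"sc": 404}}}}
context = {
    "sc": {"type": "text", "description": "HTTP status code", "semantic_type": "enum"},
}
print(build_prompt(record, context))
```

Keeping prompt assembly separate from the model call would make it easy to unit test and to swap models later.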

One challenge I see is that field-level semantic metadata doesn't exist today: IndexMappingTool returns field types but not what the fields mean. The existing _meta mapping parameter could hold this in theory, but it only propagates via index templates and doesn't cover indexes created directly or via dynamic mapping.

My initial thought is a dedicated ML Commons system index (e.g., .plugins-ml-field-metadata) keyed by index pattern, storing field descriptions and semantic types. Something like:

{
  "index_pattern": "logs-*",
  "fields": {
    "sc": { "description": "HTTP status code", "semantic_type": "enum" },
    "rt": { "description": "Response time in ms", "semantic_type": "metric" }
  }
}

Being pattern-based and decoupled from the index creation path, it would work regardless of how indexes are created. An LLM could also auto-generate these descriptions by sampling documents. But this is just one possible approach — I haven't fully fleshed it out yet.
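To illustrate the pattern-based lookup, here is a minimal sketch that resolves field metadata for a concrete index name from a list of stored records shaped like the JSON above. The helper names are hypothetical, only `*` is treated as a wildcard, and the precedence rule (longer patterns override shorter ones) is just an assumption; the real rule would need to be decided in the design.

```python
import re


def pattern_matches(pattern: str, index_name: str) -> bool:
    """True if index_name matches pattern, where '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


def resolve_field_metadata(index_name: str, records: list) -> dict:
    """Merge the "fields" of every record whose index_pattern matches index_name.

    Assumed precedence: records are applied from shortest to longest pattern,
    so a more specific pattern overrides a broader one for the same field.
    """
    matching = [r for r in records if pattern_matches(r["index_pattern"], index_name)]
    matching.sort(key=lambda r: len(r["index_pattern"]))
    merged = {}
    for record in matching:
        merged.update(record["fields"])
    return merged


records = [
    {
        "index_pattern": "logs-*",
        "fields": {
            "sc": {"description": "HTTP status code", "semantic_type": "enum"},
            "rt": {"description": "Response time in ms", "semantic_type": "metric"},
        },
    },
    {
        "index_pattern": "logs-nginx-*",
        "fields": {
            "sc": {"description": "nginx response status", "semantic_type": "enum"},
        },
    },
]
print(resolve_field_metadata("logs-nginx-2024.01", records))
```

Because the lookup happens at analysis time against the index name, it doesn't matter whether the index came from a template, a direct create call, or dynamic mapping.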

What alternatives have you considered?

  • Expanding the rule set in Phase 1: Valuable but fundamentally limited — you can't write rules for every semantic optimization case. I think rules and LLM analysis work best as complementary layers.
  • Storing field descriptions in _meta: I considered using the existing _meta mapping parameter (either at index level or via index/component templates). This works for template-based indexes, but doesn't cover indexes created directly or via dynamic mapping, and updating a template's _meta doesn't retroactively apply to existing indexes. A separate metadata store keyed by index pattern avoids these limitations.

Do you have any additional context?

If this direction is of interest to the project, I'd love to put together a more detailed design and contribute to the implementation. Happy to discuss further or adjust the approach based on your feedback.

cc. @ansjcy

Labels: enhancement (New feature or request)
