Is your feature request related to a problem?
#532 proposes a great rule-based approach for Phase 1. But I've been thinking about the cases that rules can't easily cover — situations where you need to understand what data a field actually holds to make a good recommendation.
For example, if a match query is running on a field that stores HTTP status codes (200, 404, 500), a term query would be far more efficient. But a rule engine can't know this because today's mapping only tells you the field type (text, keyword), not what the data represents.
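To make the status-code case concrete, here is a minimal sketch of the rewrite a recommendation could suggest. The field name `status_code`, the helper `rewrite_match_to_term`, and the `EXACT_MATCH_TYPES` set are hypothetical and only for illustration; the "enum" label is assumed to come from some external source of field semantics, since the mapping alone does not provide it.

```python
from typing import Any, Dict

# Hypothetical semantic types for which exact matching is assumed to be safe.
EXACT_MATCH_TYPES = {"enum"}


def rewrite_match_to_term(
    query: Dict[str, Any], semantic_types: Dict[str, str]
) -> Dict[str, Any]:
    """Return a term query when a match query targets an enum-like field.

    Any other query is returned unchanged.
    """
    match = query.get("match")
    if not isinstance(match, dict) or len(match) != 1:
        return query
    field, spec = next(iter(match.items()))
    if semantic_types.get(field) not in EXACT_MATCH_TYPES:
        return query
    # A match clause may be written as {"field": value}
    # or as {"field": {"query": value}}.
    value = spec.get("query") if isinstance(spec, dict) else spec
    if value is None:
        return query
    return {"term": {field: {"value": value}}}


slow_query = {"match": {"status_code": "404"}}
suggested = rewrite_match_to_term(slow_query, {"status_code": "enum"})
print(suggested)
```

One caveat: on a `text` field, a term query is compared against the analyzed tokens, so the rewrite is only equivalent when the indexed token equals the queried value. Mapping such a field as `keyword` would likely be part of the same recommendation.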
What solution would you like?
I'm thinking about an LLM-based approach as a Phase 2 complement to #532's rule engine:
- Query Insights provides the slow query data (already available via /_insights/top_queries)
- An LLM analyzes the query structure along with enriched field context (mappings + field-level semantic descriptions)
- The LLM generates natural-language recommendations explaining why a query is slow and how to optimize it
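As a rough sketch of the second step, the function below assembles the prompt an LLM might receive. Everything here is an assumption for illustration: `build_recommendation_prompt` is a hypothetical helper, the slow-query record is a simplified shape (`source`, `latency_ms`) rather than the actual `/_insights/top_queries` response, and `sc` and `rt` are sample field names.

```python
import json
from typing import Any, Dict


def build_recommendation_prompt(
    slow_query: Dict[str, Any],
    mapping: Dict[str, str],
    field_metadata: Dict[str, Dict[str, str]],
) -> str:
    """Assemble an LLM prompt from a slow query and enriched field context."""
    field_lines = []
    for name, field_type in sorted(mapping.items()):
        meta = field_metadata.get(name, {})
        description = meta.get("description", "no description available")
        semantic_type = meta.get("semantic_type", "unknown")
        field_lines.append(
            f"- {name} (type={field_type}, semantic_type={semantic_type}): {description}"
        )
    return "\n".join(
        [
            "Explain why this query is slow and how to optimize it.",
            "",
            "Query:",
            json.dumps(slow_query["source"], indent=2, sort_keys=True),
            "",
            f"Observed latency: {slow_query['latency_ms']} ms",
            "",
            "Fields:",
            *field_lines,
        ]
    )


# Simplified, hypothetical slow-query record.
record = {"source": {"query": {"match": {"sc": "404"}}}, "latency_ms": 1250}
prompt = build_recommendation_prompt(
    record,
    mapping={"sc": "text", "rt": "long"},
    field_metadata={"sc": {"description": "HTTP status code", "semantic_type": "enum"}},
)
print(prompt)
```

Fields without metadata still appear in the prompt with an "unknown" semantic type, so the model can see where context is missing instead of guessing silently.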
One challenge I see is that field-level semantic metadata doesn't exist today — IndexMappingTool returns types but not what fields mean. The existing _meta mapping parameter could hold this in theory, but it only propagates via index templates and doesn't cover indexes created directly or via dynamic mapping.
My initial thought is a dedicated ML Commons system index (e.g., .plugins-ml-field-metadata) keyed by index pattern, storing field descriptions and semantic types. Something like:
{
  "index_pattern": "logs-*",
  "fields": {
    "sc": { "description": "HTTP status code", "semantic_type": "enum" },
    "rt": { "description": "Response time in ms", "semantic_type": "metric" }
  }
}
Being pattern-based and decoupled from the index creation path, it would work regardless of how indexes are created. An LLM could also auto-generate these descriptions by sampling documents. But this is just one possible approach — I haven't fully fleshed it out yet.
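A minimal sketch of how pattern-keyed lookup might resolve metadata for a concrete index, assuming records shaped like the example above. The helper `resolve_field_metadata` and the "later records win" merge policy are hypothetical, and Python's `fnmatchcase` is used here only as an approximation of index-pattern wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def resolve_field_metadata(
    index_name: str, records: List[Dict[str, Any]]
) -> Dict[str, Dict[str, str]]:
    """Merge field metadata from every record whose pattern matches the index.

    Later records override earlier ones for the same field (an assumed policy).
    """
    merged: Dict[str, Dict[str, str]] = {}
    for record in records:
        if fnmatchcase(index_name, record["index_pattern"]):
            merged.update(record["fields"])
    return merged


records = [
    {
        "index_pattern": "logs-*",
        "fields": {
            "sc": {"description": "HTTP status code", "semantic_type": "enum"},
            "rt": {"description": "Response time in ms", "semantic_type": "metric"},
        },
    }
]

print(resolve_field_metadata("logs-2024.06.01", records))
print(resolve_field_metadata("metrics-2024.06.01", records))
```

Because the lookup happens at analysis time against the index name, it applies equally to indexes created from templates, created directly, or created through dynamic mapping.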
What alternatives have you considered?
- Expanding the rule set in Phase 1: Valuable but fundamentally limited — you can't write rules for every semantic optimization case. I think rules and LLM analysis work best as complementary layers.
- Storing field descriptions in _meta: I considered using the existing _meta mapping parameter (either at index level or via index/component templates). This works for template-based indexes, but doesn't cover indexes created directly or via dynamic mapping, and updating a template's _meta doesn't retroactively apply to existing indexes. A separate metadata store keyed by index pattern avoids these limitations.
Do you have any additional context?
If this direction is of interest to the project, I'd love to put together a more detailed design and contribute to the implementation. Happy to discuss further or adjust the approach based on your feedback.
cc. @ansjcy