What is the bug?
On an OpenSearch v3.5.0 cluster with 12 data nodes split across 2 zones, the Query Insights top queries index is never created, apparently because of the cluster's zone awareness configuration.
The relevant settings are shown below:
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "awareness": {
            "balance": "true",
            "attributes": "zone",
            "force": {
              "zone": {
                "values": [
                  "Zone1",
                  "Zone2"
                ]
              }
            }
          }
        }
      }
    }
  }
}
How can one reproduce the bug?
Set up a similar cluster (two awareness zones with cluster.routing.allocation.awareness.balance set to true) and monitor the OpenSearch logs in /var/log/opensearch/.
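To confirm that the failure keeps recurring (about every 5 minutes here), the log directory can be scanned for the error line. This is a minimal sketch, assuming the logs are plain-text `*.log` files in /var/log/opensearch/ (the exact file names are not given in this report); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for the reproduction step: counts Query Insights index-creation failures in the logs.
class QueryInsightsLogScan {

    // Text taken from the error line in this report.
    static final String MARKER = "Unable to create query insights index";

    // Counts matching lines in an already-loaded log.
    static long countFailures(List<String> lines) {
        return lines.stream().filter(line -> line.contains(MARKER)).count();
    }

    // Scans every *.log file in the directory; rotated or compressed logs are skipped in this sketch.
    static long scanDirectory(Path logDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(logDir, "*.log")) {
            for (Path file : files) {
                total += countFailures(Files.readAllLines(file));
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/var/log/opensearch");
        System.out.println("query insights index failures: " + scanDirectory(dir));
    }
}
```

Running it twice, a few minutes apart, should show the count increasing while the problem persists.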
What is the expected behavior?
The top queries index should be created without producing the error below.
What is your host/environment?
RHEL 8.x and OpenSearch v3.5.0
Do you have any screenshots?
The following error was pulled from the cluster's log in /var/log/opensearch/.
[2026-05-13T08:55:01,801][ERROR][o.o.p.i.c.e.LocalIndexExporter] [datanode1] Unable to create query insights index:
java.lang.IllegalArgumentException: Validation Failed: 1: expected max cap on auto expand to be a multiple of total awareness attributes [2];
at org.opensearch.cluster.metadata.MetadataCreateIndexService.validateErrors(MetadataCreateIndexService.java:1645)
at org.opensearch.cluster.metadata.MetadataCreateIndexService.validateIndexSettings(MetadataCreateIndexService.java:1640)
at org.opensearch.cluster.metadata.MetadataCreateIndexService.validate(MetadataCreateIndexService.java:1632)
at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:433)
at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:494)
at org.opensearch.cluster.metadata.MetadataCreateIndexService$1.execute(MetadataCreateIndexService.java:394)
at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
at org.opensearch.cluster.service.ClusterManagerService.executeTasks(ClusterManagerService.java:890)
at org.opensearch.cluster.service.ClusterManagerService.calculateTaskOutputs(ClusterManagerService.java:441)
at org.opensearch.cluster.service.ClusterManagerService.runTasks(ClusterManagerService.java:301)
at org.opensearch.cluster.service.ClusterManagerService$Batcher.run(ClusterManagerService.java:214)
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:206)
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:264)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:918)
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:299)
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
at java.lang.Thread.run(Thread.java:1474)
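The message suggests a replica-balance rule that applies when cluster.routing.allocation.awareness.balance is true: the number of shard copies must divide evenly across the awareness values (2 here, from force.zone.values). For an index using auto_expand_replicas, the check appears to use the upper bound plus one primary. The sketch below illustrates that assumed rule; it is not the OpenSearch source, the class and method names are hypothetical, and the `0-2` value in the example is an assumption (the log does not show which settings the plugin requested).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the zone-balance rule suggested by the error; not the OpenSearch source.
class AwarenessBalanceCheck {

    // Number of awareness values, e.g. the forced zone values ["Zone1", "Zone2"].
    private final int awarenessValues;

    AwarenessBalanceCheck(List<String> forcedZoneValues) {
        this.awarenessValues = forcedZoneValues.size();
    }

    /**
     * Returns an error message if the shard copies cannot be spread evenly across the zones.
     *
     * @param replicas      index.number_of_replicas, used only when auto-expand is disabled
     * @param autoExpandMax upper bound of index.auto_expand_replicas, or null when auto-expand is disabled;
     *                      Integer.MAX_VALUE stands for an unbounded ("all") upper limit
     */
    Optional<String> validate(int replicas, Integer autoExpandMax) {
        if (autoExpandMax != null) {
            // Assumed rule: one primary plus the maximum replica count must be a multiple of the zone count.
            if (autoExpandMax != Integer.MAX_VALUE && (autoExpandMax + 1) % awarenessValues != 0) {
                return Optional.of("expected max cap on auto expand to be a multiple of total awareness attributes ["
                    + awarenessValues + "]");
            }
        } else if ((replicas + 1) % awarenessValues != 0) {
            // Illustrative message for the fixed-replica case (wording is not taken from OpenSearch).
            return Optional.of("total shard copies must be a multiple of the awareness value count ["
                + awarenessValues + "]");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        AwarenessBalanceCheck check = new AwarenessBalanceCheck(List.of("Zone1", "Zone2"));
        // Hypothetical auto_expand_replicas "0-2": up to 3 copies, which cannot be split evenly across 2 zones.
        System.out.println("auto_expand 0-2: " + check.validate(0, 2));
        // Fixed 1 replica: 2 copies, one per zone.
        System.out.println("1 replica: " + check.validate(1, null));
    }
}
```

Under this assumed rule, with two zones an auto-expand upper bound of 1 or 3 passes while 2 fails, and a fixed replica count must be odd.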
Do you have any additional context?
I tried creating an index template with 6 shards and 1 replica that matches indices named top_queries-*, but it did not resolve the issue. I did not restart anything after adding the template; I simply waited for the error to reappear, which it does roughly every 5 minutes.
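One possible reason the template has no effect: settings in a create-index request override matching template settings, so if the plugin's request sets index.auto_expand_replicas itself, the template's number_of_replicas would not change the outcome. That the plugin sends such a setting is an assumption; the request is not visible in the log. The sketch below overlays hypothetical request settings on the template from this report and applies the same assumed zone-balance rule; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: overlay request settings on template settings, then apply the assumed balance rule.
class EffectiveSettingsCheck {

    static Map<String, String> merge(Map<String, String> template, Map<String, String> request) {
        Map<String, String> effective = new HashMap<>(template);
        effective.putAll(request); // request settings win over template settings
        return effective;
    }

    // Returns true if the effective settings pass the assumed rule for the given number of zones.
    static boolean isBalanced(Map<String, String> settings, int zones) {
        String autoExpand = settings.get("index.auto_expand_replicas");
        if (autoExpand != null && !autoExpand.equals("false")) {
            // Format "min-max"; only the upper bound matters for this check.
            String max = autoExpand.substring(autoExpand.indexOf('-') + 1);
            if (max.equals("all")) {
                return true; // unbounded upper limit is not checked in this sketch
            }
            return (Integer.parseInt(max) + 1) % zones == 0;
        }
        // OpenSearch defaults to 1 replica when none is set.
        int replicas = Integer.parseInt(settings.getOrDefault("index.number_of_replicas", "1"));
        return (replicas + 1) % zones == 0;
    }

    public static void main(String[] args) {
        Map<String, String> template = Map.of(
            "index.number_of_shards", "6",
            "index.number_of_replicas", "1");
        // Hypothetical request settings; the real ones sent by the plugin are not shown in the log.
        Map<String, String> request = Map.of("index.auto_expand_replicas", "0-2");
        System.out.println("template alone balanced: " + isBalanced(template, 2));
        System.out.println("template + request balanced: " + isBalanced(merge(template, request), 2));
    }
}
```

In this model the template on its own passes (2 copies across 2 zones), but the merged settings fail because the auto-expand upper bound still comes from the request.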