44 changes: 44 additions & 0 deletions docs/public/installation.md
@@ -1635,6 +1635,28 @@ Where:
| `integrationTests.securityContext` | object | no | {} | The pod-level security attributes and common container settings for the OpenSearch integration tests pod. |
| `integrationTests.priorityClassName` | string | no | "" | The priority class to be used by the OpenSearch integration tests pods. You should create the priority class beforehand. For more information about this feature, refer to [https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/). |

## Resource Migration
Review comment (Collaborator):
New chapters are not added to the TOC

Review comment (Collaborator):
Please, move this section after Tags Description of integration tests.


The resource migration job is a pre-install/pre-upgrade Helm hook that automatically detects and removes
OpenSearch 1.x StatefulSets that are incompatible with OpenSearch 2.x. This is required when upgrading via
ArgoCD, because ArgoCD's merge strategy cannot remove extra environment variables (such as `node.master`)
from existing StatefulSets. The job deletes affected StatefulSets with `--cascade=orphan`, which preserves
running pods while allowing Helm to recreate the StatefulSet with the correct 2.x spec.

The job iterates over all configured OpenSearch StatefulSets (master, data, arbiter depending on deployment
topology). If a StatefulSet does not exist or does not contain the `node.master` environment variable, it is
skipped.
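
The detection step described above can be reproduced by hand before an upgrade. The following is a minimal sketch, not the chart's own code: the function name `has_node_master_env` and the `KUBECTL` override are illustrative, and it swaps the `jq` filter used by the job for kubectl's built-in JSONPath filter. It assumes `kubectl` is configured for the target cluster.

```bash
#!/bin/sh
# Illustrative helper, not part of the chart: reports whether a StatefulSet
# still carries the 'node.master' environment variable (the 1.x marker).
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

has_node_master_env() {
  ns="$1"
  sts="$2"
  # The JSONPath filter prints the variable name once per matching container;
  # a missing StatefulSet or a missing variable yields empty output.
  found="$($KUBECTL -n "$ns" get statefulset "$sts" \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="node.master")].name}' \
    2>/dev/null || true)"
  [ -n "$found" ]
}

# Usage (namespace and StatefulSet name are placeholders):
#   if has_node_master_env my-namespace opensearch; then
#     echo "1.x StatefulSet detected"
#   fi
```

Because errors are suppressed, a lookup that fails for any reason (for example, missing permissions) is reported the same way as "not found", so treat a negative result as a hint rather than proof.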

| Parameter | Type | Mandatory | Default value | Description |
|------------------------------------------------|---------|-----------|---------------|----------------------------------------------------------------------------------------------------------------------------|
| `resourceMigration.enabled` | boolean | no | true | Enables the resource migration pre-install/pre-upgrade hook job. |
| `resourceMigration.resources.requests.cpu` | string | no | 20m | The minimum number of CPUs the resource migration container should use. |
| `resourceMigration.resources.requests.memory` | string | no | 64Mi | The minimum amount of memory the resource migration container should use. |
| `resourceMigration.resources.limits.cpu` | string | no | 100m | The maximum number of CPUs the resource migration container should use. |
| `resourceMigration.resources.limits.memory` | string | no | 256Mi | The maximum amount of memory the resource migration container should use. |
| `resourceMigration.imagePullPolicy` | string | no | IfNotPresent | The image pull policy for the resource migration container. |
| `resourceMigration.runAsNonRoot` | boolean | no | true | If `true`, applies a restricted security context to the job pod (non-root, `RuntimeDefault` seccomp profile, no privilege escalation, read-only root filesystem, all capabilities dropped). |

### Tags Description

This section contains information about integration test tags that can be used to test the OpenSearch service. You can use the following tags:
@@ -2054,6 +2076,28 @@ If you need migrate to OpenSearch Service `1.x.x` (with OpenSearch 2.x) from pre
* If version `0.2.4` (or newer) is installed, proceed with the upgrade.
* If a version earlier than `0.2.4` is installed, first upgrade to version `0.2.4` to migrate the security configuration to the new format, and then install the required `1.x.x` version.

**ArgoCD upgrades:**

When upgrading from OpenSearch 1.x to 2.x via ArgoCD, the StatefulSet spec changes significantly (for example,
the `node.master` environment variable is replaced by `node.roles`). ArgoCD's merge strategy cannot remove
these extra environment variables from existing StatefulSets, which causes the upgrade to fail.

To handle this automatically, the chart includes a **resource migration** pre-upgrade hook
(`resourceMigration.enabled: true` by default). This hook inspects each OpenSearch StatefulSet for the
`node.master` environment variable (a marker of the 1.x spec). If found, it deletes the StatefulSet with
`--cascade=orphan`, preserving the running pods while allowing the new StatefulSet to be created with the
correct 2.x spec.

No manual intervention is required as long as `resourceMigration.enabled` is `true`. If you prefer to handle
the StatefulSet cleanup manually, set `resourceMigration.enabled: false` and delete the affected StatefulSets
yourself before upgrading:

```bash
kubectl -n <namespace> delete statefulset <statefulset-name> --cascade=orphan
```
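
When several StatefulSets are affected, the manual cleanup can be scripted. This is a rough sketch under stated assumptions: the function name `orphan_delete_statefulsets`, the `KUBECTL` override, and the StatefulSet names in the usage comment are illustrative and must be replaced with the names used in your deployment.

```bash
#!/bin/sh
# Illustrative sketch, not part of the chart: orphan-delete a list of
# StatefulSets before upgrading with resourceMigration.enabled set to false.
KUBECTL="${KUBECTL:-kubectl}"

orphan_delete_statefulsets() {
  ns="$1"
  shift
  for sts in "$@"; do
    echo "Deleting StatefulSet $sts in namespace $ns (pods are kept)"
    # --cascade=orphan removes only the StatefulSet object;
    # --ignore-not-found makes the loop safe to re-run.
    $KUBECTL -n "$ns" delete statefulset "$sts" \
      --cascade=orphan --ignore-not-found=true || return 1
  done
}

# Usage (names are placeholders):
#   orphan_delete_statefulsets my-namespace opensearch opensearch-data
```

The function stops at the first failed deletion so that a partial cleanup is noticed before the upgrade starts.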

For the full list of resource migration parameters, refer to the [Resource Migration](#resource-migration) section.

### Migration From OpenDistro Elasticsearch

OpenSearch Service allows migration from OpenDistro Elasticsearch deployments.
60 changes: 60 additions & 0 deletions docs/public/troubleshooting.md
@@ -212,6 +212,11 @@
* [Stack trace](#stack-trace-33)
* [How to solve](#how-to-solve-35)
* [Recommendations](#recommendations-33)
* [Upgrade Failed Due to Pre-Deploy Migration Hook](#upgrade-failed-due-to-pre-deploy-migration-hook)
* [Description](#description-36)
* [Stack trace](#stack-trace-34)
* [How to solve](#how-to-solve-36)
* [Recommendations](#recommendations-34)
<!-- TOC -->

## Cluster Health
@@ -1603,3 +1608,58 @@ If `opensearch.data.dedicatedPod.enabled: false`, master nodes also act as data
If dedicated data pods are enabled, increase `opensearch.data.dedicatedPod.replicas`.
Also keep index settings reasonable for new indexes: `index.number_of_shards` defaults to 1, while `index.number_of_replicas` defaults to 1, so unnecessary shard and replica growth should be avoided.
If really required, OpenSearch settings can also be provided through `opensearch.config` during installation.

## Upgrade Failed Due to Pre-Deploy Migration Hook

### Description

During an OpenSearch Service upgrade (especially from 2.x to 3.x), the Helm pre-deploy hook job `opensearch-migration-1x` may fail with a `BackoffLimitExceeded` error. It is a Kubernetes Job that runs as a `pre-install`/`pre-upgrade` Helm hook and migrates indices originally created on OpenSearch 1.x. If the cluster contains such indices, they must be reindexed before the upgrade to 3.x can proceed. If the migration fails, the entire Helm upgrade is blocked.

In **ArgoCD** deployments, this appears as a failed PreSync hook:

```text
- Job/opensearch-migration-1x; Hook: PreSync; Phase: Failed
Sync Message: Job has reached the specified backoff limit
```

In **Helm** output, the error looks like:

```text
Error: UPGRADE FAILED: pre-upgrade hooks failed: 1 error occurred:
* job opensearch-migration-1x failed: BackoffLimitExceeded
```

### Stack trace

```text
Error: UPGRADE FAILED: pre-upgrade hooks failed: 1 error occurred:
* job opensearch-migration-1x failed: BackoffLimitExceeded
```

### How to solve

1. **Check the migration Job logs** to find the root cause of the failure:

   ```sh
   kubectl logs -n <namespace> job/opensearch-migration-1x
   ```

   If the pod has already been cleaned up, look for the pod by label:

   ```sh
   kubectl get pods -n <namespace> -l component=migration
   kubectl logs -n <namespace> <migration-pod-name>
   ```

2. **Common failure reasons:**
   - The cluster contains indices created on OpenSearch 1.x that block the upgrade to 3.x. The migrator attempts to reindex them but may fail due to insufficient resources, connectivity issues, or incompatible index settings.
   - OpenSearch is not reachable from the migration pod (network or TLS issues).
   - Insufficient permissions or missing secrets.

3. **After identifying and resolving the issue**, retry the upgrade. If using ArgoCD, trigger a new sync. If using Helm directly, re-run the `helm upgrade` command.

4. For detailed information about the index migration process, including how to run the migrator manually in dry-run mode, refer to the [Indices Migration](indices-migration.md) documentation.
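
To confirm that the Job itself is in the failed state before reading the logs, the recorded failure reason can be queried directly. This is a small sketch, assuming the Job still exists in the namespace; the function name `job_failure_reason` and the `KUBECTL` override are illustrative and not part of the chart.

```sh
#!/bin/sh
# Illustrative helper, not part of the chart: prints the reason recorded on the
# Job's "Failed" condition, for example BackoffLimitExceeded.
KUBECTL="${KUBECTL:-kubectl}"

job_failure_reason() {
  ns="$1"
  # The Job name defaults to the one used in this section.
  job="${2:-opensearch-migration-1x}"
  $KUBECTL -n "$ns" get job "$job" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'
}

# Usage (the namespace is a placeholder):
#   job_failure_reason my-namespace
```

An empty result means the Job has no `Failed` condition; in that case check whether it is still running or has already been removed.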

### Recommendations

Before upgrading OpenSearch from 2.x to 3.x, run the migration tool in **dry-run mode** to identify any 1.x indices that would block the upgrade. Review the dry-run output and plan the migration during a maintenance window. See the [Indices Migration](indices-migration.md) guide for the full procedure.
@@ -0,0 +1,88 @@
{{- if .Values.resourceMigration.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-resource-migrator
  labels:
    app.kubernetes.io/instance: {{ .Release.Name }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "2"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      serviceAccountName: {{ .Release.Name }}-resource-migrator
      restartPolicy: OnFailure
      {{- if .Values.resourceMigration.runAsNonRoot }}
      securityContext:
        runAsNonRoot: true
        seccompProfile: { type: RuntimeDefault }
Review comment (Collaborator):
Maybe just

        seccompProfile:
          type: RuntimeDefault

?

      {{- end }}
      containers:
        - name: migrator
          image: {{ template "kubectl.image" . }}
          imagePullPolicy: {{ .Values.resourceMigration.imagePullPolicy | default "IfNotPresent" }}
          command: ["/bin/sh","-c"]
          args:
            - |
              set -euo pipefail

              command -v jq >/dev/null 2>&1 || { echo "[resource-migrator] jq is required"; exit 1; }

              KUBECTL="kubectl"
              NS="{{ .Release.Namespace }}"
              STATEFULSET_NAMES="{{ trim (include "opensearch.statefulsetNames" .) }}"

              if [ -z "$STATEFULSET_NAMES" ]; then
                echo "[resource-migrator] No statefulset names configured, nothing to do"
                exit 0
              fi

              IFS=','; for STS_NAME in $STATEFULSET_NAMES; do
                STS_NAME="$(echo "$STS_NAME" | xargs)"
                [ -z "$STS_NAME" ] && continue

                STS_JSON="$($KUBECTL -n "$NS" get statefulset "$STS_NAME" -o json 2>/dev/null || true)"
                if [ -z "$STS_JSON" ]; then
                  echo "[resource-migrator] StatefulSet $STS_NAME does not exist, skipping"
                  continue
                fi

                # 'node.master' env is a marker of OpenSearch 1.x StatefulSets that must be
                # recreated during upgrade to 2.x because ArgoCD merge cannot remove extra envs.
                NODE_MASTER="$(printf '%s' "$STS_JSON" \
Review comment (Collaborator):
Can we use just echo "$STS_JSON" instead of printf '%s' "$STS_JSON"?
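
One consideration for this question: the job runs under `/bin/sh`, and some `sh` implementations (dash, for example) expand backslash escapes in `echo`, while bash does not by default. A `\n` inside a JSON string could then become a literal newline, which is not valid inside a JSON string. The sketch below uses a made-up sample document to show the difference; whether the image's shell behaves this way is an assumption to verify.

```sh
#!/bin/sh
# Hypothetical sample: a JSON document whose string value contains an
# escaped newline (backslash followed by 'n', not a real line break).
STS_JSON='{"metadata":{"annotations":{"note":"line1\nline2"}}}'

# printf '%s' writes the argument unchanged in every POSIX shell.
via_printf="$(printf '%s' "$STS_JSON")"

# echo may expand "\n" into a real newline, depending on the shell.
via_echo="$(echo "$STS_JSON")"

if [ "$via_printf" = "$STS_JSON" ]; then
  echo "printf: unchanged"
fi

if [ "$via_echo" = "$STS_JSON" ]; then
  echo "echo: unchanged in this shell"
else
  echo "echo: modified by this shell"
fi
```

Running the script with the same shell as the job image shows whether `echo` would be a safe replacement there.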

                  | jq -r '[.spec.template.spec.containers[].env[]? | select(.name == "node.master")] | length' 2>/dev/null || echo "0")"

                if [ "$NODE_MASTER" -gt 0 ]; then
                  echo "[resource-migrator] StatefulSet $STS_NAME contains 'node.master' env (OpenSearch 1.x)"
                  echo "[resource-migrator] Deleting StatefulSet $STS_NAME with --cascade=orphan"
                  if $KUBECTL -n "$NS" delete statefulset "$STS_NAME" --cascade=orphan --ignore-not-found=true; then
Comment thread
PhBouzid marked this conversation as resolved.
                    echo "[resource-migrator] StatefulSet $STS_NAME deleted successfully"
                  else
                    echo "[resource-migrator][ERROR] Failed to delete StatefulSet $STS_NAME"
                    exit 1
                  fi
                else
                  echo "[resource-migrator] StatefulSet $STS_NAME: no 'node.master' env found, skipping"
                fi
              done

              echo "[resource-migrator] done"
          resources:
            limits:
              cpu: {{ .Values.resourceMigration.resources.limits.cpu }}
              memory: {{ .Values.resourceMigration.resources.limits.memory }}
            requests:
              cpu: {{ .Values.resourceMigration.resources.requests.cpu }}
              memory: {{ .Values.resourceMigration.resources.requests.memory }}
          {{- if .Values.resourceMigration.runAsNonRoot }}
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities: { drop: ["ALL"] }
          {{- end }}
{{- end }}
@@ -0,0 +1,35 @@
{{- if .Values.resourceMigration.enabled }}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ .Release.Name }}-resource-migrator
  labels:
    app.kubernetes.io/instance: {{ .Release.Name }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-190"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
rules:
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get","list","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .Release.Name }}-resource-migrator
  labels:
    app.kubernetes.io/instance: {{ .Release.Name }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-180"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
subjects:
  - kind: ServiceAccount
    name: {{ .Release.Name }}-resource-migrator
    namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{ .Release.Name }}-resource-migrator
{{- end }}
@@ -0,0 +1,12 @@
{{- if .Values.resourceMigration.enabled }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Release.Name }}-resource-migrator
  labels:
    app.kubernetes.io/instance: {{ .Release.Name }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-200"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
{{- end }}
12 changes: 12 additions & 0 deletions operator/charts/helm/opensearch-service/values.yaml
@@ -1183,6 +1183,18 @@ integrationTests:
  # ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
  priorityClassName: ""

resourceMigration:
  enabled: true
  resources:
    requests:
      memory: 64Mi
      cpu: 20m
    limits:
      memory: 256Mi
      cpu: 100m
  imagePullPolicy: IfNotPresent
  runAsNonRoot: true

groupMigration:
  enabled: true
  oldGroupPrefix: "qubership.org/"