feat: add resource migration job and troubleshooting section for upgrade failures #289
base: main
Changes from all commits
7bd3c87
34a4b61
1148ac0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1635,6 +1635,28 @@ Where: | |
| | `integrationTests.securityContext` | object | no | {} | The pod-level security attributes and common container settings for the OpenSearch integration tests pod. | | ||
| | `integrationTests.priorityClassName` | string | no | "" | The priority class to be used by the OpenSearch integration tests pods. You should create the priority class beforehand. For more information about this feature, refer to [https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/). | | ||
|
|
||
| ## Resource Migration | ||
|
Collaborator
Please, move this section after
||
|
|
||
| The resource migration job is a pre-install/pre-upgrade Helm hook that automatically detects and removes | ||
| OpenSearch 1.x StatefulSets that are incompatible with OpenSearch 2.x. This is required when upgrading via | ||
| ArgoCD, because ArgoCD's merge strategy cannot remove extra environment variables (such as `node.master`) | ||
| from existing StatefulSets. The job deletes affected StatefulSets with `--cascade=orphan`, which preserves | ||
| running pods while allowing Helm to recreate the StatefulSet with the correct 2.x spec. | ||
|
|
||
| The job iterates over all configured OpenSearch StatefulSets (master, data, arbiter depending on deployment | ||
| topology). If a StatefulSet does not exist or does not contain the `node.master` environment variable, it is | ||
| skipped. | ||
|
|
||
| | Parameter | Type | Mandatory | Default value | Description | | ||
| |------------------------------------------------|---------|-----------|---------------|----------------------------------------------------------------------------------------------------------------------------| | ||
| | `resourceMigration.enabled` | boolean | no | true | Enables the resource migration pre-install/pre-upgrade hook job. | | ||
| | `resourceMigration.resources.requests.cpu` | string | no | 20m | The minimum number of CPUs the resource migration container should use. | | ||
| | `resourceMigration.resources.requests.memory` | string | no | 64Mi | The minimum amount of memory the resource migration container should use. | | ||
| | `resourceMigration.resources.limits.cpu` | string | no | 100m | The maximum number of CPUs the resource migration container should use. | | ||
| | `resourceMigration.resources.limits.memory` | string | no | 256Mi | The maximum amount of memory the resource migration container should use. | | ||
| | `resourceMigration.imagePullPolicy` | string | no | IfNotPresent | The image pull policy for the resource migration container. | | ||
| | `resourceMigration.runAsNonRoot` | boolean | no | true | If `true`, applies a restricted security context (non-root, read-only root filesystem, all capabilities dropped) to the job pod. | | ||
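To see the new parameters in one place, the sketch below writes a values override that repeats the defaults from the table above. The file name `resource-migration-values.yaml` and the placeholders in the commented Helm command are assumptions for illustration and are not part of this PR.

```bash
#!/bin/sh
# Hypothetical file name; keys and values are the documented defaults
# from the Resource Migration parameter table.
cat > resource-migration-values.yaml <<'EOF'
resourceMigration:
  enabled: true
  imagePullPolicy: IfNotPresent
  runAsNonRoot: true
  resources:
    requests:
      cpu: 20m
      memory: 64Mi
    limits:
      cpu: 100m
      memory: 256Mi
EOF

# Pass the file to Helm (release, chart and namespace are placeholders):
#   helm upgrade <release> <chart> -n <namespace> -f resource-migration-values.yaml
```

Setting `enabled: false` in the same file turns the hook off, in which case the StatefulSet cleanup has to be done manually before the upgrade.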
|
|
||
| ### Tags Description | ||
|
|
||
| This section contains information about integration test tags that can be used in order to test OpenSearch service. You can use the following tags: | ||
|
|
@@ -2054,6 +2076,28 @@ If you need migrate to OpenSearch Service `1.x.x` (with OpenSearch 2.x) from pre | |
| * If version `0.2.4` (or newer) is installed, proceed with the upgrade. | ||
| * If a version earlier than `0.2.4` is installed, first upgrade to `0.2.4` to migrate the security configuration to the new format, and then install the required `1.x.x` version. | ||
|
|
||
| **ArgoCD upgrades:** | ||
|
|
||
| When upgrading from OpenSearch 1.x to 2.x via ArgoCD, the StatefulSet spec changes significantly (for example, | ||
| the `node.master` environment variable is replaced by `node.roles`). ArgoCD's merge strategy cannot remove | ||
| these extra environment variables from existing StatefulSets, which causes the upgrade to fail. | ||
|
|
||
| To handle this automatically, the chart includes a **resource migration** pre-upgrade hook | ||
| (`resourceMigration.enabled: true` by default). This hook inspects each OpenSearch StatefulSet for the | ||
| `node.master` environment variable (a marker of the 1.x spec). If found, it deletes the StatefulSet with | ||
| `--cascade=orphan`, preserving the running pods while allowing the new StatefulSet to be created with the | ||
| correct 2.x spec. | ||
|
|
||
| No manual intervention is required as long as `resourceMigration.enabled` is `true`. If you prefer to handle | ||
| the StatefulSet cleanup manually, set `resourceMigration.enabled: false` and delete the affected StatefulSets | ||
| yourself before upgrading: | ||
|
|
||
| ```bash | ||
| kubectl -n <namespace> delete statefulset <statefulset-name> --cascade=orphan | ||
| ``` | ||
|
|
||
| For the full list of resource migration parameters, refer to the [Resource Migration](#resource-migration) section. | ||
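If you take the manual route, it may help to check first which StatefulSets still carry the 1.x marker. The helper below is a minimal sketch, assuming `kubectl` access to the namespace. The function name `needs_migration` is hypothetical and is not part of the chart, and the check uses a `kubectl` JSONPath filter instead of the `jq` expression used by the hook job.

```bash
#!/bin/sh
# needs_migration NAMESPACE STATEFULSET
# Exits 0 when the StatefulSet still has a 'node.master' env entry
# (the 1.x marker), and non-zero when it does not or cannot be read.
needs_migration() {
  found="$(kubectl -n "$1" get statefulset "$2" \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="node.master")].name}' \
    2>/dev/null || true)"
  [ -n "$found" ]
}

# Usage (same placeholders as the delete command above):
#   needs_migration <namespace> <statefulset-name> \
#     && echo "delete with --cascade=orphan before upgrading"
```

A missing StatefulSet and a StatefulSet without the variable both result in a non-zero exit, which matches the two "skip" cases described for the job.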
|
|
||
| ### Migration From OpenDistro Elasticsearch | ||
|
|
||
| OpenSearch Service allows migration from OpenDistro Elasticsearch deployments. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| {{- if .Values.resourceMigration.enabled }} | ||
| apiVersion: batch/v1 | ||
| kind: Job | ||
| metadata: | ||
| name: {{ .Release.Name }}-resource-migrator | ||
| labels: | ||
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| annotations: | ||
| "helm.sh/hook": pre-install,pre-upgrade | ||
| "helm.sh/hook-weight": "2" | ||
| "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded | ||
| spec: | ||
| template: | ||
| metadata: | ||
| labels: | ||
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| spec: | ||
| serviceAccountName: {{ .Release.Name }}-resource-migrator | ||
| restartPolicy: OnFailure | ||
| {{- if .Values.resourceMigration.runAsNonRoot }} | ||
| securityContext: | ||
| runAsNonRoot: true | ||
| seccompProfile: { type: RuntimeDefault } | ||
|
Collaborator
Maybe just ?
||
| {{- end }} | ||
| containers: | ||
| - name: migrator | ||
| image: {{ template "kubectl.image" . }} | ||
| imagePullPolicy: {{ .Values.resourceMigration.imagePullPolicy | default "IfNotPresent" }} | ||
| command: ["/bin/sh","-c"] | ||
| args: | ||
| - | | ||
| set -euo pipefail | ||
|
|
||
| command -v jq >/dev/null 2>&1 || { echo "[resource-migrator] jq is required"; exit 1; } | ||
|
|
||
| KUBECTL="kubectl" | ||
| NS="{{ .Release.Namespace }}" | ||
| STATEFULSET_NAMES="{{ trim (include "opensearch.statefulsetNames" .) }}" | ||
|
|
||
| if [ -z "$STATEFULSET_NAMES" ]; then | ||
| echo "[resource-migrator] No statefulset names configured, nothing to do" | ||
| exit 0 | ||
| fi | ||
|
|
||
| IFS=','; for STS_NAME in $STATEFULSET_NAMES; do | ||
| STS_NAME="$(echo "$STS_NAME" | xargs)" | ||
| [ -z "$STS_NAME" ] && continue | ||
|
|
||
| STS_JSON="$($KUBECTL -n "$NS" get statefulset "$STS_NAME" -o json 2>/dev/null || true)" | ||
| if [ -z "$STS_JSON" ]; then | ||
| echo "[resource-migrator] StatefulSet $STS_NAME does not exist, skipping" | ||
| continue | ||
| fi | ||
|
|
||
| # 'node.master' env is a marker of OpenSearch 1.x StatefulSets that must be | ||
| # recreated during upgrade to 2.x because ArgoCD merge cannot remove extra envs. | ||
| NODE_MASTER="$(printf '%s' "$STS_JSON" \ | ||
|
Collaborator
Can we use just
||
| | jq -r '[.spec.template.spec.containers[].env[]? | select(.name == "node.master")] | length' 2>/dev/null || echo "0")" | ||
|
|
||
| if [ "$NODE_MASTER" -gt 0 ]; then | ||
| echo "[resource-migrator] StatefulSet $STS_NAME contains 'node.master' env (OpenSearch 1.x)" | ||
| echo "[resource-migrator] Deleting StatefulSet $STS_NAME with --cascade=orphan" | ||
| if $KUBECTL -n "$NS" delete statefulset "$STS_NAME" --cascade=orphan --ignore-not-found=true; then | ||
|
PhBouzid marked this conversation as resolved.
|
||
| echo "[resource-migrator] StatefulSet $STS_NAME deleted successfully" | ||
| else | ||
| echo "[resource-migrator][ERROR] Failed to delete StatefulSet $STS_NAME" | ||
| exit 1 | ||
| fi | ||
| else | ||
| echo "[resource-migrator] StatefulSet $STS_NAME: no 'node.master' env found, skipping" | ||
| fi | ||
| done | ||
|
|
||
| echo "[resource-migrator] done" | ||
| resources: | ||
| limits: | ||
| cpu: {{ .Values.resourceMigration.resources.limits.cpu }} | ||
| memory: {{ .Values.resourceMigration.resources.limits.memory }} | ||
| requests: | ||
| cpu: {{ .Values.resourceMigration.resources.requests.cpu }} | ||
| memory: {{ .Values.resourceMigration.resources.requests.memory }} | ||
| {{- if .Values.resourceMigration.runAsNonRoot }} | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| readOnlyRootFilesystem: true | ||
| capabilities: { drop: ["ALL"] } | ||
| {{- end }} | ||
| {{- end }} | ||
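The name-splitting loop in the job script can be tried outside a cluster. The sketch below reproduces only that part, assuming a comma-separated list like the one rendered from `opensearch.statefulsetNames`. The sample names and the `split_names` function are hypothetical. Unlike the job script, it runs the loop in a subshell so that the `IFS=','` change does not persist for later commands.

```bash
#!/bin/sh
# Hypothetical input; in the job the list is rendered by the chart helper.
STATEFULSET_NAMES="opensearch-master, opensearch-data ,,opensearch-arbiter"

# split_names LIST
# Prints one trimmed, non-empty name per line.
split_names() {
  (
    IFS=','                              # split on commas only, inside the subshell
    for name in $1; do
      name="$(echo "$name" | xargs)"     # trim surrounding whitespace
      [ -z "$name" ] && continue         # skip empty entries such as ",,"
      echo "$name"
    done
  )
}

split_names "$STATEFULSET_NAMES"
```

With the sample input this prints the three names, one per line, with surrounding spaces trimmed and the empty entry skipped.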
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| {{- if .Values.resourceMigration.enabled }} | ||
| apiVersion: rbac.authorization.k8s.io/v1 | ||
| kind: Role | ||
| metadata: | ||
| name: {{ .Release.Name }}-resource-migrator | ||
| labels: | ||
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| annotations: | ||
| "helm.sh/hook": pre-install,pre-upgrade | ||
| "helm.sh/hook-weight": "-190" | ||
| "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded | ||
| rules: | ||
| - apiGroups: ["apps"] | ||
| resources: ["statefulsets"] | ||
| verbs: ["get","list","delete"] | ||
| --- | ||
| apiVersion: rbac.authorization.k8s.io/v1 | ||
| kind: RoleBinding | ||
| metadata: | ||
| name: {{ .Release.Name }}-resource-migrator | ||
| labels: | ||
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| annotations: | ||
| "helm.sh/hook": pre-install,pre-upgrade | ||
| "helm.sh/hook-weight": "-180" | ||
| "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded | ||
| subjects: | ||
| - kind: ServiceAccount | ||
| name: {{ .Release.Name }}-resource-migrator | ||
| namespace: {{ .Release.Namespace }} | ||
| roleRef: | ||
| apiGroup: rbac.authorization.k8s.io | ||
| kind: Role | ||
| name: {{ .Release.Name }}-resource-migrator | ||
| {{- end }} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| {{- if .Values.resourceMigration.enabled }} | ||
| apiVersion: v1 | ||
| kind: ServiceAccount | ||
| metadata: | ||
| name: {{ .Release.Name }}-resource-migrator | ||
| labels: | ||
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| annotations: | ||
| "helm.sh/hook": pre-install,pre-upgrade | ||
| "helm.sh/hook-weight": "-200" | ||
| "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded | ||
| {{- end }} |
New chapters are not added to the TOC