Add deploy-selfmanaged skill and slash command#982
Draft
EngHabu wants to merge 3 commits into
Guides users through deploying a Union self-managed (BYOC) data plane end-to-end — captures intent, verifies prereqs, then runs the deterministic path for the chosen cloud (flyte e2e script for AWS/GCP, guided helm walkthrough for Azure/OCI/CoreWeave/generic). Whitelists .claude/skills/ and .claude/commands/ so team-shared skills are checked in while personal Claude state stays ignored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deploying docs with
| | |
|---|---|
| Latest commit: | 3f06fa5 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://58a4766b.docs-dog.pages.dev |
| Branch Preview URL: | https://add-deploy-selfmanaged-skill.docs-dog.pages.dev |
Documents the user-facing UX (six phases from trigger → teardown), how to invoke the skill, and the operating rules that govern every cloud path. Lives next to SKILL.md so contributors landing in the skill dir have a human-readable overview. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces a team-shared Claude skill and slash command to guide end-to-end Union self-managed (BYOC) dataplane deployments across multiple clouds, and updates repo hygiene/docs to support it.
Changes:
- Add the `deploy-selfmanaged` Claude skill with per-cloud deployment/teardown walkthroughs and universal prereq gating.
- Add `/deploy-selfmanaged` slash command for explicit invocation with optional `[cloud] [mode]` arguments.
- Update `.gitignore` to keep team-shared `.claude/skills/` + `.claude/commands/` tracked, and update `scripts/README.md` to reflect GCP E2E script support.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| scripts/README.md | Updates E2E scripts README to reflect AWS + GCP support and credential stashing limitations. |
| .gitignore | Adjusts ignore rules to allow sharing Claude skills/commands while keeping other Claude state ignored. |
| .claude/skills/deploy-selfmanaged/SKILL.md | Defines the top-level skill flow, intent capture, universal prereqs, and verification/teardown rules. |
| .claude/skills/deploy-selfmanaged/aws.md | AWS-specific prereqs and script-driven deploy/teardown instructions. |
| .claude/skills/deploy-selfmanaged/gcp.md | GCP-specific prereqs and script-driven deploy/teardown instructions. |
| .claude/skills/deploy-selfmanaged/azure.md | Azure guided walkthrough aligned to self-managed Azure docs (with some command/flag divergences noted). |
| .claude/skills/deploy-selfmanaged/oci.md | OCI guided walkthrough aligned to self-managed OCI docs (with some command/flag divergences noted). |
| .claude/skills/deploy-selfmanaged/coreweave.md | CoreWeave guided walkthrough (currently has several critical mismatches vs source-of-truth docs). |
| .claude/skills/deploy-selfmanaged/generic.md | Generic Kubernetes guided walkthrough (currently has provider mismatch vs source-of-truth docs). |
| .claude/commands/deploy-selfmanaged.md | Adds the /deploy-selfmanaged command entrypoint wiring into the skill. |
Source of truth:
`content/deployment/selfmanaged/selfmanaged-generic/prepare-infra.md`
and `selfmanaged-generic/deploy-dataplane.md`. Use this path for
```bash
uctl config init --host=<CONTROL_PLANE_URL>
uctl selfserve provision-dataplane-resources \
  --clusterName <CLUSTER_NAME> --provider generic
```
Comment on lines +90 to +93
```bash
  -n union --create-namespace \
  -f <org>-values.yaml \
  --wait
```
Source of truth:
`content/deployment/selfmanaged/selfmanaged-coreweave/prepare-infra.md`
and `selfmanaged-coreweave/deploy-dataplane.md`. CoreWeave-specific
Comment on lines +60 to +63
```bash
  --clusterName <CLUSTER_NAME> --provider generic
```
(CoreWeave uses the `generic` provider, same as on-prem, with
custom storage overrides.)
Comment on lines +4 to +5
and `selfmanaged-oci/deploy-dataplane.md`. Follow the doc; this file
captures gates and order.
Comment on lines +77 to +80
```bash
  -n union --create-namespace \
  -f <org>-values.yaml \
  --wait
```
# Azure path

Source of truth: `content/deployment/selfmanaged/selfmanaged-azure/prepare-infra.md`
and `selfmanaged-azure/deploy-dataplane.md`. The doc is authoritative
Comment on lines +102 to +104
  -n union --create-namespace \
  -f <org>-values.yaml \
  --wait
Comment on lines +114 to +119
```bash
helm version --short       # expect: v3.x or newer
kubectl version --client   # expect: any modern client
uctl version | head -1     # expect: >= 0.1.20 (note: subcommand, not --version)
which uv                   # required only for AWS/GCP (script venv)
```
Collaborator
@EngHabu Can you resolve the Copilot comments?
ppiegaze requested changes on May 7, 2026
ppiegaze (Collaborator) left a comment:
Please check the Copilot comments. They are probably false positives, but worth checking.
Summary
- Adds `.claude/skills/deploy-selfmanaged/`, a skill that guides users through deploying a Union self-managed (BYOC) data plane end-to-end. It captures intent (cloud, mode, control-plane URL, cluster name, region), verifies prereqs, then runs the deterministic path for the chosen cloud:
  - AWS/GCP: the `scripts/selfmanaged_{aws,gcp}_e2e.py` flyte e2e scripts (full 4-phase flow: infra → helm → smoke → optional teardown).
  - Azure/OCI/CoreWeave/generic: a guided helm walkthrough of `prepare-infra.md` → `deploy-dataplane.md`, with cloud-specific gotchas surfaced.
- Adds the `/deploy-selfmanaged` slash command (`.claude/commands/deploy-selfmanaged.md`) for explicit invocation, with optional `[cloud] [mode]` args.
- Whitelists `.claude/skills/` and `.claude/commands/` in `.gitignore` so team-shared skills are checked in while personal Claude state (settings, transcripts) stays ignored.
- Updates `scripts/README.md`: the GCP e2e script is fully implemented (not a stub); the README now reflects that and calls out that remote credential stashing is AWS-only today.

What the user experience looks like
The skill is designed to be intuitive, guiding, and deterministic. A typical AWS deploy session looks like this:
1. Trigger
The user either says something natural ("deploy a self-managed Union cluster on AWS") and the skill auto-engages from its description, or runs the `/deploy-selfmanaged` slash command, optionally with `[cloud] [mode]` arguments.
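The skill itself is a Markdown prompt rather than code, so the following is only an illustrative sketch of how the optional `[cloud] [mode]` arguments could be interpreted. The helper name `parse_command_args` is hypothetical; the allowed values are the ones listed in this PR description.

```python
from typing import Optional, Tuple

# Allowed values, taken from the choices listed in the PR description.
CLOUDS = ("aws", "gcp", "azure", "oci", "coreweave", "generic")
MODES = ("deploy", "teardown", "smoke-only")


def parse_command_args(arg_string: str) -> Tuple[Optional[str], Optional[str]]:
    """Parse optional `[cloud] [mode]` arguments (hypothetical helper).

    Unknown or missing tokens come back as None so the caller can ask
    the user instead of guessing.
    """
    cloud: Optional[str] = None
    mode: Optional[str] = None
    for token in arg_string.lower().split():
        if token in CLOUDS and cloud is None:
            cloud = token
        elif token in MODES and mode is None:
            mode = token
    return cloud, mode
```

Anything that comes back as `None` would be asked for in the intent-capture step rather than defaulted.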
2. Intent capture (Step 1 in `SKILL.md`)
Claude asks two button-pick questions via `AskUserQuestion`:
- Cloud: `aws | gcp | azure | oci | coreweave | generic`
- Mode: `deploy | teardown | smoke-only`

It then prompts in plain text for the three free-form fields:
- Control-plane URL (e.g. `https://myorg.union.ai`)
- Cluster name
- Region

Questions are skipped if the user already volunteered the answer in their initial message. Claude then echoes a `Plan:` block back so the user can sanity-check before anything runs.

3. Prereq verification (Step 2 + cloud-specific)
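The intent-capture step above (five fields, skip what is already known, echo a plan) can be pictured with a small data structure. This is a sketch under assumptions: the `Intent` class, `missing_fields`, and `render_plan` are hypothetical names, and the layout of the rendered `Plan:` block is invented for illustration because the actual block is not shown in this PR page.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Intent:
    """The five values the PR says the skill captures (names are illustrative)."""
    cloud: Optional[str] = None
    mode: Optional[str] = None
    control_plane_url: Optional[str] = None
    cluster_name: Optional[str] = None
    region: Optional[str] = None


def missing_fields(intent: Intent) -> List[str]:
    """Fields the user has not volunteered yet and must still be asked for."""
    return [f.name for f in fields(intent) if getattr(intent, f.name) is None]


def render_plan(intent: Intent) -> str:
    """Echo the captured intent back; refuse to render while anything is unknown."""
    missing = missing_fields(intent)
    if missing:
        raise ValueError(f"ask the user for: {', '.join(missing)}")
    lines = ["Plan:"]
    for f in fields(intent):
        lines.append(f"  {f.name}: {getattr(intent, f.name)}")
    return "\n".join(lines)
```

Raising instead of filling in a default mirrors the "ask, don't guess" rule described later in the PR.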
Claude runs the prereq checks as a single batch and reports a pass/fail table covering:
- `helm version --short`
- `kubectl version --client`
- `uctl version`
- `uv`
- `aws --version`
- `eksctl version`
- `aws sts get-caller-identity`
- `AWS_ACCESS_KEY_ID` env var
- `scripts/.venv`

Any failure halts the flow with the exact remediation command; there is no "let me try a workaround." For SSO-authenticated users, the skill specifically catches the gap between `aws sts get-caller-identity` succeeding (via SSO) and the e2e script needing `AWS_ACCESS_KEY_ID` exported, and tells the user to run `aws configure export-credentials --profile <sso-profile> --format env`.

4. Deploy command, shown before run
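As a rough illustration of the prereq batch from step 3, the sketch below runs each listed command, tabulates pass/fail, and flags the SSO gap. The function names (`run_ok`, `check_prereqs`, `sso_gap`, `format_table`) and the table layout are assumptions, not the skill's actual implementation; the `uv` and `scripts/.venv` checks are left out for brevity.

```python
import shutil
import subprocess
from typing import Callable, List, Mapping, Sequence, Tuple

Row = Tuple[str, bool]

# Commands listed in the PR description for the AWS path.
AWS_CHECKS: List[Sequence[str]] = [
    ("helm", "version", "--short"),
    ("kubectl", "version", "--client"),
    ("uctl", "version"),
    ("aws", "--version"),
    ("eksctl", "version"),
    ("aws", "sts", "get-caller-identity"),
]


def run_ok(cmd: Sequence[str]) -> bool:
    """Return True if the binary is on PATH and the command exits 0."""
    if shutil.which(cmd[0]) is None:
        return False
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_prereqs(
    checks: Sequence[Sequence[str]],
    env: Mapping[str, str],
    runner: Callable[[Sequence[str]], bool] = run_ok,
) -> List[Row]:
    """Run every check in one batch and collect (name, passed) rows."""
    rows: List[Row] = [(" ".join(cmd), runner(cmd)) for cmd in checks]
    rows.append(("AWS_ACCESS_KEY_ID env var", bool(env.get("AWS_ACCESS_KEY_ID"))))
    return rows


def sso_gap(rows: Sequence[Row]) -> bool:
    """True when identity resolves (e.g. via SSO) but static keys are not exported."""
    status = dict(rows)
    identity_ok = status.get("aws sts get-caller-identity", False)
    keys_ok = status.get("AWS_ACCESS_KEY_ID env var", False)
    return identity_ok and not keys_ok


def format_table(rows: Sequence[Row]) -> str:
    """Render one PASS/FAIL line per check."""
    return "\n".join(f"{'PASS' if ok else 'FAIL'}  {name}" for name, ok in rows)
```

Passing `os.environ` as `env` and leaving `runner` at its default would exercise the real tools; injecting a fake `runner` keeps the logic testable without any cloud CLI installed.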
Claude composes the exact command, displays it, and asks for an explicit `yes` before running anything cloud-mutating. For AWS/GCP, the deploy is long-running (~25–40 min), and the recommendation is to run it in the user's own terminal with `--tui`. The user sees Phase 1/4 → 2/4 → 3/4 → (4/4 teardown) in a live TUI tree with timings, logs, and links to each running cloud resource. For Azure/OCI/CoreWeave/generic, Claude walks `prepare-infra.md` → `deploy-dataplane.md` step by step, asking before each cloud-mutating command and showing the values-file diff before saving.

5. Verify
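The "show the command, then require an explicit yes" gate from step 4 could look roughly like the sketch below. `deploy_path` and `confirm_and_run` are hypothetical helpers, and the prompt wording is invented; only the AWS/GCP versus guided-walkthrough split and the literal `yes` requirement come from the PR description.

```python
import shlex
from typing import Callable, Sequence

# Per the PR, e2e scripts exist for these two clouds; the rest are guided.
SCRIPTED_CLOUDS = {"aws", "gcp"}


def deploy_path(cloud: str) -> str:
    """Pick the deterministic path for a cloud (labels are illustrative)."""
    return "e2e-script" if cloud in SCRIPTED_CLOUDS else "guided-walkthrough"


def confirm_and_run(
    cmd: Sequence[str],
    ask: Callable[[str], str],
    run: Callable[[Sequence[str]], None],
) -> bool:
    """Display the exact command and run it only on a literal 'yes'."""
    shown = shlex.join(cmd)
    answer = ask(f"About to run:\n  {shown}\nType 'yes' to proceed: ")
    if answer.strip().lower() != "yes":
        return False  # anything else, including 'y', is treated as a refusal
    run(cmd)
    return True
```

In an interactive setting `ask` could simply be the built-in `input`; accepting only the full word keeps an accidental keypress from starting a cloud-mutating command.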
Once helm install completes, Claude runs:
- `kubectl get pods -n union`: every pod must reach Running/Completed
- `kubectl get events -n union --sort-by=.lastTimestamp | tail -20`
- a `python -m smoke_tests` invocation against the new cluster

6. Teardown is first-class
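The pod check in the verify step can be expressed programmatically. This sketch assumes the JSON form of the same listing (`kubectl get pods -n union -o json`) and treats the pod phases `Running` and `Succeeded` as healthy; `Succeeded` pods typically show up as `Completed` in kubectl's STATUS column. The helper name `unhealthy_pods` is hypothetical.

```python
import json
from typing import List

# Pod phases counted as healthy in this sketch.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list_json: str) -> List[str]:
    """Return names of pods whose phase is not Running or Succeeded.

    Expects the output of `kubectl get pods -n union -o json`.
    """
    items = json.loads(pod_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("status", {}).get("phase") not in HEALTHY_PHASES
    ]
```

Note that phase alone is a coarse signal: a crash-looping pod can still report `Running`, so a stricter check would also inspect container readiness.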
`mode=teardown` is a real entry point, not an afterthought. AWS/GCP invoke the script's `teardown_cluster` task (resolves account/project from ambient identity, deletes everything `cluster_name` implied). Manual clouds get a reverse-order delete with per-resource confirmation; the skill never bulk-deletes a resource group, project, or subscription.

Operating rules the skill enforces
These appear at the top of `SKILL.md` and govern every cloud path:
- Missing `cluster_name`, region, or credentials → ask, don't guess.

Test plan
- Run `/deploy-selfmanaged` and confirm intent capture works (cloud + mode via questions, free-form for URL/name/region).
- Prereq checks: `helm`/`kubectl`/`uctl`/`uv`.
- AWS SSO: `aws configure export-credentials` is `eval`'d.
- `deploy` on a throwaway cluster (cost ~$1.30/hr) and confirm teardown completes cleanly.
- `teardown` mode against a `--skip_teardown` cluster.
- `deploy` on a throwaway project; confirm the `gke-gcloud-auth-plugin` prereq check catches its absence.
- Compare each `<cloud>.md` against the corresponding `content/deployment/selfmanaged/<cloud>/` doc to confirm gates and command order match.

🤖 Generated with Claude Code