feat(terraform): one-click Tencent Cloud cluster deployment with EN/ZH docs #667
kinwin-ustc wants to merge 2 commits into
…H docs

Add a Terraform deployer that stands up a clustered Cube Sandbox on Tencent Cloud (managed TKE control plane + CVM PVM compute nodes, cloud MySQL/Redis, CFS, TCR, jumpserver) in one shot.

- Docs: add EN/ZH deployment guides and register them in the VitePress sidebar
- Security groups: split the single shared group into 4 least-privilege per-role groups (jumpserver / compute / tke-pod / clb); rebind the CVMs, TKE `worker_config` and node pool, and the 4 CLB Service annotations; expose a per-role `security_group_ids` map
- Instances: default to the SA9 family (jumpserver `SA9.MEDIUM4`, compute `SA9.2XLARGE16`) and add a configurable TKE worker type (`tke_worker_instance_type`, default `SA9.LARGE8`)
- Storage: attach a dedicated XFS CBS data disk to each compute node at `/data/cubelet`, sized via `compute_data_disk_size` (default 200GB), formatted and mounted by an idempotent first-boot script
- Availability zone: auto-detect the first zone instead of using a hardcoded value
- Hardening: document public-facing service tips (WebUI dedicated SG + source-IP allowlist, cube-api Auth Callback, cube-proxy restrict-public-access)

Assisted-by: CodeBuddy
Signed-off-by: Feng Jin <ronyjin@tencent.com>
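The storage and availability-zone items above could be sketched in Terraform roughly as follows. This is a minimal illustration, not the PR's `main.tf`: it assumes the provider's `tencentcloud_availability_zones_by_product` data source and `tencentcloud_instance` resource, that the first CBS data disk shows up as `/dev/vdb`, and a `CLOUD_PREMIUM` disk type. `var.compute_node_count`, `var.compute_image_id`, `var.vpc_id` and `var.subnet_id` are hypothetical names; only `compute_data_disk_size`, `SA9.2XLARGE16` and `/data/cubelet` come from the PR.

```hcl
# Sketch only: variable names other than compute_data_disk_size are hypothetical.
data "tencentcloud_availability_zones_by_product" "cvm" {
  product = "cvm"
}

locals {
  # First zone returned by the data source, instead of a hardcoded zone.
  zone = data.tencentcloud_availability_zones_by_product.cvm.zones[0].name

  # Idempotent first-boot script: formats the disk only if it has no
  # filesystem yet and appends the fstab entry only once.
  # No "${" sequences here, so HCL does not try to interpolate shell vars.
  data_disk_init = <<-EOT
    #!/bin/bash
    set -euo pipefail
    DEV=/dev/vdb      # assumption: first CBS data disk appears as vdb
    MNT=/data/cubelet

    # Wait up to ~2 minutes for the block device to appear.
    for _ in $(seq 1 60); do
      [ -b "$DEV" ] && break
      sleep 2
    done

    # blkid exits non-zero when it finds no filesystem signature.
    if ! blkid "$DEV" >/dev/null 2>&1; then
      mkfs.xfs "$DEV"
    fi

    mkdir -p "$MNT"
    UUID=$(blkid -s UUID -o value "$DEV")
    grep -q "UUID=$UUID" /etc/fstab || echo "UUID=$UUID $MNT xfs defaults,nofail 0 0" >> /etc/fstab
    mountpoint -q "$MNT" || mount "$MNT"
  EOT
}

resource "tencentcloud_instance" "compute" {
  count             = var.compute_node_count # hypothetical
  instance_name     = "cubesandbox-compute-${count.index}"
  availability_zone = local.zone
  image_id          = var.compute_image_id # hypothetical
  instance_type     = "SA9.2XLARGE16"
  vpc_id            = var.vpc_id    # hypothetical
  subnet_id         = var.subnet_id # hypothetical

  data_disks {
    data_disk_type = "CLOUD_PREMIUM"            # assumption
    data_disk_size = var.compute_data_disk_size # default 200 (GB) per the PR
  }

  # user_data must be base64-encoded for tencentcloud_instance.
  user_data = base64encode(local.data_disk_init)
}
```

Re-running the script is safe: `blkid` finds the existing XFS signature so `mkfs.xfs` is skipped, the `grep` guard prevents duplicate fstab lines, and `mountpoint -q` skips an already-mounted path. Taking the first returned zone is convenient, but it does not guarantee that zone has SA9 capacity.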
Auto-pause/auto-resume is driven by the co-resident cube-proxy-sidecar sweeper, which only observes last-active timestamps for requests hitting its own replica. With more than one replica behind a round-robin/least-conn LB, a sandbox's traffic is split across replicas, so a replica that did not serve recent requests wrongly concludes the sandbox is idle and pauses it, causing a pause -> auto-resume churn loop.

Change the `cube_proxy_replicas` default from 2 to 1 (`variables.tf`, `create.sh`, `env.example`), add liveness/readiness TCP probes (port 8081) to the cube-proxy Deployment, and document the single-replica requirement plus the SandboxID-hash prerequisite for scaling out (README, README_zh, lifecycle, tencentcloud-terraform-deploy in both languages).

Assisted-by: CodeBuddy
Signed-off-by: Feng Jin <ronyjin@tencent.com>
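A minimal sketch of the two code-level changes, assuming the cube-proxy Deployment is managed with the Kubernetes provider's `kubernetes_deployment` resource (the PR's `tke-addons.tf` may render it differently). The namespace, labels, `var.cube_proxy_image` and the probe timings are hypothetical; only the `cube_proxy_replicas` default of 1 and the TCP probes on port 8081 come from the commit.

```hcl
# Sketch only: names other than cube_proxy_replicas are hypothetical.
variable "cube_proxy_replicas" {
  description = "cube-proxy replicas. Keep at 1: the idle sweeper only sees traffic on its own replica, so scaling out requires SandboxID-hash routing first."
  type        = number
  default     = 1
}

resource "kubernetes_deployment" "cube_proxy" {
  metadata {
    name      = "cube-proxy"
    namespace = "cube-system" # hypothetical namespace
  }

  spec {
    replicas = var.cube_proxy_replicas

    selector {
      match_labels = { app = "cube-proxy" }
    }

    template {
      metadata {
        labels = { app = "cube-proxy" }
      }

      spec {
        container {
          name  = "cube-proxy"
          image = var.cube_proxy_image # hypothetical

          # Restart the container if port 8081 stops accepting TCP connections.
          liveness_probe {
            tcp_socket {
              port = "8081"
            }
            initial_delay_seconds = 10
            period_seconds        = 10
          }

          # Keep the pod out of Service endpoints until port 8081 accepts connections.
          readiness_probe {
            tcp_socket {
              port = "8081"
            }
            initial_delay_seconds = 5
            period_seconds        = 5
          }
        }
      }
    }
  }
}
```

A TCP probe only checks that the port accepts connections; it does not address the split-traffic problem, which is why the replica default, not the probes, carries the fix.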
```hcl
    description = "Allow VPC internal traffic"
  }

  egress {
```

Consider restricting egress for defense-in-depth

All four security groups allow unrestricted egress. While this is common for initial deployments, the compute nodes run sandboxed code by design, so a sandbox escape would inherit full internet egress. Restricting egress per role (e.g., compute → VPC + pod CIDR only; jumpserver → VPC + TCP 443 to internet) would meaningfully reduce the blast radius. The same applies to `cubesandbox-sg-compute` (line 306), `cubesandbox-sg-tke-pod` (line 344), and `cubesandbox-sg-clb` (line 396).
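A minimal sketch of the jumpserver restriction suggested above (VPC + TCP 443 to the internet), assuming the groups are managed with the provider's `tencentcloud_security_group_rule_set` resource, whose rules are evaluated in order with the first match winning. `tencentcloud_security_group.jumpserver` and `var.vpc_cidr` are hypothetical names, and omitting `protocol`/`port` is assumed to mean all protocols and ports (worth checking against the provider docs).

```hcl
# Sketch only: resource and variable names are hypothetical, not from the PR.
resource "tencentcloud_security_group_rule_set" "jumpserver" {
  security_group_id = tencentcloud_security_group.jumpserver.id

  # ... existing ingress blocks stay here unchanged ...

  # Rules are evaluated top to bottom; the first match wins.
  egress {
    action      = "ACCEPT"
    cidr_block  = var.vpc_cidr
    description = "Allow VPC internal traffic (all protocols/ports)"
  }

  egress {
    action      = "ACCEPT"
    cidr_block  = "0.0.0.0/0"
    protocol    = "TCP"
    port        = "443"
    description = "Allow HTTPS to the internet"
  }

  egress {
    action      = "DROP"
    cidr_block  = "0.0.0.0/0"
    description = "Deny all other outbound traffic"
  }
}
```

The trailing `DROP` rule makes the default deny explicit. DNS resolvers and package mirrors may sit outside the VPC CIDR, so extra narrowly scoped `ACCEPT` rules could be needed; a connectivity check from the jumpserver after applying should confirm nothing legitimate is blocked.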
```hcl
    description = "Allow VPC internal traffic (jumpserver management, cube-master scheduling)"
  }

  egress {
```

Same unrestricted egress concern as the jumpserver group: compute nodes run sandboxed code, and a sandbox escape would inherit full internet egress.
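For the compute group, the reviewer's suggestion (VPC + pod CIDR only) might look like the sketch below. It assumes `tencentcloud_security_group_rule_set` manages the group and that TCR, CFS, MySQL and Redis are all reached through VPC-internal endpoints; `tencentcloud_security_group.compute`, `var.vpc_cidr` and `var.pod_cidr` are hypothetical names.

```hcl
# Sketch only: resource and variable names are hypothetical, not from the PR.
resource "tencentcloud_security_group_rule_set" "compute" {
  security_group_id = tencentcloud_security_group.compute.id

  # ... existing ingress blocks stay here unchanged ...

  egress {
    action      = "ACCEPT"
    cidr_block  = var.vpc_cidr
    description = "Allow VPC internal traffic (cube-master, TCR, CFS endpoints)"
  }

  egress {
    action      = "ACCEPT"
    cidr_block  = var.pod_cidr
    description = "Allow traffic to the TKE pod CIDR"
  }

  # No internet ACCEPT rule: everything else, including 0.0.0.0/0, is dropped.
  egress {
    action      = "DROP"
    cidr_block  = "0.0.0.0/0"
    description = "Deny all other outbound traffic"
  }
}
```

An escaped sandbox could still reach anything inside the VPC under these rules, so this narrows the blast radius rather than removing it.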
```hcl
    description = "Allow VPC internal traffic (CLB health checks, jumpserver, CFS NFS)"
  }

  egress {
```

Same unrestricted egress concern: these host the TKE control-plane pods.
```hcl
    description = "Allow cube-master CLB (VPC-internal only)"
  }

  egress {
```

The CLB egress description says "Allow CLB -> backend (pod/node)". This could be scoped to just the backend CIDRs rather than 0.0.0.0/0; the CLB doesn't need general internet egress.
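Scoping the CLB group's egress to the backend CIDRs could look like this, assuming `tencentcloud_security_group_rule_set` manages the group. `tencentcloud_security_group.clb`, `var.node_subnet_cidr` and `var.pod_cidr` are hypothetical names. Whether rules on the CLB's own security group govern forwarding to backends depends on the CLB's security-group settings, so health checks should be verified after applying.

```hcl
# Sketch only: resource and variable names are hypothetical, not from the PR.
resource "tencentcloud_security_group_rule_set" "clb" {
  security_group_id = tencentcloud_security_group.clb.id

  # ... existing ingress blocks stay here unchanged ...

  egress {
    action      = "ACCEPT"
    cidr_block  = var.node_subnet_cidr # subnet holding the backend nodes
    description = "Allow CLB -> backend (node)"
  }

  egress {
    action      = "ACCEPT"
    cidr_block  = var.pod_cidr # TKE pod CIDR for pod-direct backends
    description = "Allow CLB -> backend (pod)"
  }

  egress {
    action      = "DROP"
    cidr_block  = "0.0.0.0/0"
    description = "Deny general internet egress from the CLB"
  }
}
```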
Summary
Add a Terraform deployer that stands up a clustered Cube Sandbox on Tencent Cloud in one shot: a managed TKE control plane (`cube-master`/`cube-api`/`cube-proxy`/`cube-webui`), CVM PVM compute nodes, cloud MySQL + Redis, CFS shared storage, a TCR registry, and a jumpserver bastion. Ships with full English / Chinese deployment guides.

What's included
- Docs: EN/ZH deployment guides (`docs/guide` + `docs/zh/guide`), registered in the VitePress sidebar.
- Security groups: the single shared group is split into 4 least-privilege per-role groups (`jumpserver` / `compute` / `tke-pod` / `clb`); each role only opens the ingress it needs, so compromising one role no longer inherits the others' inbound surface. CVMs, the TKE `worker_config`/node pool and the 4 CLB Service annotations are rebound accordingly, and the output becomes a per-role `security_group_ids` map.
- Instances: default to the SA9 family (jumpserver `SA9.MEDIUM4`, compute `SA9.2XLARGE16`); new configurable `tke_worker_instance_type` (default `SA9.LARGE8`).
- Storage: a dedicated XFS CBS data disk on each compute node, mounted at `/data/cubelet` (sandbox image templates, snapshots, runtime data), sized via `compute_data_disk_size` (default 200GB), formatted/mounted by an idempotent first-boot script.

Changes
- `deploy/one-click/terraform/tencentcloud/{main.tf,variables.tf,tke-addons.tf,create.sh,env.example}`
- `docs/guide/tencentcloud-terraform-deploy.md`, `docs/zh/guide/tencentcloud-terraform-deploy.md`, `docs/.vitepress/config.mjs`

8 files changed, +911 / -53.
Testing
- `terraform validate` / `terraform plan` on the module.
- `vitepress build` passes; sidebar links resolve.

Notes

- `destroy.sh`.

Assisted-by: CodeBuddy
Signed-off-by: Feng Jin <ronyjin@tencent.com>