feat(terraform): one-click Tencent Cloud cluster deployment with EN/ZH docs#667

Open
kinwin-ustc wants to merge 2 commits into TencentCloud:master from kinwin-ustc:feat/doc-add-terrorform

@kinwin-ustc

Collaborator

Summary

Add a Terraform deployer that stands up a clustered Cube Sandbox on Tencent Cloud in one shot: a managed TKE control plane (cube-master / cube-api / cube-proxy / cube-webui), CVM PVM compute nodes, cloud MySQL + Redis, CFS shared storage, a TCR registry, and a jumpserver bastion. Ships with full English / Chinese deployment guides.

What's included

  • Docs — EN/ZH deployment guides (docs/guide + docs/zh/guide), registered in the VitePress sidebar.
  • Per-role security groups — replaces the single shared group with 4 least-privilege groups (jumpserver / compute / tke-pod / clb); each role only opens the ingress it needs, so compromising one role no longer inherits the others' inbound surface. CVMs, TKE worker_config/node pool and the 4 CLB Service annotations are rebound accordingly, and the output becomes a per-role security_group_ids map.
  • SA9 instances — defaults updated to the SA9 family (jumpserver SA9.MEDIUM4, compute SA9.2XLARGE16); new configurable tke_worker_instance_type (default SA9.LARGE8).
  • Dedicated compute data disk — each compute node gets an XFS CBS data disk mounted at /data/cubelet (sandbox image templates, snapshots, runtime data), sized via compute_data_disk_size (default 200GB), formatted/mounted by an idempotent first-boot script.
  • Availability zone — auto-detects the first available zone instead of a hardcoded value.
  • Public-service hardening tips — WebUI dedicated SG + strict source-IP allowlist, cube-api Auth Callback, cube-proxy restrict-public-access.
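To make the per-role split concrete, here is a minimal sketch of two of the four groups plus the per-role output map. It assumes the tencentcloud provider's `tencentcloud_security_group` and `tencentcloud_security_group_rule_set` resources; the jumpserver group name, `var.vpc_cidr`, `var.admin_cidr`, and the SSH rule are hypothetical placeholders, not the PR's actual main.tf.

```hcl
# Hypothetical sketch: variable names, CIDRs and the SSH rule are placeholders.
variable "vpc_cidr" {
  type        = string
  description = "CIDR of the deployment VPC (assumed input)"
  default     = "10.0.0.0/16"
}

variable "admin_cidr" {
  type        = string
  description = "Operator source range allowed to reach the jumpserver (assumed input)"
}

# One security group per role instead of a single shared group.
resource "tencentcloud_security_group" "jumpserver" {
  name        = "cubesandbox-sg-jumpserver" # assumed name
  description = "Jumpserver: SSH from the operator allowlist only"
}

resource "tencentcloud_security_group" "compute" {
  name        = "cubesandbox-sg-compute"
  description = "Compute nodes: VPC-internal ingress only"
}

# Jumpserver: the only internet-facing ingress is SSH from admin_cidr.
resource "tencentcloud_security_group_rule_set" "jumpserver" {
  security_group_id = tencentcloud_security_group.jumpserver.id

  ingress {
    action      = "ACCEPT"
    cidr_block  = var.admin_cidr
    protocol    = "TCP"
    port        = "22"
    description = "SSH from operator allowlist"
  }
  # Egress rules omitted from this sketch.
}

# Compute: accept traffic from inside the VPC only.
resource "tencentcloud_security_group_rule_set" "compute" {
  security_group_id = tencentcloud_security_group.compute.id

  ingress {
    action      = "ACCEPT"
    cidr_block  = var.vpc_cidr
    protocol    = "ALL"
    description = "Allow VPC internal traffic (jumpserver management, cube-master scheduling)"
  }
  # Egress rules omitted from this sketch.
}

# Per-role map in the shape of the security_group_ids output.
output "security_group_ids" {
  value = {
    jumpserver = tencentcloud_security_group.jumpserver.id
    compute    = tencentcloud_security_group.compute.id
    # tke-pod and clb entries omitted in this sketch
  }
}
```

Because a compromised compute node only sits behind the compute group, it gains nothing from the jumpserver's SSH rule, which is the isolation property the bullet describes.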

Changes

| Area | Files |
| --- | --- |
| Terraform | deploy/one-click/terraform/tencentcloud/{main.tf,variables.tf,tke-addons.tf,create.sh,env.example} |
| Docs | docs/guide/tencentcloud-terraform-deploy.md, docs/zh/guide/tencentcloud-terraform-deploy.md, docs/.vitepress/config.mjs |

8 files changed, +911 / -53.

Testing

  • terraform validate / terraform plan on the module.
  • Docs build (vitepress build) passes; sidebar links resolve.

Notes

  • All resources default to pay-as-you-go (POSTPAID); tear down with destroy.sh.
  • Databases and the TKE API server are VPC-internal only.

Assisted-by: CodeBuddy
Signed-off-by: Feng Jin <ronyjin@tencent.com>

…H docs

Add a Terraform deployer that stands up a clustered Cube Sandbox on Tencent Cloud (managed TKE control plane + CVM PVM compute nodes, cloud MySQL/Redis, CFS, TCR, jumpserver) in one shot.

- Docs: add EN/ZH deployment guides and register them in the VitePress sidebar

- Security groups: split the single shared group into 4 least-privilege per-role groups (jumpserver / compute / tke-pod / clb); rebind the CVMs, TKE worker_config and node pool, and the 4 CLB Service annotations; expose a per-role security_group_ids map

- Instances: default to the SA9 family (jumpserver SA9.MEDIUM4, compute SA9.2XLARGE16) and add a configurable TKE worker type (tke_worker_instance_type, default SA9.LARGE8)

- Storage: attach a dedicated XFS CBS data disk to each compute node at /data/cubelet, sized via compute_data_disk_size (default 200GB), formatted and mounted by an idempotent first-boot script

- Availability zone: auto-detect the first zone instead of a hardcoded value

- Hardening: document public-facing service tips (WebUI dedicated SG + source-IP allowlist, cube-api Auth Callback, cube-proxy restrict-public-access)

Assisted-by: CodeBuddy
Signed-off-by: Feng Jin <ronyjin@tencent.com>
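The instance, storage and zone changes in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: it assumes the tencentcloud provider's `tencentcloud_availability_zones_by_product` data source and `tencentcloud_instance` resource, and the `var.image_id`, `var.vpc_id`, `var.subnet_id`, `var.compute_node_count` and `var.compute_security_group_id` inputs, the CLOUD_BSSD disk type, and the `/dev/vdb` device name are all hypothetical.

```hcl
variable "compute_data_disk_size" {
  type    = number
  default = 200 # GB, the stated default
}

# Use the first available CVM zone instead of a hardcoded one.
data "tencentcloud_availability_zones_by_product" "cvm" {
  product = "cvm"
}

locals {
  zone = data.tencentcloud_availability_zones_by_product.cvm.zones[0].name

  # First-boot script: safe to re-run because it only formats a blank disk,
  # only appends the fstab entry once, and only mounts if not yet mounted.
  compute_init = <<-EOT
    #!/bin/bash
    set -euo pipefail
    DEV=/dev/vdb   # assumed device name of the first CBS data disk
    MNT=/data/cubelet
    mkdir -p "$MNT"
    # blkid exits non-zero when the device has no filesystem signature.
    if ! blkid "$DEV" >/dev/null 2>&1; then
      mkfs.xfs "$DEV"
    fi
    UUID=$(blkid -s UUID -o value "$DEV")
    grep -q "$UUID" /etc/fstab || echo "UUID=$UUID $MNT xfs defaults,nofail 0 0" >> /etc/fstab
    mountpoint -q "$MNT" || mount "$MNT"
  EOT
}

resource "tencentcloud_instance" "compute" {
  count                   = var.compute_node_count # hypothetical input
  instance_name           = "cubesandbox-compute-${count.index}"
  availability_zone       = local.zone
  instance_type           = "SA9.2XLARGE16" # stated compute default
  image_id                = var.image_id    # hypothetical input
  vpc_id                  = var.vpc_id      # hypothetical input
  subnet_id               = var.subnet_id   # hypothetical input
  orderly_security_groups = [var.compute_security_group_id] # hypothetical input
  instance_charge_type    = "POSTPAID_BY_HOUR"

  # Dedicated data disk for sandbox templates, snapshots and runtime data.
  data_disks {
    data_disk_type       = "CLOUD_BSSD" # assumed disk type
    data_disk_size       = var.compute_data_disk_size
    delete_with_instance = true
  }

  user_data = base64encode(local.compute_init)
}
```

Mounting by UUID rather than by device path keeps the fstab entry valid even if device ordering changes between boots.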
Auto-pause/auto-resume is driven by the co-resident cube-proxy-sidecar sweeper, which only observes last-active timestamps for requests that hit its own replica. With more than one replica behind a round-robin or least-conn LB, a sandbox's traffic is split across replicas, so a replica that has not served recent requests wrongly concludes the sandbox is idle and pauses it, causing a pause -> auto-resume churn loop.

Change the cube_proxy_replicas default from 2 to 1 (variables.tf, create.sh, env.example), add liveness/readiness TCP probes on port 8081 to the cube-proxy Deployment, and document the single-replica requirement plus the SandboxID-hash load-balancing prerequisite for scaling out (README, README_zh, lifecycle, and tencentcloud-terraform-deploy in both languages).
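A rough sketch of the resulting cube-proxy settings, assuming the Deployment is managed through the Terraform kubernetes provider's `kubernetes_deployment` resource (the PR may render it differently in tke-addons.tf). The namespace, `var.cube_proxy_image`, and probe timings are hypothetical; only the replica default of 1 and TCP probes on 8081 come from the commit message.

```hcl
variable "cube_proxy_replicas" {
  type        = number
  default     = 1
  description = "Keep at 1 unless the LB hashes on SandboxID: each sidecar sweeper only sees its own replica's traffic"
}

resource "kubernetes_deployment" "cube_proxy" {
  metadata {
    name      = "cube-proxy"
    namespace = "cube-system" # assumed namespace
  }

  spec {
    replicas = var.cube_proxy_replicas

    selector {
      match_labels = { app = "cube-proxy" }
    }

    template {
      metadata {
        labels = { app = "cube-proxy" }
      }

      spec {
        # cube-proxy-sidecar container omitted from this sketch.
        container {
          name  = "cube-proxy"
          image = var.cube_proxy_image # hypothetical input

          # TCP probes against port 8081; timings are placeholders.
          liveness_probe {
            tcp_socket {
              port = 8081
            }
            initial_delay_seconds = 10
            period_seconds        = 10
          }

          readiness_probe {
            tcp_socket {
              port = 8081
            }
            period_seconds = 5
          }
        }
      }
    }
  }
}
```

With a single replica, every request for a sandbox refreshes the same sweeper's last-active timestamp, so idleness is judged from complete information.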

Assisted-by: CodeBuddy
Signed-off-by: Feng Jin <ronyjin@tencent.com>
@cubesandboxbot



```hcl
description = "Allow VPC internal traffic"
}

egress {
```


Consider restricting egress for defense-in-depth

All four security groups allow unrestricted egress. While this is common for initial deployments, the compute nodes run sandboxed code by design — a sandbox escape would inherit full internet egress. Restricting egress per role (e.g., compute → VPC + pod CIDR only; jumpserver → VPC + TCP 443 to internet) would meaningfully reduce blast radius. Same applies to cubesandbox-sg-compute (line 306), cubesandbox-sg-tke-pod (line 344), and cubesandbox-sg-clb (line 396).
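A minimal sketch of the suggested per-role egress, assuming `tencentcloud_security_group_rule_set` manages each group (the resource labels and `var.vpc_cidr` / `var.pod_cidr` inputs are hypothetical). Since a rule set owns the group's full rule list, the existing ingress blocks must be kept alongside the new egress blocks. Platform services outside the VPC CIDR that nodes still need, such as the internal DNS resolvers, would require extra ACCEPT rules; verify before applying.

```hcl
# Compute nodes: egress limited to the VPC and the TKE pod CIDR.
resource "tencentcloud_security_group_rule_set" "compute" {
  security_group_id = tencentcloud_security_group.compute.id

  # ... existing ingress blocks stay as they are ...

  egress {
    action      = "ACCEPT"
    cidr_block  = var.vpc_cidr
    protocol    = "ALL"
    description = "Compute -> VPC"
  }

  egress {
    action      = "ACCEPT"
    cidr_block  = var.pod_cidr
    protocol    = "ALL"
    description = "Compute -> TKE pod CIDR"
  }

  # Explicit final deny, so the result does not rely on the implicit default.
  egress {
    action      = "DROP"
    cidr_block  = "0.0.0.0/0"
    protocol    = "ALL"
    description = "Deny all other egress"
  }
}

# Jumpserver: VPC plus outbound HTTPS only.
resource "tencentcloud_security_group_rule_set" "jumpserver" {
  security_group_id = tencentcloud_security_group.jumpserver.id

  # ... existing ingress blocks stay as they are ...

  egress {
    action      = "ACCEPT"
    cidr_block  = var.vpc_cidr
    protocol    = "ALL"
    description = "Jumpserver -> VPC"
  }

  egress {
    action      = "ACCEPT"
    cidr_block  = "0.0.0.0/0"
    protocol    = "TCP"
    port        = "443"
    description = "Jumpserver -> internet HTTPS"
  }

  egress {
    action      = "DROP"
    cidr_block  = "0.0.0.0/0"
    protocol    = "ALL"
    description = "Deny all other egress"
  }
}
```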

```hcl
description = "Allow VPC internal traffic (jumpserver management, cube-master scheduling)"
}

egress {
```


Same unrestricted egress concern as the jumpserver group — compute nodes run sandboxed code and a sandbox escape would inherit full internet egress.

```hcl
description = "Allow VPC internal traffic (CLB health checks, jumpserver, CFS NFS)"
}

egress {
```


Same unrestricted egress concern — these host the TKE control-plane pods.

```hcl
description = "Allow cube-master CLB (VPC-internal only)"
}

egress {
```


The CLB egress description says "Allow CLB -> backend (pod/node)". This could be scoped to just the backend CIDRs rather than 0.0.0.0/0 — the CLB doesn't need general internet egress.
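A sketch of that scoping, assuming the CLB group is managed by `tencentcloud_security_group_rule_set` and that the backend ranges are passed in as a hypothetical `var.clb_backend_cidrs` list (node subnet plus TKE pod CIDR). The dynamic block expands in place, so the ACCEPT rules precede the final DROP.

```hcl
variable "clb_backend_cidrs" {
  type        = list(string)
  description = "Node subnet and TKE pod CIDR(s) the CLB forwards to (assumed input)"
}

resource "tencentcloud_security_group_rule_set" "clb" {
  security_group_id = tencentcloud_security_group.clb.id

  # ... existing ingress blocks stay as they are ...

  # One ACCEPT rule per backend range.
  dynamic "egress" {
    for_each = var.clb_backend_cidrs
    content {
      action      = "ACCEPT"
      cidr_block  = egress.value
      protocol    = "TCP"
      port        = "ALL"
      description = "Allow CLB -> backend (pod/node)"
    }
  }

  # Everything else, including the internet, is denied.
  egress {
    action      = "DROP"
    cidr_block  = "0.0.0.0/0"
    protocol    = "ALL"
    description = "Deny all other egress"
  }
}
```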
