A centrally-managed edge computing system for medical imaging data capture and upload to XNAT. Part of NIF FDRI Stream 2.
- Management node: Ubuntu 22.04+, 8GB+ RAM, 100GB+ disk
- Edge worker(s): Ubuntu 22.04+, 4GB+ RAM, 50GB+ disk
- SSH access: Key-based SSH from management node to each edge worker
- XNAT instance: Accessible via HTTPS with a local service account
- DICOM source: One or more modalities that can C-STORE to the edge node on port 4242 (AET=AISEDGE). Their Called-AETs must be listed in config/orthanc/routing.json.
- AIS_DEID_HMAC_SALT: A per-deployment secret. Generate one with openssl rand -hex 32 and set it in config/management.env before running 07c.
- Outbound internet: Both management and edge nodes need it (for pulling container images)
Four files — every one has a .template next to it. Copy and fill in.
All four are gitignored once copied, so secrets never end up in version control.
| File (after copy) | What to set | Source |
|---|---|---|
| config/management.env | MGMT_NODE_IP, XNAT URL/user/pass, S3 admin keys, AIS_DEID_HMAC_SALT, observability vars | management.env.template |
| config/edge-nodes.env | EDGE_NODES array: one line per edge site (IP, SSH user/key, XNAT project, scoped S3 key/secret) | edge-nodes.env.template |
| config/orthanc/routing.json | AETMap: each modality's Called-AET → XNAT project | routing.json.template |
| config/orthanc/deidentification-profile.json | Replace / Keep blocks per Orthanc /modify API; the deid contract for this site. Applied to every accepted study | deidentification-profile.json.template |
Anything else under config/ (k0s-controller.yaml, the Lua hook, orthanc.json) ships with sane defaults and rarely needs editing. Inside each template, look for # REQUIRED markers (env files) or REPLACE_* placeholders (JSON files) to identify the fields you must fill in.
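A quick pre-flight check can catch placeholders that were left unfilled. The sketch below is a hypothetical helper, not something shipped in install.sh; the function name `check_placeholders` is an assumption. It only covers the REPLACE_* placeholders in the JSON files, since the # REQUIRED markers in the env files are comments that remain after editing.

```shell
# Hypothetical pre-flight helper (not part of install.sh).
# Prints every line that still contains a REPLACE_* placeholder and
# returns non-zero if any file is missing or still has a placeholder.
check_placeholders() {
    rc=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            rc=1
            continue
        fi
        # /dev/null as a second operand makes grep print the file name.
        if grep -n 'REPLACE_[A-Z0-9_]*' "$f" /dev/null; then
            rc=1
        fi
    done
    return "$rc"
}

# Example (run from the repo root after copying the templates):
#   check_placeholders config/orthanc/routing.json \
#                      config/orthanc/deidentification-profile.json
```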
# 1. Clone this repo on the management node
git clone <repo-url> && cd k0s-k0smotron-mvp
# 2. Copy templates + edit the four files above
cp config/management.env.template config/management.env
cp config/edge-nodes.env.template config/edge-nodes.env
cp config/orthanc/routing.json.template config/orthanc/routing.json
cp config/orthanc/deidentification-profile.json.template config/orthanc/deidentification-profile.json
$EDITOR config/management.env \
config/edge-nodes.env \
config/orthanc/routing.json \
config/orthanc/deidentification-profile.json
# 3. Generate the deid HMAC salt and paste it into management.env
openssl rand -hex 32 # set AIS_DEID_HMAC_SALT="<paste>" in config/management.env
# 4. Ensure SSH access to edge nodes
ssh-keygen -t ed25519 # if you don't have a key
ssh-copy-id ubuntu@<edge-ip>
# 5. Install — step 07c will show the AETMap + profile list and ask for explicit
# confirmation before deploying the deid policy
chmod +x install.sh scripts/*.sh
./install.sh
Architecture, data flow, security model, and component-by-component reference are all below.
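Step 3 of the quick start pastes the output of openssl rand -hex 32 into management.env, which is 64 lowercase hex characters. A minimal sketch of a sanity check follows; the function name `is_valid_salt` is an assumption, and the installer may validate the salt differently or not at all.

```shell
# Hypothetical check; not taken from install.sh.
# Succeeds only if the argument is exactly 64 lowercase hex characters,
# which is the shape `openssl rand -hex 32` prints.
is_valid_salt() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 64 ]
}

# Example (after editing config/management.env):
#   . config/management.env
#   is_valid_salt "$AIS_DEID_HMAC_SALT" || echo "AIS_DEID_HMAC_SALT looks wrong" >&2
```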
AIS-Edge supports two deployment shapes, selected by INSTALL_TOPOLOGY in
config/management.env. Both run the same application stack — the
difference is purely how the management cluster exposes itself to edges
and how edge workers resolve management-side hostnames.
| | INSTALL_TOPOLOGY=onprem (default) | INSTALL_TOPOLOGY=cloud |
|---|---|---|
| Inbound TLS | nginx-ingress binds :443 via hostNetwork: true on the mgmt node | nginx-ingress exposed as Service type: LoadBalancer; cloud LB owns the public IP |
| Hostname resolution | /etc/hosts writes on each edge VM + hostAliases: inside each pod | Real public DNS (a registered zone or nip.io for dev) |
| Where it runs | k0s on a single mgmt VM you fully control | Managed K8s (EKS, GKE, AKS, OpenStack Magnum, Nectar k0s + Octavia) |
| Cert issuer (default) | ais-edge-ca-issuer (self-signed root, distributed to edges) | Same ais-edge-ca-issuer for dev; letsencrypt-prod for production |
Cloud-topology details differ per provider (LB controller, credential
format, FIP semantics, DNS-01 solver). Each provider has its own page
under docs/clouds/:
| Provider | Doc | Status |
|---|---|---|
| OpenStack: private subnet + FIP (recommended for production) | openstack-private-subnet.md | mirrors AWS / GCP / Azure shape |
| OpenStack: Nectar QLD shared external network (dev/test only) | openstack-nectar.md | ✅ E2E tested |
| AWS (EKS) | aws.md | design complete |
| GCP (GKE) | gcp.md | design complete |
| Azure (AKS) | azure.md | design complete |
Architecture overview, packet flow, dev→prod swap procedure are in
docs/cloud-deployment.md.
All set in config/management.env. Defaults + per-knob docstrings live in
config/management.env.template.
export INSTALL_TOPOLOGY="cloud"
export CLOUD_PROVIDER="openstack" # openstack | aws | gcp | azure | none
export CLOUD_CREDENTIALS_FILE="/path/to/openrc.sh" # any shell script of `export X=Y` lines;
# auto-sourced by install.sh before cloud steps
export LB_PUBLIC_IP="" # set if you can pre-allocate a FIP, leave blank
# to let the cloud LB controller auto-assign
export LB_SUBNET_ID="" # OpenStack: subnet UUID where the LB VIP lives
export LB_AVAILABILITY_ZONE="" # OpenStack: Octavia AZ (must match mgmt-VM AZ)
export PRECREATE_LB="" # set to "1" only on Nectar's shared-network
# topology — step 00a creates the LB up front
# and writes the VIP back into this file so the
# rest of the install runs uninterrupted
export OCCM_CLUSTER_NAME="aisedge" # unique per project to avoid stale-LB name
# collisions in OCCM's name-based lookup
export INTERNAL_DOMAIN="aisedge.example.com" # your DNS zone, or .<LB-IP-dashed>.nip.io for dev
export CERT_ISSUER="ais-edge-ca-issuer" # or "letsencrypt-prod" once on a real domain
export DNS_PROVIDER="" # only when CERT_ISSUER=letsencrypt-*
# (cloudflare | route53 | clouddns | azuredns | rfc2136)
A single extra step runs ahead of 01b when INSTALL_TOPOLOGY=cloud +
CLOUD_PROVIDER=openstack + PRECREATE_LB=1:
- 00a-precreate-lb: pre-creates the Octavia LB on the tenant subnet, captures its auto-assigned VIP, and writes LB_PUBLIC_IP + INTERNAL_DOMAIN back into management.env before any cert is minted. Skips silently for managed K8s (EKS / GKE / AKS / Magnum) where the platform CCM handles this synchronously with the Service creation in step 02c.
Everything else (01, 01b, 02…07c) is identical to the onprem
path — just with different rendering of templates (no hostAliases:
blocks on edge pods, no /etc/hosts writes on mgmt VM or edge VMs, no
IP SANs in server certs).
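For a dev deployment, INTERNAL_DOMAIN can be a nip.io name built from the load-balancer IP with its dots replaced by dashes. The sketch below illustrates that string transformation only; the helper name `nip_domain` and the "aisedge" prefix are assumptions, and it is not the code step 00a uses.

```shell
# Illustrative only: build a dev INTERNAL_DOMAIN from a load-balancer IP.
# The prefix is an assumption; use whatever prefix your deployment expects.
nip_domain() {
    prefix=$1
    ip=$2
    printf '%s.%s.nip.io\n' "$prefix" "$(printf '%s' "$ip" | tr '.' '-')"
}

nip_domain aisedge 203.0.113.10   # prints aisedge.203-0-113-10.nip.io
```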
AIS-Edge has two "edge" components, both running the same ingest stack but at different points in the pipeline:
- Facility edge — sits on the facility's network alongside the modalities. Receives DICOMs locally (C-STORE on port 4242 or file drop), stages, and pushes over the internet to the server side.
- Server edge — sits on or alongside the AIS-Server / XNAT cluster. Receives staged data from the facility edge and uploads it into XNAT via REST.
Internet transfer happens only between the two edges — never directly from a facility to XNAT.
| Reason | Effect |
|---|---|
| Centralised k8s management | The Kubernetes management cluster runs on the server side and manages both the server-edge and facility-edge worker nodes via k0smotron + konnectivity. AIS operators kubectl from the server side to manage the facility edge — no per-site kubeconfig juggling |
| Outbound-only from the facility | The facility edge only opens outbound TCP/443 to the server side. No inbound ports needed from the internet or management network. Works behind aggressive hospital firewalls |
| XNAT credentials stay server-side | The XNAT REST credentials live only on the server edge. A compromised facility edge cannot reach XNAT directly |
| Low resource floor at facility | Facility edge runs a k0s worker only (the control plane is hosted on the server side via k0smotron). 4GB RAM is enough |
Edge nodes connect to the management cluster over a single TLS port (443). nginx-ingress on the management host reads the SNI from the TLS handshake and routes to the right backend. The only inbound port at the edge is DICOM port 4242 on the local facility LAN (for modality C-STOREs); there are no inbound ports from the internet or management network.
Management Node Edge Worker(s)
┌──────────────────────────────────┐ ┌──────────────────────────┐
│ nginx-ingress (hostNet :443) │ │ k0s worker │
│ ├─ TLS terminate / SNI-route │ │ │
│ │ - seaweedfs.aisedge.local │◄────────┤ Outbound :443 to mgmt │
│ │ - k0s.aisedge.local │ TLS │ │
│ │ - konnect.aisedge.local │ :443 │ Orthanc pod │
│ │ (all certs signed by │ │ ├─ DICOM SCP :4242 on │
│ │ ais-edge-ca via cert-mgr) │ │ │ local facility LAN │
│ │ │ │ ├─ Lua hook: deid + │
│ k0smotron operator │ │ │ /facility-backup │
│ ├─ hosted control plane (CIP) │ │ └─ Storage on hostPath│
│ │ ↳ Ingress for API+konnect │ │ └─ Storage on hostPath│
│ │ │ │ storage/ │
│ SeaweedFS (ClusterIP only) │ │ │
│ ├─ S3 :8333 (HTTP, in-cluster) │ │ xnat-ingest-sort pod │
│ │ edges hit via Ingress :443 │ │ ├─ REST-polls Orthanc │
│ │ │ │ └─ hardlinks deid'd │
│ xnat-ingest-upload pod │ │ instances into │
│ └─ in-cluster DNS to seaweedfs │ │ /data/staging/ │
│ │ │ │
│ cert-manager: ais-edge-ca │ │ s3-uploader pod │
│ ├─ self-signed root (10 yr) │ │ ├─ mc trusts ais-edge-│
│ └─ issues server certs (1 yr) │ │ │ ca via ca-bundle │
│ │ │ └─ mc mirror → │
│ │ │ https://seaweedfs..│
│ │ │ │
│ │ │ Credentials on edge: │
│ │ │ ├─ S3 write-only key │
│ │ │ ├─ AIS_DEID_HMAC_SALT │
│ │ │ └─ ais-edge-ca.crt │
│ │ │ │
│ │ │ Mgmt-net inbound: ZERO │
└─────────┬────────────────────────┘ │ LAN inbound: :4242 only │
│ HTTPS REST API └──────────────────────────┘
▼
┌──────────────────────────┐ ◄──────── Modalities
│ XNAT Server │ C-STORE to AET=
│ (separate infrastructure)│ AISEDGE :4242
└──────────────────────────┘
1. Modality C-STOREs to Orthanc on edge worker
- Orthanc receives on port 4242 with AET=AISEDGE
- Per-AET routing in routing.json selects recipe + XNAT project
│
▼
2. Orthanc Lua hook (on edge, deidentify-and-forward.lua)
- OnStoredInstance:
a. Writes ORIGINAL to /facility-backup/ (site-controlled retention)
b. /modify with deid recipe; UIDs are kept so the deid'd instance
lands in the same Study
c. Deletes ORIGINAL from Orthanc (keeps the deid'd instance in storage)
- OnStableStudy (after StableAge=30s silence):
d. PUTs label "xnat-ingest-ready" on the study
│
▼
3. xnat-ingest sort (on edge, REST-pull mode)
- Polls Orthanc REST every INGEST_LOOP_SECONDS
- Filters: has label "xnat-ingest-ready", lacks label "xnat-ingest-skip"
- Hardlinks instances from /data/orthanc-storage/ into
/data/staging/PROJECT.SUBJECT.VISIT/<scan>/DICOM/
(same filesystem — hardlink, not copy)
- PUTs label "xnat-ingest-skip" on the study
│
▼
4. s3-uploader (on edge, using `mc mirror`)
- Reads staged sessions from /data/staging/
- Mirrors to SeaweedFS at s3://ingest-bucket/staged/ (write-only key)
- Deletes local /data/staging/ copy only after successful upload
- SeaweedFS (via S3 API) handles multipart upload, checksums, resume on failure
│
▼
5. SeaweedFS (on management node)
- Stores files under s3://ingest-bucket/staged/<session>/
- Write-only from edge, full access from management
│
▼
6. xnat-ingest upload (on management node)
- Reads from s3://ingest-bucket/staged
- Uploads to XNAT via REST API (XNAT credentials only here)
- Creates project/subject/session/scan hierarchy in XNAT
- Verifies checksums after upload
- Skips sessions already in XNAT (idempotent)
The deid happens at step 2 inside Orthanc; everything downstream of
OnStoredInstance works with deid'd identifiers. The original DICOM
exists in /facility-backup (real identifiers, site-retained) and
nowhere else — never leaves the edge worker.
The s3-uploader pod runs on the edge worker using the minio/mc (MinIO Client) image.
It's a simple shell loop — no custom code:
# Configure mc with the write-only credentials.
# mc reads PEM files in /root/.mc/certs/CAs/ — we mount the ca-bundle Secret
# there, so mc trusts our ais-edge-ca-signed seaweedfs-tls cert.
mc alias set edge "https://seaweedfs.aisedge.local" "<access-key>" "<secret-key>"
# Loop forever, checking every 30 seconds
while true; do
  for session_dir in /data/staging/*/; do
    # Skip the unexpanded glob when /data/staging/ is empty
    [ -d "$session_dir" ] || continue
    session_name=$(basename "$session_dir")
    # Upload entire session directory to SeaweedFS, preserving structure;
    # delete the local copy only if the upload succeeded
    if mc mirror --overwrite "$session_dir" "edge/ingest-bucket/staged/$session_name/"; then
      rm -rf "$session_dir"
    fi
  done
  sleep 30
done
mc mirror is like rsync for S3. Under the hood it:
- Breaks large files into multipart chunks (handles 100GB+ files)
- Uploads chunks in parallel for speed
- Verifies MD5 checksums after each chunk
- Retries failed chunks automatically
- Only transfers new/changed files if re-run (delta sync)
The actual protocol is standard HTTP PUT to the S3 API — the same protocol AWS S3 uses. If SeaweedFS is swapped for AWS S3, the uploader works without changes (just a different endpoint URL).
Edge Worker Management Node XNAT
├─ S3 write-only key ├─ S3 admin key ├─ User data
│ (write+list on one bucket only) ├─ XNAT admin credentials │
│ (cannot read other sites' data) ├─ ais-edge-ca PRIVATE key │
├─ NO XNAT credentials │ (in cert-manager Secret) │
├─ NO inbound ports │ │
├─ ais-edge-ca PUBLIC cert (mounted) │ │
│ used to verify mgmt server identity │ │
├─ Outbound only, single port: │ │
│ → mgmt :443 (TLS, SNI-routed) │ │
| If compromised... | Impact |
|---|---|
| Edge worker | Attacker sees local DICOMs + scoped S3 key + public CA cert. Key can only write to ingest bucket. Cannot read other sites' data. Cannot access XNAT. Cannot forge new server certs (CA private key is on management). |
| Edge S3 key | Can write junk to one bucket. Cannot read data. Cannot access XNAT. Cannot impersonate other sites. |
| Wire (between edge and management) | Sniffer sees TLS-encrypted bytes only. Cannot read DICOMs in transit. Cannot impersonate the management server (would need a cert signed by ais-edge-ca). |
| Management node | Full access — this is your crown jewel. Harden accordingly. CA private key lives here; back it up offline if you can't tolerate re-rolling all edge trust on rebuild. |
k0s-k0smotron-mvp/
├── README.md ← You are here
├── install.sh ← Main installer (run this)
├── ais-edge-ca.crt ← Public CA cert (gitignored; generated at install)
├── config/
│ ├── management.env.template ← Management node config (copy to management.env)
│ ├── edge-nodes.env.template ← Edge nodes config (copy to edge-nodes.env)
│ ├── k0s-controller.yaml ← k0s cluster config
│ └── orthanc/ ← Edge-side Orthanc config (mounted as ConfigMaps)
│ ├── orthanc.json ← Daemon config (AET, ports, storage paths)
│ ├── deidentify-and-forward.lua ← Generic deid + label Lua hook
│ ├── routing.json ← Per-site AET → recipe + project mapping
│ └── recipe-*.json ← Deid recipes (research-default is the MVP one)
├── manifests/
│ ├── 01-management/ ← Runs on management cluster
│ │ ├── cert-issuers.yaml ← cert-manager bootstrap + CA + CA Issuer
│ │ ├── nginx-ingress-values.yaml.tpl ← helm values for nginx-ingress
│ │ ├── seaweedfs.yaml.tpl ← SeaweedFS Deployment + ClusterIP Service
│ │ ├── seaweedfs-tls-cert.yaml.tpl ← server cert for SeaweedFS Ingress
│ │ ├── seaweedfs-ingress.yaml.tpl ← nginx Ingress at :443 with SNI route
│ │ ├── edge-cluster.yaml.tpl ← Hosted k0s control plane (with spec.ingress)
│ │ └── xnat-upload.yaml.tpl ← Reads SeaweedFS → uploads to XNAT
│ └── 02-edge/ ← Runs on edge workers (child cluster)
│ ├── xnat-ingest.yaml.tpl ← Sort (Orthanc REST-pull mode) + s3-uploader
│ └── orthanc.yaml.tpl ← Orthanc Deployment + Service + Secret
├── scripts/
│ ├── 00-common.sh ← Shared functions
│ ├── 01-install-k0s.sh ← Install k0s on mgmt
│ ├── 02-install-k0smotron.sh ← cert-manager + k0smotron
│ ├── 02b-bootstrap-ca.sh ← Bootstrap self-signed CA (ais-edge-ca)
│ ├── 02c-install-nginx-ingress.sh ← nginx-ingress on hostNetwork :443
│ ├── 02d-install-observability.sh ← Loki + Prom + Grafana + Vector (optional)
│ ├── 03-deploy-seaweedfs.sh ← SeaweedFS + TLS cert + Ingress
│ ├── 04-deploy-xnat-upload.sh ← Mgmt-side XNAT upload pod
│ ├── 05-setup-edge-cluster.sh ← Per-edge: hosted control plane + token
│ ├── 06-join-edge-worker.sh ← Per-edge: install k0s worker, /etc/hosts, CoreDNS
│ ├── 07-deploy-edge-ingest.sh ← Per-edge: deploy sort (REST-pull) + s3-uploader
│ ├── 07b-deploy-edge-observability.sh ← Per-edge: Vector log shipper (optional)
│ ├── 07c-deploy-edge-orthanc.sh ← Per-edge: deploy Orthanc + deid Lua hook
│ ├── rotate-ca.sh ← CA rotation (--phase=1 / --phase=2)
│ └── uninstall.sh ← Tears down everything
└── .gitignore
.tpl files are manifest templates — placeholders like {{CLUSTER_NAME}} are replaced with
values from your config files during installation.
If you already have a Kubernetes cluster running (k3s, kubeadm, MicroK8s, etc.):
- Set INSTALL_MODE="existing" in config/management.env
- Ensure kubectl is configured and pointing to your cluster (~/.kube/config)
- Ensure a default StorageClass exists (check with kubectl get sc)
- Run ./install.sh; it will skip k0s installation and use your existing cluster
The installer will deploy k0smotron, SeaweedFS, and the upload pod as regular workloads on your existing cluster. Everything else works the same.
Edit config/edge-nodes.env and add entries to the EDGE_NODES array:
EDGE_NODES=(
"edge-uqcai|203.101.230.171|ubuntu|~/.ssh/id_ed25519|uqcai-project|edge-uqcai-key|uqcai-secret"
"edge-usyd|10.0.1.50|ubuntu|~/.ssh/id_ed25519|usyd-project|edge-usyd-key|usyd-secret"
"edge-newcastle|10.0.2.50|ubuntu|~/.ssh/id_ed25519|newcastle-project|edge-newcastle-key|newcastle-secret"
)
Each edge node gets:
- Its own hosted k0s control plane (separate namespace on management cluster)
- Its own scoped S3 credentials (write+list to ingest bucket only, isolated per site)
- Its own kubeconfig file (kubeconfig-edge-uqcai, etc.)
- Its own xnat-ingest pods
Then re-run ./install.sh — it will skip already-installed components and only set up new nodes.
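Each EDGE_NODES entry is a single pipe-delimited string. The sketch below shows one way to split an entry into named fields. The function name `parse_edge_node` and the variable names are assumptions, inferred from the field order in the examples above (name, IP, SSH user, SSH key, XNAT project, S3 access key, S3 secret); the installer's own parsing may differ.

```shell
# Hypothetical parser; field names are assumed from the README examples:
# name | IP | SSH user | SSH key path | XNAT project | S3 access key | S3 secret
parse_edge_node() {
    IFS='|' read -r EDGE_NAME EDGE_IP EDGE_SSH_USER EDGE_SSH_KEY \
                    EDGE_PROJECT EDGE_S3_KEY EDGE_S3_SECRET <<EOF
$1
EOF
    # Reject entries that are missing trailing fields.
    [ -n "$EDGE_S3_SECRET" ]
}

parse_edge_node "edge-usyd|10.0.1.50|ubuntu|~/.ssh/id_ed25519|usyd-project|edge-usyd-key|usyd-secret" \
    && echo "$EDGE_NAME -> $EDGE_SSH_USER@$EDGE_IP (project $EDGE_PROJECT)"
```

Note that the ~ in the key path sits inside a quoted string, so the shell does not expand it at parse time; whatever consumes EDGE_SSH_KEY has to expand it.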
To remove one edge site without affecting others:
# 1. Delete workloads on the edge cluster
kubectl --kubeconfig kubeconfig-edge-uqcai delete namespace xnat-ingest
# 2. Reset the edge worker VM
ssh ubuntu@<edge-ip> "sudo k0s stop && sudo k0s reset"
# 3. Delete the hosted cluster from management
kubectl delete namespace edge-uqcai
# 4. Remove the S3 identity (regenerate s3.json without this edge user)
# Just remove the entry from edge-nodes.env, then re-run scripts/03-deploy-seaweedfs.sh
# (it regenerates the SeaweedFS s3.json ConfigMap and rolls the pod)
# 5. Clean up generated files
rm kubeconfig-edge-uqcai join-token-edge-uqcai
# 6. Remove the entry from config/edge-nodes.env
| Component | Version | Notes |
|---|---|---|
| Ubuntu | 22.04.5 LTS | Management and edge nodes |
| k0s | v1.35.2+k0s.0 | Both management cluster and edge workers |
| k0smotron | v1.10.4 (stable) | Installed via kubectl apply (not Helm). Uses built-in spec.ingress for SNI routing. |
| cert-manager | latest | Issues ais-edge-ca (10y root) + per-service server certs (1y, auto-renew) |
| nginx-ingress | latest (helm) | hostNetwork :443. SSL passthrough enabled (k0s API + konnectivity); TLS termination for SeaweedFS. |
| SeaweedFS | 3.99 | chrislusf/seaweedfs:3.99 (last 3.x stable, avoids 4.18/4.19 filer memory regression — issue #9035). ClusterIP only; external access via Ingress. |
| MinIO Client (mc) | latest | minio/mc:latest — vendor-neutral S3 client used by edge s3-uploader. Trusts ais-edge-ca via mounted ca-bundle Secret. |
| Orthanc | 1.12.6 (plugins) | jodogne/orthanc-plugins:1.12.6 — DICOM SCP on edge port 4242. Needs ≥ 1.12.0 for study-level labels. |
| xnat-ingest | v0.1.0 | ghcr.io/aswinnarayanan/xnat-ingest:v0.1.0 — logging-v3 + Orthanc REST-pull mode. |
| local-path-provisioner | v0.0.30 | Default StorageClass for etcd PVCs |
To pin specific versions in production, replace :latest / :3.99 tags in the .tpl
manifests with explicit versions (e.g. chrislusf/seaweedfs:3.99-rc1).
Manifest files ending in .tpl contain placeholders like {{S3_BUCKET}}.
The render() function in scripts/00-common.sh performs simple string replacement
at install time — no Helm, no Jinja, no external tools required.
# Example: what happens when install.sh processes seaweedfs-ingress.yaml.tpl
Input: host: {{SEAWEEDFS_HOSTNAME}}
Output: host: seaweedfs.aisedge.local
Values come from config/management.env and config/edge-nodes.env. You never edit .tpl files.
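To make the substitution concrete, here is a minimal sketch of the same idea using sed. It is not the render() function from scripts/00-common.sh; the name `render_sketch` and its KEY=VALUE calling convention are assumptions made for illustration.

```shell
# Not the repo's render(); a sketch of plain {{KEY}} string replacement.
# Usage: render_sketch TEMPLATE KEY=VALUE [KEY=VALUE...]
render_sketch() {
    template=$1
    shift
    script=""
    for pair in "$@"; do
        key=${pair%%=*}
        value=${pair#*=}
        # Escape characters that are special on the right-hand side of s|||.
        escaped=$(printf '%s' "$value" | sed 's/[|&\\]/\\&/g')
        script="${script}s|{{${key}}}|${escaped}|g;"
    done
    # Rendered manifest goes to stdout; the template is left untouched.
    sed "$script" "$template"
}

# Example:
#   render_sketch manifests/01-management/seaweedfs-ingress.yaml.tpl \
#       SEAWEEDFS_HOSTNAME=seaweedfs.aisedge.local | kubectl apply -f -
```

This sketch assumes values contain no newlines. Placeholders without a matching KEY are passed through unchanged, so a leftover {{...}} in the output points to a missing value.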
s3://ingest-bucket/
└── staged/
├── test-project.patient01.visit01/ ← one directory per session
│ └── 1.T1w_MPRAGE/ ← scan ID + description
│ └── DICOM/ ← resource type
│ ├── file1.dcm
│ ├── file2.dcm
│ └── MANIFEST.json
├── test-project.patient02.visit01/
│ └── ...
└── ...
The staged/ prefix separates ingest data from any other bucket contents.
Session directory names follow the format PROJECT.SUBJECT.VISIT.
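When scripting against the bucket it can help to split a session directory name back into its parts. The sketch below assumes that none of the three components contains a dot, which the format implies but this README does not state; the function and variable names are hypothetical.

```shell
# Hypothetical helper: split PROJECT.SUBJECT.VISIT into three variables.
# Assumes no component contains a dot.
split_session() {
    name=$1
    case "$name" in
        *.*.*.*) return 1 ;;   # more than three dot-separated parts
        *.*.*)   ;;            # expected shape
        *)       return 1 ;;   # fewer than three parts
    esac
    SESSION_PROJECT=${name%%.*}
    rest=${name#*.}
    SESSION_SUBJECT=${rest%%.*}
    SESSION_VISIT=${rest#*.}
    # Reject empty components such as "a..c".
    [ -n "$SESSION_PROJECT" ] && [ -n "$SESSION_SUBJECT" ] && [ -n "$SESSION_VISIT" ]
}

split_session "test-project.patient01.visit01" \
    && echo "$SESSION_PROJECT / $SESSION_SUBJECT / $SESSION_VISIT"
```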
On each edge worker:
/data/xnat-ingest/
├── orthanc-storage/ ← Orthanc DICOM storage tree (deid'd instances live here)
└── staging/
└── PROJECT.SUBJECT.VISIT/ ← xnat-ingest sort's hardlinked output, awaiting S3 upload
/data/facility-backup/ ← ORIGINAL DICOMs (real identifiers) — site-controlled retention.
Written by the Orthanc deid Lua hook; never leaves the edge.
Files flow: Modality C-STORE → Orthanc (deid + delete original, keep deid'd) → sort hardlinks → staging/ → SeaweedFS → eventually deleted from edge after successful S3 upload.
/data/orthanc-storage and /data/xnat-ingest/staging must be on the same physical filesystem so hardlinks resolve (cross-fs hardlinks fail with EXDEV).
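One way to confirm the same-filesystem requirement before installing is to attempt a hardlink between the two directories. This is a sketch with a hypothetical function name, `can_hardlink`; it creates and removes a small probe file, so it needs write access to both directories.

```shell
# Hypothetical probe: succeeds if a file in SRC_DIR can be hardlinked
# into DST_DIR (i.e. both are on the same filesystem and writable).
can_hardlink() {
    src_dir=$1
    dst_dir=$2
    probe=".hardlink-probe.$$"
    touch "$src_dir/$probe" 2>/dev/null || return 1
    if ln "$src_dir/$probe" "$dst_dir/$probe" 2>/dev/null; then
        rm -f "$src_dir/$probe" "$dst_dir/$probe"
        return 0
    fi
    rm -f "$src_dir/$probe"
    return 1
}

# Example, on the edge worker (host paths as shown in the tree above):
#   can_hardlink /data/xnat-ingest/orthanc-storage /data/xnat-ingest/staging \
#       || echo "storage and staging are on different filesystems" >&2
```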
# Management cluster health
kubectl get pods -A # all pods should be Running
kubectl get nodes # management node should be Ready
# Edge cluster health
kubectl --kubeconfig kubeconfig-<name> get nodes # edge worker should be Ready
kubectl --kubeconfig kubeconfig-<name> get pods -n xnat-ingest # sort + s3-uploader Running
# SeaweedFS health (TLS via nginx-ingress; CA bundle is ais-edge-ca.crt)
curl --cacert ais-edge-ca.crt \
--resolve seaweedfs.aisedge.local:443:<MGMT_IP> \
https://seaweedfs.aisedge.local/ # → 403 (S3 unauth) means path works
# SeaweedFS master + filer (admin only — port-forward, no external port)
kubectl port-forward -n seaweedfs svc/seaweedfs 9333:9333 & # master http://localhost:9333
kubectl port-forward -n seaweedfs svc/seaweedfs 8888:8888 & # filer http://localhost:8888
# SeaweedFS from edge (with CA verification)
ssh ubuntu@<EDGE_IP> "curl --cacert /tmp/ais-edge-ca.crt https://seaweedfs.aisedge.local/"
# SeaweedFS bucket contents (mgmt-side via port-forward + mc alias)
kubectl port-forward -n seaweedfs svc/seaweedfs 8333:8333 &
mc alias set seaweed-admin http://localhost:8333 <admin-key> <admin-secret>
mc ls seaweed-admin/ingest-bucket/staged/ # list sessions in bucket
# XNAT connectivity
curl -sk <XNAT_URL> # should return HTML
# Check logs for errors
kubectl logs -n xnat-upload -l component=upload --tail=5 # XNAT upload
kubectl --kubeconfig kubeconfig-<name> logs -n xnat-ingest -l component=sort --tail=5
kubectl --kubeconfig kubeconfig-<name> logs -n xnat-ingest -l component=s3-uploader --tail=5
The SeaweedFS Service is ClusterIP only, with no external port. Reach
the admin UIs via kubectl port-forward from the management node:
# Master UI — cluster topology, volume servers, free capacity, leader election
kubectl port-forward -n seaweedfs svc/seaweedfs 9333:9333 &
xdg-open http://localhost:9333
# Filer UI — browse the filesystem layer (objects under /buckets/<bucket>/...)
kubectl port-forward -n seaweedfs svc/seaweedfs 8888:8888 &
xdg-open http://localhost:8888
For an S3-style admin experience, port-forward 8333 and use mc:
kubectl port-forward -n seaweedfs svc/seaweedfs 8333:8333 &
mc alias set seaweed-admin http://localhost:8333 <admin-key> <admin-secret>
mc ls seaweed-admin/ # list buckets
mc ls --recursive seaweed-admin/ingest-bucket/staged/ # list sessions
mc admin info seaweed-admin # cluster info
Update xnat-ingest image:
# Edge cluster — restart pods to pull latest image
kubectl --kubeconfig kubeconfig-<name> rollout restart deployment/xnat-ingest-sort -n xnat-ingest
kubectl --kubeconfig kubeconfig-<name> rollout restart deployment/s3-uploader -n xnat-ingest
# Management cluster — restart XNAT upload pod
kubectl rollout restart deployment/xnat-ingest-upload -n xnat-upload
Update SeaweedFS:
kubectl rollout restart deployment/seaweedfs -n seaweedfs
Update k0s on edge workers: k0s supports in-place upgrades via Autopilot. For manual upgrade:
ssh ubuntu@<EDGE_IP>
sudo k0s stop
curl -sSLf https://get.k0s.sh | sudo sh # installs latest
sudo k0s start
Update k0smotron:
kubectl apply --server-side=true -f https://docs.k0smotron.io/stable/install.yaml
What to back up:
- config/management.env and config/edge-nodes.env: your configuration
- SeaweedFS data (/data/seaweedfs/ on management node): staged files in transit
- XNAT: your actual data destination (backed up separately)
What does NOT need backup:
- Edge worker data (/data/xnat-ingest/): transient staging area
- k0s/k0smotron state: can be rebuilt from this repo
- Generated files (kubeconfig-*, join-token-*): regenerated on install
Restoring from scratch:
- Provision fresh VMs
- Clone this repo, copy your saved config files
- Run ./install.sh
- Self-signed CA (no public trust chain): ais-edge-ca is local to this deployment. Anything that doesn't load the CA bundle (browsers, third-party tools) will see certificate-untrusted warnings. For a publicly-trusted chain, plug in a real CA (e.g. Let's Encrypt via cert-manager's HTTP-01 / DNS-01 ACME issuer).
- mTLS not implemented: edges authenticate to SeaweedFS via S3 access keys, not client certificates. The wire is encrypted; identity is via key. Add a mutual-TLS layer for stronger edge identity.
- No monitoring/alerting: SeaweedFS disk usage, pod health, and upload failures are not automatically monitored. Add Prometheus + Grafana for production (SeaweedFS exposes Prometheus metrics on the master and filer).
- Single management node: no HA for k0smotron, nginx-ingress, or SeaweedFS. For production, split SeaweedFS into separate master/volume/filer/s3 deployments with 3 masters, run multiple ingress replicas (drop hostNetwork, use a real load balancer or VRRP), and run a multi-replica k0smotron control plane per cluster; see "Scaling SeaweedFS" below.
- DICOM files with missing AccessionNumber go to __invalid__/ and require a manual rename. This is an xnat-ingest limitation, not a system issue. Real clinical DICOMs will have this field populated.
- emptyDir persistence for hosted control planes: etcd data is lost if the management node restarts. For production, use a proper StorageClass with persistent volumes.
- No automatic cleanup of SeaweedFS: successfully uploaded sessions remain in SeaweedFS until manually deleted. Add an S3 lifecycle rule or a cleanup job for production.
- First-time install needs internet on the edge VM. The edge worker pulls the k0s binary from get.k0s.sh and container images from quay.io, docker.io, and ghcr.io (k0s, konnectivity-agent, haproxy, xnat-ingest, minio/mc). Once installed, the edge needs only the single 443 connection back to management, with no further registry access. For air-gapped sites, pre-stage the k0s binary at /usr/local/bin/k0s and drop a k0s airgap-style image bundle into /var/lib/k0s/images/ on the edge VM before running the installer (k0s auto-imports on start). A "build airgap bundle" helper script is not yet included.
- Konnectivity is HTTP/2 + gRPC over TLS. Stateful firewalls or IDS appliances that aggressively normalise TLS or block long-lived HTTP/2 streams can disrupt the reverse tunnel; see the "Konnectivity and middleboxes" section below before enabling the deployment behind a deep-inspection proxy.
The single-pod all-in-one deployment is for the MVP. For production scale-out, split the SeaweedFS components into separate Deployments/StatefulSets:
| Component | What it does | HA recommendation |
|---|---|---|
| Master | Cluster metadata, leader election | 3 replicas (Raft consensus) |
| Volume server | Stores chunked data | N replicas across nodes; each backed by its own disk |
| Filer | Filesystem layer (required for S3) | 2+ replicas; backed by an external metadata store (Redis/ScyllaDB/Postgres) |
| S3 gateway | S3 API endpoint | 2+ replicas behind a Service / load balancer |
Edge clients (mc mirror) don't change — they still talk to the S3 endpoint. The
internal architecture changes; the external API does not.
Before ingesting data, ensure:
- XNAT project exists: create it in the XNAT web UI before uploading. The project ID must match PROJECT_ID in config/edge-nodes.env.
- XNAT user is a local account: not AAF/OIDC. Create via Administer → Users.
- XNAT user has project permissions: at least Member or Collaborator on the target project.
xnat-ingest authenticates via POST /data/JSESSION with username/password and uses the
session token for all subsequent REST API calls.
# Management cluster
kubectl get pods -A
# Specific edge cluster
kubectl --kubeconfig kubeconfig-edge-uqcai get pods -n xnat-ingest
kubectl --kubeconfig kubeconfig-edge-usyd get nodes
# Logs
kubectl --kubeconfig kubeconfig-edge-uqcai logs -n xnat-ingest -l component=sort -f
kubectl --kubeconfig kubeconfig-edge-uqcai logs -n xnat-ingest -l component=s3-uploader -f
kubectl logs -n xnat-upload -l component=upload -f # management upload to XNAT
# SeaweedFS admin UIs (ClusterIP only — port-forward from mgmt)
kubectl port-forward -n seaweedfs svc/seaweedfs 9333:9333 & # master
kubectl port-forward -n seaweedfs svc/seaweedfs 8888:8888 & # filer
# C-STORE a DICOM to Orthanc at the edge. The Called-AET must be listed in
# config/orthanc/routing.json on the edge — that's how the deid hook knows
# which recipe + XNAT project to route to.
storescu -aec <AET-from-routing.json> -aet TEST_MOD <EDGE_IP> 4242 test.dcm
# Watch the Orthanc Lua deid + label events
kubectl --kubeconfig kubeconfig-edge-dev logs -n xnat-ingest deploy/orthanc -f \
| grep -E 'instance_deidentified|study_labeled_ready|REJECT|ERROR'
# Watch sort pod REST-pull from Orthanc and hardlink into staging
kubectl --kubeconfig kubeconfig-edge-dev logs -n xnat-ingest -l component=sort -f
# Watch s3-uploader push to SeaweedFS
kubectl --kubeconfig kubeconfig-edge-dev logs -n xnat-ingest -l component=s3-uploader -f
# Watch upload to XNAT
kubectl logs -n xnat-upload -l component=upload -f
| Scenario | What happens | Recovery |
|---|---|---|
| Network drops mid-upload | SeaweedFS S3 multipart — completed chunks saved | mc retries on next loop cycle |
| Edge VM crashes | Files safe in /data/staging/ | k0s auto-starts, pods resume |
| SeaweedFS crashes | Edge uploads fail, files safe on edge | Pod auto-restarts, edge retries |
| Management node crashes | Edge files accumulate locally | Management restarts, edge reconnects |
| XNAT is down | SeaweedFS fills up | XNAT returns, upload pod clears backlog |
| SeaweedFS disk full | Edge uploads fail, files safe on edge | Expand /data/seaweedfs/ or clear XNAT backlog |
The overview above shows the major components and the single 443 outbound path. This section drills into the host-level state, every namespace, the trust relationships between certificates, and how in-cluster Service traffic actually reaches the API server. Useful when debugging or reviewing the design.
══════════════════════════════════════════════════════════════════════════════════════
EDGE VM (203.101.230.171) ZERO inbound • outbound only TCP :443
══════════════════════════════════════════════════════════════════════════════════════
Host state
/etc/hosts 203.101.224.240 seaweedfs.aisedge.local
k0s.aisedge.local
konnect.aisedge.local (added by 06)
/etc/haproxy/certs/
server.pem cert+key, signed by the cluster's internal k0s CA
(so workload pods trust haproxy via the projected
serviceaccount ca.crt — without this every pod that
hits the kubernetes Service gets "unknown authority")
ca.crt the same cluster CA — haproxy uses it to verify the
upstream API
k0sworker.service systemd unit; kubelet talks to https://k0s.aisedge.local:443
(URL rewritten inside the join-token by 05-setup-edge..)
┌─ default ns ──────────────────────────────────────────────────────────────────┐
│ k0smotron-haproxy DaemonSet, hostNetwork:true │
│ frontend bind [::]:7443 ssl crt /etc/haproxy/certs/server.pem │
│ backend k0s.aisedge.local:443 ssl verify required sni=k0s.aisedge.local │
│ * EndpointSlice for the kubernetes Service points at <edge-IP>:7443 │
│ so any pod calling 10.96.0.1:443 → kube-proxy NAT → local haproxy → mgmt │
└───────────────────────────────────────────────────────────────────────────────┘
┌─ kube-system ns ──────────────────────────────────────────────────────────────┐
│ coredns Corefile has hosts { … aisedge.local … fallthrough } │
│ konnectivity-agent --proxy-server-host=konnect.aisedge.local --port=443 │
│ kube-proxy / kube-router / metrics-server │
└───────────────────────────────────────────────────────────────────────────────┘
┌─ xnat-ingest ns ──────────────────────────────────────────────────────────────┐
│ orthanc DICOM SCP :4242 (hostPort), Lua deid + label, │
│ storage at /data/orthanc-storage │
│ env AIS_DEID_HMAC_SALT (Secret) │
│ mounts ConfigMaps orthanc-config/-scripts/-routing/-recipes │
│ xnat-ingest-sort loop 60s, REST-pull from orthanc.xnat-ingest.svc:8042 │
│ hardlinks /data/orthanc-storage → /data/staging │
│ s3-uploader loop 30s, runs: mc mirror /data/staging edge/bucket │
│ env S3_ENDPOINT=https://seaweedfs.aisedge.local │
│ mount Secret ca-bundle (= ais-edge-ca.crt) → /root/.mc/certs/CAs/ │
│ hostAliases 3 aisedge.local names → MGMT_NODE_IP │
│ hostPath /data/xnat-ingest/{orthanc-storage,staging} │
│ hostPath /data/facility-backup (Orthanc-only, original DICOMs) │
│ Secret s3-edge-credentials (write+list scoped to ingest-bucket) │
└───────────────────────────────────────────────────────────────────────────────┘
│
│ ALL outbound traffic: TCP 443 (TLS, SNI-routed)
│ firewall rule: ALLOW edge → MGMT_IP dst-port 443
▼
══════════════════════════════════════════════════════════════════════════════════════
MGMT NODE (203.101.224.240) k0s controller+worker (single-node)
══════════════════════════════════════════════════════════════════════════════════════
Host state
/etc/hosts same 3 aisedge.local entries (added by 05-setup-edge..)
*:443 owned by ingress-nginx-controller pod (hostNetwork:true)
┌─ ingress-nginx ns ────────────────────────────────────────────────────────────┐
│ ingress-nginx-controller helm-managed; --enable-ssl-passthrough │
│ proxy-body-size=50g proxy-read-timeout=3600 proxy-send-timeout=3600 │
│ ┌─ SNI router (port 443) ─────────────────────────────────────────────┐ │
│ │ seaweedfs.aisedge.local → svc/seaweedfs:8333 (TLS terminate) │ │
│ │ k0s.aisedge.local → kmc-edge-dev-nodeport:30443 (passthrough) │ │
│ │ konnect.aisedge.local → kmc-edge-dev-nodeport:30132 (passthrough) │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────────┘
┌─ cert-manager ns ─────────────────────────────────────────────────────────────┐
│ ClusterIssuer selfsigned-bootstrap │
│ Certificate ais-edge-ca isCA, RSA 4096, 10 yr │
│ ClusterIssuer ais-edge-ca-issuer ─── signs server certs ───► │
│ • seaweedfs-tls 1 yr, auto-renew -30d, SANs = seaweedfs..local +MGMT_IP │
│ Export ais-edge-ca.crt → REPO_DIR (distributed to edges as Secret) │
└───────────────────────────────────────────────────────────────────────────────┘
┌─ k0smotron ns + edge-dev ns (per-edge cluster) ───────────────────────────────┐
│ k0smotron-controller-manager operator │
│ kmc-edge-dev-0 k0s API server pod │
│ spec.k0sConfig.spec.api.sans = [k0s.aisedge.local, konnect.., MGMT_IP] │
│ cert issued by k0smotron-managed cluster CA (Secret edge-dev-ca) │
│ kmc-edge-dev-etcd-0 etcd │
│ svc/kmc-edge-dev-nodeport NodePort 30443/30132 (in-cluster bridge only) │
│ Ingress kmc-edge-dev auto-created from spec.ingress on the Cluster CR │
│ ssl-passthrough on hosts k0s.aisedge.local + konnect.aisedge.local │
└───────────────────────────────────────────────────────────────────────────────┘
┌─ seaweedfs ns ────────────────────────────────────────────────────────────────┐
│ seaweedfs Deployment, all-in-one (master+volume+filer+s3) chrislusf:3.99 │
│ svc/seaweedfs ClusterIP only — no external port (admin via port-forward) │
│ ConfigMap s3-config admin + per-edge IAM identities (config-hash rolls) │
│ hostPath /data/seaweedfs Haystack volumes + filer leveldb │
└───────────────────────────────────────────────────────────────────────────────┘
┌─ xnat-upload ns ──────────────────────────────────────────────────────────────┐
│ xnat-ingest-upload polls s3://ingest-bucket/staged/, pushes to XNAT │
│ env S3_ENDPOINT=http://seaweedfs.seaweedfs.svc.cluster.local:8333 │
│ (in-cluster path — never leaves the management node) │
│ Secrets xnat-credentials, s3-credentials │
└───────────────────────────────────────────────────────────────────────────────┘
│ HTTPS to XNAT's public-CA-signed endpoint
▼
┌────────────────────────────────────────────────────────────────────────────────┐
│ XNAT SERVER (xnat-test.ssdsorg.cloud.edu.au — separate k3s cluster) │
│ Receives sessions via REST API; out of scope for this repo │
└────────────────────────────────────────────────────────────────────────────────┘
ais-edge-ca (10 yr root) ──signs──► seaweedfs-tls (presented by nginx)
▲
└─ trusted by edge mc via mounted
ca-bundle Secret (= ais-edge-ca.crt)
k0smotron cluster CA ──signs──► k0s API server cert (kmc-edge-dev-0)
(Secret edge-dev-ca) ▲
└─ trusted by edge kubelet via the
CA embedded in the join-token
cluster CA (same as above)──signs──► /etc/haproxy/certs/server.pem
▲
└─ trusted by every workload pod via
its projected serviceaccount ca.crt
pod (in child cluster)
│ GET kubernetes.default.svc.cluster.local
│ → resolves to ClusterIP 10.96.0.1:443
▼
kube-proxy iptables NAT
│ destination rewritten to <edge-IP>:7443 (per EndpointSlice)
▼
k0smotron-haproxy DS pod (hostNetwork on the same worker)
│ TLS terminate using server.pem (signed by cluster CA → pod trusts it)
│ open NEW outbound TLS conn:
▼
nginx-ingress on MGMT_IP:443 (SNI = k0s.aisedge.local → ssl-passthrough)
│
▼
kmc-edge-dev-nodeport:30443 → kmc-edge-dev-0 (k0s API)
Q: Can I run k0smotron on my existing k3s/kubeadm cluster?
Yes. Set INSTALL_MODE="existing" in config/management.env. k0smotron is just a Kubernetes
operator, so it runs on any conformant cluster that has cert-manager installed. Edge workers still use k0s.
Q: Does k0s run on Windows? Not natively. Options: WSL2, Hyper-V VM, or Docker Desktop.
Q: What credentials are stored on the edge? Only a scoped SeaweedFS S3 key. It can only PUT/LIST on one bucket. It cannot read other sites' data, access XNAT, or do anything else. XNAT credentials never leave the management node.
Q: How does the edge communicate without inbound ports? All connections are outbound from the edge to the management node on a single port:
- TLS port 443 — multiplexed by SNI:
  - k0s.aisedge.local → k0s API server (kubelet → API)
  - konnect.aisedge.local → konnectivity tunnel (API → kubelet via reverse tunnel)
  - seaweedfs.aisedge.local → SeaweedFS S3 (mc mirror data uploads)
The management node sends commands back through the konnectivity tunnel (edge-initiated).
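
As a rough illustration of the SNI multiplexing, the lookup below mirrors the three routes listed in the SNI router box of the architecture diagram. It is a sketch for reasoning about which backend a hostname reaches, not the ingress configuration itself; the function name sni_backend is hypothetical.

```shell
# Hypothetical lookup: map an SNI hostname to the backend shown in the
# architecture diagram. Unknown names return a non-zero status.
sni_backend() {
  case "$1" in
    seaweedfs.aisedge.local) echo "svc/seaweedfs:8333 (TLS terminate)" ;;
    k0s.aisedge.local)       echo "kmc-edge-dev-nodeport:30443 (passthrough)" ;;
    konnect.aisedge.local)   echo "kmc-edge-dev-nodeport:30132 (passthrough)" ;;
    *) echo "no route for SNI '$1'" >&2; return 1 ;;
  esac
}
```

Only the SeaweedFS route terminates TLS at nginx; the other two are passed through untouched, which is why they present certificates from the k0smotron cluster CA rather than ais-edge-ca.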
Q: What is konnectivity? A reverse tunnel built into Kubernetes. The edge opens an outbound connection to the management node and keeps it open. kubectl commands flow back through this same connection. No inbound ports needed on the edge.
Q: What happens if the SeaweedFS edge key is stolen? An attacker can only write junk files to the ingest bucket. They cannot read other sites' data, cannot access XNAT, and cannot access patient information. The key is easily rotated.
Q: How do I rotate the SeaweedFS edge credentials?
- Generate new credentials in config/edge-nodes.env for that edge entry.
- Re-run scripts/03-deploy-seaweedfs.sh — it regenerates s3.json from env and rolls the SeaweedFS pod via the config-hash annotation. Old credentials become invalid.
- Re-run scripts/07-deploy-edge-ingest.sh <entry> for the affected edge — the K8s Secret on the edge cluster is updated and the s3-uploader pod restarts.
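
For the first step, a minimal sketch of producing a fresh key pair and swapping it into an existing entry. Both helpers (random_hex, rotate_entry) are hypothetical and not part of the repo. The seven-field order is assumed from the EDGE_NODES example in this README, and the key lengths in the example comment are arbitrary choices.

```shell
# Hypothetical helper: print <bytes> random bytes from /dev/urandom as hex.
random_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: replace the last two '|'-separated fields of an
# EDGE_NODES entry. Assumed field order:
#   name|ip|ssh-user|ssh-key|xnat-project|s3-access-key|s3-secret-key
# Returns non-zero if the entry does not have exactly seven fields.
rotate_entry() {
  local entry=$1 new_key=$2 new_secret=$3
  printf '%s\n' "$entry" |
    awk -F'|' -v k="$new_key" -v s="$new_secret" '
      BEGIN { OFS = "|" }
      NF == 7 { $6 = k; $7 = s; print; ok = 1 }
      END { exit !ok }'
}

# Example (not run here):
#   rotate_entry "$old_entry" "$(random_hex 10)" "$(random_hex 20)"
```

Paste the printed entry back into config/edge-nodes.env by hand, then continue with the two re-run steps above.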
Q: What is a "child cluster" vs "management cluster"? The management cluster runs k0smotron and hosts control planes for edge sites. Each edge site has a "child cluster" — its own Kubernetes cluster whose control plane runs as pods on the management node, but whose workers are at the edge site. They have separate kubeconfigs, namespaces, and RBAC.
Q: Can one edge site have multiple workers?
Yes. Give multiple machines the same join token and they all join the same child cluster.
Pin specific pods to specific workers using nodeSelector in the manifest.
k0s worker not joining:
- Check token: sudo cat /etc/k0s/join-token | head -c 50 (should not be empty)
- Verify the embedded URL: cat /etc/k0s/join-token | base64 -d | gunzip | grep server (should show https://k0s.aisedge.local:443)
- Check /etc/hosts has the aisedge.local entries: grep aisedge /etc/hosts
- Check connectivity: curl --cacert /tmp/ais-edge-ca.crt https://k0s.aisedge.local/version (TLS error = cert/CA mismatch; refused = nginx-ingress / network)
- Check logs: sudo journalctl -u k0sworker --no-pager -n 30
- Note: k0s status does NOT work on workers. Use systemctl is-active k0sworker.
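
The embedded-URL check can be wrapped in a small function so the result is easy to compare in a script. This is a sketch that relies only on the token layout implied by the command above (base64-encoded, gzip-compressed kubeconfig); the function name join_token_server is hypothetical.

```shell
# Hypothetical helper: read a k0s join token on stdin and print the API
# server URL from the kubeconfig inside it. Returns non-zero if no
# "server:" line is found.
join_token_server() {
  base64 -d | gunzip |
    awk '$1 == "server:" { print $2; found = 1 } END { exit !found }'
}

# Example (on the edge worker, not run here):
#   sudo cat /etc/k0s/join-token | join_token_server
```

If the printed URL is anything other than https://k0s.aisedge.local:443, the worker is trying to reach the API server by an address the firewall rule does not cover.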
konnectivity-agent in CrashLoop / "lookup konnect.aisedge.local: no such host":
- The child cluster's CoreDNS does not have the aisedge.local hosts entry. Re-run 06-join-edge-worker.sh (idempotent) or apply the Corefile manually.
- Verify: KUBECONFIG=kubeconfig-<edge> kubectl get cm coredns -n kube-system -o jsonpath='{.data.Corefile}' | grep aisedge
s3-uploader: "x509: certificate signed by unknown authority":
- The ca-bundle Secret is missing or empty on the edge cluster. Re-run 07-deploy-edge-ingest.sh <edge-entry> — it pushes ais-edge-ca.crt to the edge cluster's xnat-ingest/ca-bundle Secret and rolls the s3-uploader.
- Verify: KUBECONFIG=kubeconfig-<edge> kubectl get secret -n xnat-ingest ca-bundle -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -subject
Pods stuck in Pending:
- Check events:
kubectl describe pod <name> -n <namespace> - Common cause: no StorageClass (management cluster needs local-path-provisioner)
- Edge pods use hostPath, not PVC — check directory exists on worker
xnat-ingest sort puts files in invalid:
- The DICOM file is missing required metadata (usually AccessionNumber)
- This is normal for sample files. Rename and move manually for testing.
- With real clinical DICOMs, this won't happen.
Upload pod can't reach SeaweedFS:
- The mgmt upload pod uses in-cluster DNS — TLS/Ingress not involved. Test: kubectl exec -n xnat-upload deploy/xnat-ingest-upload -- curl -s http://seaweedfs.seaweedfs.svc.cluster.local:8333/
- Check SeaweedFS pod: kubectl logs -n seaweedfs -l app=seaweedfs
Upload pod can't reach XNAT:
- Test: curl -sk <XNAT_URL>
- Check XNAT credentials in management cluster secret
- XNAT project must exist before upload (create in XNAT web UI)
Server cert about to expire (or compromised CA):
- Server certs auto-renew via cert-manager (1-year duration, 30-day renewBefore).
- Force renewal: kubectl delete secret seaweedfs-tls -n seaweedfs and cert-manager re-issues from the CA Issuer.
- Full CA rotation: scripts/rotate-ca.sh --phase=1 then (after 14-30 days) --phase=2.
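
To make the renewal window concrete, the sketch below applies the 30-day renewBefore rule to plain epoch timestamps. It is an illustration of the arithmetic only, not how cert-manager is implemented; the function name in_renew_window is hypothetical.

```shell
# Hypothetical helper: succeed when a certificate is inside its renewal
# window (or already expired).
# Arguments: <notAfter epoch seconds> <now epoch seconds> [renewBefore days]
in_renew_window() {
  local not_after=$1 now=$2 renew_before_days=${3:-30}
  local remaining=$((not_after - now))
  [ "$remaining" -le $((renew_before_days * 86400)) ]
}

# Example (not run here): a cert with 29 days left is due for renewal.
#   in_renew_window "$not_after" "$(date +%s)" && echo "renewal due"
```

On a live certificate file, openssl x509 -noout -checkend 2592000 -in <cert> asks the same 30-day question directly: it exits non-zero when the certificate expires within that many seconds.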
This setup uses self-hosted SeaweedFS by default, but you can swap it for AWS S3 (or any
S3-compatible service like Google Cloud Storage, Backblaze B2, MinIO, Garage, Ceph RGW)
with minimal changes — mc and boto3 speak vanilla S3.
| Component | SeaweedFS (default) | AWS S3 |
|---|---|---|
| Storage server | SeaweedFS pod on management node | AWS managed service |
| Management manifests | manifests/01-management/seaweedfs.yaml.tpl deployed | Not deployed — skip step 03 |
| Upload pod S3 endpoint | http://seaweedfs.seaweedfs.svc.cluster.local:8333 | https://s3.amazonaws.com (default) |
| Edge S3 endpoint | https://seaweedfs.aisedge.local (TLS, ais-edge-ca) | https://s3.<region>.amazonaws.com (TLS, public CA) |
| Credentials | SeaweedFS s3.json identities | AWS IAM access keys |
1. Create AWS resources:
# Create an S3 bucket
aws s3 mb s3://my-ingest-bucket --region ap-southeast-2
# Create an IAM user for the edge (write-only)
aws iam create-user --user-name edge-writer
aws iam put-user-policy --user-name edge-writer --policy-name write-only --policy-document '{
"Version": "2012-10-17",
"Statement": [
{"Effect":"Allow","Action":["s3:PutObject","s3:DeleteObject"],"Resource":"arn:aws:s3:::my-ingest-bucket/*"},
{"Effect":"Allow","Action":["s3:ListBucket","s3:GetBucketLocation"],"Resource":"arn:aws:s3:::my-ingest-bucket"}
]
}'
aws iam create-access-key --user-name edge-writer
# → note the AccessKeyId and SecretAccessKey
# Create an IAM user for the management upload pod (read + delete)
aws iam create-user --user-name mgmt-reader
aws iam put-user-policy --user-name mgmt-reader --policy-name read-delete --policy-document '{
"Version": "2012-10-17",
"Statement": [
{"Effect":"Allow","Action":["s3:GetObject","s3:DeleteObject","s3:ListBucket","s3:GetBucketLocation"],"Resource":["arn:aws:s3:::my-ingest-bucket","arn:aws:s3:::my-ingest-bucket/*"]}
]
}'
aws iam create-access-key --user-name mgmt-reader
2. Update config files:
config/management.env:
export S3_BUCKET="my-ingest-bucket"
# These become the management upload pod's AWS credentials:
export S3_ADMIN_ACCESS_KEY="<mgmt-reader-access-key>"
export S3_ADMIN_SECRET_KEY="<mgmt-reader-secret-key>"
config/edge-nodes.env:
EDGE_NODES=(
"edge-uqcai|203.101.230.171|ubuntu|~/.ssh/id_ed25519|uqcai-project|<edge-writer-access-key>|<edge-writer-secret-key>"
)
3. Modify manifests:
manifests/01-management/xnat-upload.yaml.tpl — remove the AWS_ENDPOINT_URL env var
(so boto3 defaults to real AWS S3):
# DELETE this line:
# - name: AWS_ENDPOINT_URL
# value: "http://seaweedfs.seaweedfs.svc.cluster.local:8333"
manifests/02-edge/xnat-ingest.yaml.tpl — change the s3-uploader endpoint env to AWS S3:
# Change the S3_ENDPOINT value to:
value: "https://s3.ap-southeast-2.amazonaws.com"
4. Install — skip step 03 (SeaweedFS):
When running ./install.sh, press s at step 03 to skip SeaweedFS deployment.
Everything else remains the same.
- No SeaweedFS to manage, monitor, or back up
- Automatic redundancy and durability (11 nines)
- Cross-region replication available
- Pay-per-use (no disk provisioning)
- IAM policies are more granular than SeaweedFS's
- Data never leaves your infrastructure (important for patient data pre-de-identification)
- No cloud costs
- No internet dependency between management and storage
- Full control over data residency and compliance
All edge ↔ management traffic flows over a single TLS port (443) multiplexed by SNI. Three components make this work:
1. Self-signed root CA — ais-edge-ca
- Created by cert-manager at install time (script 02b-bootstrap-ca.sh).
- 10-year duration, 4096-bit RSA, stored as a Secret in the cert-manager namespace.
- The PUBLIC half is exported to ais-edge-ca.crt (gitignored, distributed to edges).
- The PRIVATE half NEVER leaves the management node.
2. Server certs (per service)
- cert-manager issues 1-year RSA certs signed by ais-edge-ca.
- Auto-renewed 30 days before expiry — no site action required.
- Servers: seaweedfs.aisedge.local (and any future TLS-fronted service).
- The k0smotron-managed k0s API + konnectivity have their own internal CA — those certs include the aisedge.local hostnames as SANs (configured via spec.k0sConfig.spec.api.sans).
3. Edge trust
- Each edge cluster gets a Secret xnat-ingest/ca-bundle containing ais-edge-ca.crt.
- The s3-uploader pod mounts it at /root/.mc/certs/CAs/ca.crt so mc trusts our CA.
- Edge worker kubelet: standard k0s mTLS — kubelet uses the auto-generated kubeconfig CA cert (k0smotron's CA, not ais-edge-ca) for API server verification.
Hostname resolution without DNS:
- Edge VMs get a static /etc/hosts entry: <MGMT_IP> seaweedfs.aisedge.local k0s.aisedge.local konnect.aisedge.local (added by script 06-join-edge-worker.sh).
- Pods on the edge cluster get hostAliases (in the manifest) for the same hostnames.
- The child cluster's CoreDNS gets a hosts plugin entry so the konnectivity-agent (which uses cluster DNS, not host /etc/hosts) can also resolve them.
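
The static /etc/hosts entry is only safe to re-apply if the step is idempotent. A minimal sketch of one way to do that is shown here; it is not the code in 06-join-edge-worker.sh, and the helper name ensure_hosts_entry is hypothetical. Note that it does not rewrite an existing line, so a stale management IP would have to be fixed by hand.

```shell
# The three names every edge must resolve to the management node.
AIS_HOSTS="seaweedfs.aisedge.local k0s.aisedge.local konnect.aisedge.local"

# Hypothetical helper: append "<mgmt-ip> <names>" to a hosts file unless a
# line mentioning aisedge.local is already present.
# Arguments: <mgmt-ip> [hosts-file, default /etc/hosts]
ensure_hosts_entry() {
  local mgmt_ip=$1 hosts_file=${2:-/etc/hosts}
  if grep -q 'aisedge\.local' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$mgmt_ip" "$AIS_HOSTS" >> "$hosts_file"
}

# Example (needs root for the real file, not run here):
#   ensure_hosts_entry "$MGMT_NODE_IP"
```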
Trust chain at handshake time (e.g. mc upload from edge to SeaweedFS):
edge mc client
├─ resolves seaweedfs.aisedge.local → MGMT_NODE_IP (via /etc/hosts in pod)
├─ opens TCP to MGMT_NODE_IP:443
├─ TLS ClientHello includes SNI=seaweedfs.aisedge.local
├─ mgmt nginx-ingress matches Ingress, terminates TLS using seaweedfs-tls Secret
├─ presents server cert (signed by ais-edge-ca)
├─ mc validates cert against /root/.mc/certs/CAs/ca.crt (= ais-edge-ca.crt)
└─ chain verifies → S3 PUT proceeds over TLS
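
The same handshake can be reproduced by hand with openssl, which helps separate a trust problem from a network problem. The wrapper below is a sketch; the function name verify_sni is hypothetical, and it assumes openssl is installed wherever it runs. It connects to an explicit IP, so it works even where /etc/hosts has not been updated yet.

```shell
# Hypothetical helper: open a TLS connection to <ip>:<port> with SNI set to
# <host>, and verify the presented chain against <ca-file>.
# Returns non-zero on connection failure or verification failure.
verify_sni() {
  local host=$1 ip=$2 port=$3 ca_file=$4
  openssl s_client -connect "$ip:$port" -servername "$host" \
    -CAfile "$ca_file" -verify_return_error </dev/null >/dev/null 2>&1
}

# Example (not run here):
#   verify_sni seaweedfs.aisedge.local "$MGMT_NODE_IP" 443 ais-edge-ca.crt \
#     && echo "chain verifies"
```

This check applies to the SeaweedFS hostname only. The k0s and konnect hostnames are passed through to certificates signed by the k0smotron cluster CA, so they are expected to fail verification against ais-edge-ca.crt.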
CA rotation:
When the CA is approaching expiry (or in a compromise scenario), use scripts/rotate-ca.sh:
# Phase 1: issue NEW CA + push bundle (old + new) to all edges
./scripts/rotate-ca.sh --phase=1
# Wait 14-30 days for renewal cycles to settle.
# During this window: BOTH CAs are trusted on edges. Server certs still
# signed by the OLD CA. Pipeline keeps working.
# Phase 2: switch the Issuer to NEW, re-issue all server certs, drop OLD from bundle
./scripts/rotate-ca.sh --phase=2
Use --dry-run first to preview. Test in staging before running in production.
Optional log-aggregation, metrics, dashboarding, and alerting stack:
- Loki stores logs (chunks land in a SeaweedFS logs-bucket)
- Prometheus scrapes per-pod /metrics and stores time series
- Grafana queries both, hosts pre-built dashboards
- Alertmanager routes alerts via email (primary) and optional Slack
- Vector runs as a DaemonSet on every worker (mgmt + edge) and ships pod stdout to Loki over the same single 443 outbound port — adds two more SNI routes (grafana.aisedge.local, loki.aisedge.local) to the existing nginx-ingress; no new firewall rules.
The stack is optional. With ALERT_EMAIL_TO blank in config/management.env
the install script skips it cleanly. Set the email + SMTP vars and re-run
./install.sh (or bash scripts/02d-install-observability.sh and
bash scripts/07b-deploy-edge-observability.sh <edge-entry> directly).
Four dashboards land in Grafana under the AIS Edge folder:
- Pipeline Overview — cross-cluster counters and timeseries for the whole ingest pipeline (DICOMs vs S3 objects vs sessions, failures, invalid sessions, per-edge throughput, recent events log).
- Edge Site Drilldown — single-cluster + per-worker-node view with dropdowns for cluster and node.
- Session Timeline — single-session trace across edges and mgmt by session name.
- SeaweedFS Health — storage-layer metrics from Prometheus.
For exactly what every panel measures, including the s3-uploader event
schema and the difference between the dicoms and files fields, see
docs/dashboards.md. For the architectural reason
edge-side alerts live in Loki ruler instead of mgmt Prometheus, see
docs/alerting-architecture.md. For
per-component detail (what each one stores, what it has access to, what
the failure modes are, how to scale or replace it), see
docs/components/.
Konnectivity is the reverse tunnel that lets the management API server reach
back into worker components (kubectl exec, kubectl logs, kubectl port-forward, metrics scraping). On the edge it runs as
konnectivity-agent, which dials out to https://konnect.aisedge.local:443
and keeps a long-lived HTTP/2 + gRPC over TLS connection open. The
management nginx-ingress forwards the raw TLS bytes through to the
konnectivity-server inside the hosted control plane (SSL passthrough — nginx
never decrypts).
This protocol works through every standards-compliant firewall, but a few classes of network appliance can disrupt it. Worth flagging for site IT:
- Deep-packet-inspection / TLS-intercepting proxies. Devices that terminate TLS to scan traffic break the tunnel. Konnectivity uses mutual cert verification; an interception proxy presents its own cert, which the agent will reject (x509: certificate signed by unknown authority). The fix at the site is either to bypass interception for the management IP or install the proxy's CA into the agent's trust store, but the cleanest answer is bypass.
- Aggressive HTTP/2 stream timeouts. Some firewalls and L7 load balancers drop HTTP/2 streams that are idle for a few minutes. The konnectivity tunnel uses long-lived streams (often >1h) for keepalive and watch traffic. If the appliance kills the stream, kubelet briefly disconnects from the control plane and the agent reconnects. This is usually invisible, but kubectl exec and kubectl logs may stall during the reconnect. Configure the appliance with idle timeout ≥ 60 minutes on outbound 443 to the management IP.
- gRPC-aware filtering / QUIC enforcement. A few enterprise proxies block gRPC on port 443 by default, or rewrite responses to force HTTP/1.1. Konnectivity requires HTTP/2 end to end; HTTP/1.1 downgrade breaks it. Allow plain HTTPS with HTTP/2 to the management IP without protocol rewriting.
- Stateful firewalls with short connection-tracking tables. Same idle-timeout class of issue as above. A flow that sees no packets for ~5 min may get evicted from the conntrack table; keepalive on the agent should prevent it but is occasionally too sparse on tightly-tuned appliances.
If kubectl logs or kubectl exec against a child cluster suddenly stops
working ("No agent available" in the API server logs), check the
konnectivity-agent pod: KUBECONFIG=kubeconfig-<edge> kubectl get pods -n kube-system -l k8s-app=konnectivity-agent. Restarts on the agent are the
classic symptom of a middlebox kicking the tunnel.
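
To turn the restart check into a single number that can be compared over time, the output of the command above can be piped through a small filter. This is a sketch that assumes the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name total_restarts is hypothetical.

```shell
# Hypothetical helper: sum the RESTARTS column of `kubectl get pods` output
# read from stdin. The header row is skipped; newer kubectl versions print
# values like "4 (3m ago)", of which only the leading count is used.
total_restarts() {
  awk 'NR > 1 { sum += $4 } END { print sum + 0 }'
}

# Example (not run here):
#   KUBECONFIG=kubeconfig-<edge> kubectl get pods -n kube-system \
#     -l k8s-app=konnectivity-agent | total_restarts
```

A count that keeps climbing between checks while the pods report Running points at the network path rather than the agent itself.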
The data path (DICOM upload via mc mirror to SeaweedFS) does NOT use
konnectivity — it is plain HTTPS REST and tolerates short connection drops
naturally. Konnectivity disruption affects only central-admin visibility, not
data integrity.
./scripts/uninstall.sh
This removes everything: edge workers, SeaweedFS data, hosted clusters, k0smotron, and optionally k0s itself (if installed fresh). Resources removed include:
- ingress-nginx (helm release + namespace)
- ais-edge-ca Issuer + Secret + exported ais-edge-ca.crt
- /etc/hosts entries on management and edge VMs
- /etc/haproxy/certs/ on each edge worker
A single TLS port carries all edge ↔ management traffic. SNI on the nginx-ingress controller routes to the right backend.
| From | To | Port | Purpose | Encrypted? |
|---|---|---|---|---|
| Edge | Management | 443 | All edge traffic, SNI-routed: seaweedfs.aisedge.local, k0s.aisedge.local, konnect.aisedge.local | TLS — server cert signed by ais-edge-ca |
| Management | XNAT | 443 | XNAT REST API uploads | HTTPS |
| Management | Edge | 22 | SSH (initial setup only) | SSH |
All edge traffic is outbound only (zero inbound on edge VMs).
Site IT firewall rule: ALLOW outbound TCP from edge IP to management IP, dst-port 443.
Admin-only endpoints (SeaweedFS master/filer UIs, S3 admin) are now ClusterIP-only on the
management cluster — reach them via kubectl port-forward. No external port required.