+    description: `The production layer is everything you need to run the platform as a real service. Metering and billing so you can charge tenants. Observability so you can debug problems. Backup so you can recover from failures. TLS automation so certificates don't expire at 3 AM. This layer is what separates a demo from a product.`,
+    projects: [
+      {
+        name: 'OpenMeter',
+        url: 'https://openmeter.io',
+        badge: 'Open Source',
+        role: 'Usage metering & billing',
+        why: 'OpenMeter ingests usage events (CPU hours, GPU time, storage bytes, network transfer) and aggregates them into billable meters. It handles the hard part of cloud billing: event deduplication, windowed aggregation, and real-time usage reporting. Connect it to Stripe or your billing system for invoicing. Without metering, you can\'t run a sustainable cloud.',
+        alternatives: 'Amberflo (SaaS, proprietary), custom metering (complex and error-prone). OpenMeter is open source and purpose-built for usage-based billing.',
+      },
+      {
+        name: 'Prometheus',
+        url: 'https://prometheus.io',
+        badge: 'CNCF Graduated',
+        role: 'Metrics collection',
+        why: 'Prometheus scrapes metrics from every component in the stack — kubelet, Kube-OVN, kcp, operators, and application workloads. It\'s the universal metrics standard in the Kubernetes ecosystem. Every project in this stack exposes Prometheus metrics natively, making it the obvious choice for platform-level monitoring.',
+        alternatives: 'Datadog (SaaS, expensive at scale), InfluxDB (less ecosystem integration). Prometheus is the CNCF standard and has the deepest Kubernetes integration.',
+      },
+      {
+        name: 'Grafana',
+        url: 'https://grafana.com/oss/grafana',
+        badge: 'Open Source',
+        role: 'Dashboards & visualization',
+        why: 'Grafana provides operator dashboards for platform health, capacity planning, and tenant resource usage. It connects to Prometheus, VictoriaMetrics, and logs to give a unified view. Pre-built dashboards exist for every component in the stack.',
+        alternatives: 'Kibana (Elastic-oriented), Chronograf (InfluxDB-oriented). Grafana is the most flexible and has the largest dashboard ecosystem.',
+      },
+      {
+        name: 'VictoriaMetrics',
+        url: 'https://victoriametrics.com',
+        badge: 'Open Source',
+        role: 'Long-term metrics storage',
+        why: 'Prometheus is great for real-time metrics but struggles with long-term retention at scale. VictoriaMetrics provides a Prometheus-compatible remote write target with superior compression, query performance, and storage efficiency. Keep months of metrics without the storage cost.',
+        alternatives: 'Thanos (more complex multi-cluster setup), Cortex (heavier). VictoriaMetrics is the simplest path to scalable long-term storage.',
+      },
+      {
+        name: 'Velero',
+        url: 'https://velero.io',
+        badge: 'CNCF Incubating',
+        role: 'Backup & disaster recovery',
+        why: 'Velero backs up Kubernetes resources and persistent volumes to object storage (S3-compatible, which Rook-Ceph provides). Scheduled backups, cross-cluster restores, and migration support. For a cloud platform, this protects both the platform state and tenant data.',
+        alternatives: 'Kasten (Veeam, proprietary), custom etcd snapshots (insufficient). Velero is the Kubernetes-native standard for backup.',
+      },
+      {
+        name: 'cert-manager',
+        url: 'https://cert-manager.io',
+        badge: 'CNCF Incubating',
+        role: 'TLS certificate automation',
+        why: 'cert-manager automates the issuance and renewal of TLS certificates from Let\'s Encrypt (or internal CAs). Every tenant domain, every API endpoint, every webhook — all get valid TLS without manual intervention. In a cloud platform with hundreds of tenant domains, manual certificate management is impossible.',
+        alternatives: 'Manual cert management (doesn\'t scale), Caddy (embedded use only). cert-manager is the standard for Kubernetes TLS automation.',
+      },
+    ],
+  },
+  {
+    id: 'platform',
+    number: '02',
+    title: 'Platform Layer',
+    subtitle: 'Multi-tenant cloud APIs via kcp',
+    color: 'cyan',
+    gradient: 'from-cyan-600 to-cyan-500',
+    description: `The platform layer is what turns infrastructure into a cloud. Tenants interact with high-level APIs — creating VMs, notebooks, storage — without ever seeing the underlying Kubernetes primitives. They get isolated workspaces with their own RBAC, resource quotas, and API surface. The platform controls the abstraction boundary.`,
+    projects: [
+      {
+        name: 'kcp',
+        url: 'https://kcp.io',
+        badge: 'CNCF Sandbox',
+        role: 'Multi-tenant control plane',
+        why: 'kcp provides logical clusters (workspaces) on a single Kubernetes API server. Each tenant gets their own isolated workspace with its own set of APIs, RBAC, and resources — without needing a dedicated Kubernetes cluster. This is the core innovation: instead of cluster-per-tenant (expensive) or namespace-per-tenant (weak isolation), kcp gives workspace-per-tenant with full API-level isolation.',
+        alternatives: 'vCluster (virtual clusters, more overhead), raw namespaces (insufficient isolation). kcp is purpose-built for multi-tenant cloud APIs.',
+      },
+      {
+        name: 'Reference Architecture',
+        url: 'https://github.com/faroshq/neocloud',
+        badge: 'Open Source',
+        role: 'Cloud operators & APIs',
+        why: 'The reference architecture provides the bridge between kcp workspaces and infrastructure — operators that reconcile tenant resources (VMs, storage, networking) into running infrastructure, plus a set of opinionated Cloud APIs (CRDs) that abstract Kubernetes primitives so tenants never see pods, nodes, or PVCs. When a tenant creates a VirtualMachine in their workspace, the operator provisions it via KubeVirt and reports status back. Designed to be forked and adapted to your specific cloud offering.',
+        alternatives: 'Build your own from scratch using controller-runtime and CRDs. The reference architecture saves you months of boilerplate and gives you proven patterns to extend.',
+      },
+      {
+        name: 'Zitadel IAM',
+        url: 'https://zitadel.com',
+        badge: 'Open Source',
+        role: 'Identity & access management',
+        why: 'Zitadel provides OIDC/OAuth2 authentication, user management, and organization-level multi-tenancy. It handles user registration, login flows, MFA, service accounts, and API keys. Each tenant organization maps to a kcp workspace, giving you unified identity across the platform. It\'s self-hosted, fully open source, and built for exactly this use case.',
+        alternatives: 'Keycloak (heavier, Java-based), Auth0 (SaaS, proprietary). Zitadel is lighter, Go-based, and has native multi-tenancy.',
+      },
+    ],
+  },
   {
     id: 'infrastructure',
     number: '01',
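The Reference Architecture entry added in this hunk describes operators that reconcile a tenant's declared resources into running infrastructure and report status back. The sketch below illustrates that reconcile pattern in plain JavaScript. It is not the neocloud operator and does not call KubeVirt or kcp: the `reconcile` function, the in-memory `infra` map, and the resource shape are all hypothetical names chosen for illustration.

```javascript
// Hypothetical in-memory stand-in for the infrastructure cluster.
// A real operator would talk to the Kubernetes API instead.
const infra = new Map();

// One reconcile pass: compare the desired state (the tenant's resource)
// with the observed state (what exists in `infra`) and converge them.
function reconcile(resource) {
  const key = `${resource.workspace}/${resource.name}`;
  const observed = infra.get(key);

  if (resource.deleted) {
    infra.delete(key);
    return { phase: 'Deleted' };
  }
  if (!observed) {
    // Nothing exists yet: provision from the declared spec.
    infra.set(key, { ...resource.spec });
    return { phase: 'Provisioned' };
  }
  if (observed.cpu !== resource.spec.cpu || observed.memoryGiB !== resource.spec.memoryGiB) {
    // Spec drifted from what is running: update in place.
    infra.set(key, { ...resource.spec });
    return { phase: 'Updated' };
  }
  // Desired and observed state already match: nothing to do.
  return { phase: 'Ready' };
}

// Hypothetical tenant resource, loosely modeled on a VirtualMachine.
const vm = {
  workspace: 'tenant-a',
  name: 'web-1',
  spec: { cpu: 2, memoryGiB: 4 },
  deleted: false,
};

const first = reconcile(vm);
const second = reconcile(vm);
console.log(first.phase, second.phase); // prints "Provisioned Ready"
```

Running the same pass twice is safe because each pass only acts on the difference between declared and observed state, which is the property that lets a controller retry after failures.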
@@ -80,106 +174,6 @@ const layers = [
       },
     ],
   },
-  {
-    id: 'platform',
-    number: '02',
-    title: 'Platform Layer',
-    subtitle: 'Multi-tenant cloud APIs via kcp',
-    color: 'cyan',
-    gradient: 'from-cyan-600 to-cyan-500',
-    description: `The platform layer is what turns infrastructure into a cloud. Tenants interact with high-level APIs — creating VMs, notebooks, storage — without ever seeing the underlying Kubernetes primitives. They get isolated workspaces with their own RBAC, resource quotas, and API surface. The platform controls the abstraction boundary.`,
-    projects: [
-      {
-        name: 'kcp',
-        url: 'https://kcp.io',
-        badge: 'CNCF Sandbox',
-        role: 'Multi-tenant control plane',
-        why: 'kcp provides logical clusters (workspaces) on a single Kubernetes API server. Each tenant gets their own isolated workspace with its own set of APIs, RBAC, and resources — without needing a dedicated Kubernetes cluster. This is the core innovation: instead of cluster-per-tenant (expensive) or namespace-per-tenant (weak isolation), kcp gives workspace-per-tenant with full API-level isolation.',
-        alternatives: 'vCluster (virtual clusters, more overhead), raw namespaces (insufficient isolation). kcp is purpose-built for multi-tenant cloud APIs.',
-      },
-      {
-        name: 'Cloud Operator',
-        url: 'https://github.com/faroshq/neocloud',
-        badge: 'Open Source',
-        role: 'Reconciliation engine',
-        why: 'The Cloud Operator is the bridge between kcp workspaces and infrastructure. When a tenant creates a VirtualMachine resource in their workspace, the Cloud Operator watches that resource, provisions the actual VM via KubeVirt on the infrastructure cluster, and reports status back. It\'s the controller pattern applied to cloud services — declarative resources in, running infrastructure out.',
-        alternatives: 'This is custom to Sovereign Stack. You could build your own using controller-runtime, but this gives you the patterns and CRDs to start.',
-      },
-      {
-        name: 'Zitadel IAM',
-        url: 'https://zitadel.com',
-        badge: 'Open Source',
-        role: 'Identity & access management',
-        why: 'Zitadel provides OIDC/OAuth2 authentication, user management, and organization-level multi-tenancy. It handles user registration, login flows, MFA, service accounts, and API keys. Each tenant organization maps to a kcp workspace, giving you unified identity across the platform. It\'s self-hosted, fully open source, and built for exactly this use case.',
-        alternatives: 'Keycloak (heavier, Java-based), Auth0 (SaaS, proprietary). Zitadel is lighter, Go-based, and has native multi-tenancy.',
-      },
-      {
-        name: 'Cloud APIs',
-        role: 'Custom-built APIs for your cloud',
-        why: 'The platform ships with a set of reference APIs — VMs, compute, notebooks, storage, networking — that give you a starting point, not a final answer. These are opinionated abstractions over Kubernetes primitives so tenants never see pods, nodes, or PVCs. They\'re designed to be forked and adapted to your specific cloud offering. Contributions are welcome — this is not set in stone.',
-        alternatives: 'Build your own from scratch using controller-runtime and CRDs. The reference APIs save you months of boilerplate and give you proven patterns to extend.',
-    description: `The production layer is everything you need to run the platform as a real service. Metering and billing so you can charge tenants. Observability so you can debug problems. Backup so you can recover from failures. TLS automation so certificates don't expire at 3 AM. This layer is what separates a demo from a product.`,
-    projects: [
-      {
-        name: 'OpenMeter',
-        url: 'https://openmeter.io',
-        badge: 'Open Source',
-        role: 'Usage metering & billing',
-        why: 'OpenMeter ingests usage events (CPU hours, GPU time, storage bytes, network transfer) and aggregates them into billable meters. It handles the hard part of cloud billing: event deduplication, windowed aggregation, and real-time usage reporting. Connect it to Stripe or your billing system for invoicing. Without metering, you can\'t run a sustainable cloud.',
-        alternatives: 'Amberflo (SaaS, proprietary), custom metering (complex and error-prone). OpenMeter is open source and purpose-built for usage-based billing.',
-      },
-      {
-        name: 'Prometheus',
-        url: 'https://prometheus.io',
-        badge: 'CNCF Graduated',
-        role: 'Metrics collection',
-        why: 'Prometheus scrapes metrics from every component in the stack — kubelet, Kube-OVN, kcp, operators, and application workloads. It\'s the universal metrics standard in the Kubernetes ecosystem. Every project in this stack exposes Prometheus metrics natively, making it the obvious choice for platform-level monitoring.',
-        alternatives: 'Datadog (SaaS, expensive at scale), InfluxDB (less ecosystem integration). Prometheus is the CNCF standard and has the deepest Kubernetes integration.',
-      },
-      {
-        name: 'Grafana',
-        url: 'https://grafana.com/oss/grafana',
-        badge: 'Open Source',
-        role: 'Dashboards & visualization',
-        why: 'Grafana provides operator dashboards for platform health, capacity planning, and tenant resource usage. It connects to Prometheus, VictoriaMetrics, and logs to give a unified view. Pre-built dashboards exist for every component in the stack.',
-        alternatives: 'Kibana (Elastic-oriented), Chronograf (InfluxDB-oriented). Grafana is the most flexible and has the largest dashboard ecosystem.',
-      },
-      {
-        name: 'VictoriaMetrics',
-        url: 'https://victoriametrics.com',
-        badge: 'Open Source',
-        role: 'Long-term metrics storage',
-        why: 'Prometheus is great for real-time metrics but struggles with long-term retention at scale. VictoriaMetrics provides a Prometheus-compatible remote write target with superior compression, query performance, and storage efficiency. Keep months of metrics without the storage cost.',
-        alternatives: 'Thanos (more complex multi-cluster setup), Cortex (heavier). VictoriaMetrics is the simplest path to scalable long-term storage.',
-      },
-      {
-        name: 'Velero',
-        url: 'https://velero.io',
-        badge: 'CNCF Incubating',
-        role: 'Backup & disaster recovery',
-        why: 'Velero backs up Kubernetes resources and persistent volumes to object storage (S3-compatible, which Rook-Ceph provides). Scheduled backups, cross-cluster restores, and migration support. For a cloud platform, this protects both the platform state and tenant data.',
-        alternatives: 'Kasten (Veeam, proprietary), custom etcd snapshots (insufficient). Velero is the Kubernetes-native standard for backup.',
-      },
-      {
-        name: 'cert-manager',
-        url: 'https://cert-manager.io',
-        badge: 'CNCF Incubating',
-        role: 'TLS certificate automation',
-        why: 'cert-manager automates the issuance and renewal of TLS certificates from Let\'s Encrypt (or internal CAs). Every tenant domain, every API endpoint, every webhook — all get valid TLS without manual intervention. In a cloud platform with hundreds of tenant domains, manual certificate management is impossible.',
-        alternatives: 'Manual cert management (doesn\'t scale), Caddy (embedded use only). cert-manager is the standard for Kubernetes TLS automation.',
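The OpenMeter entry in this diff names event deduplication and windowed aggregation as the hard parts of usage billing. The sketch below shows what those two steps could look like in plain JavaScript. It does not use OpenMeter's API: the event shape (`id`, `tenant`, `cpuSeconds`, `time`), the sample values, and the `aggregateHourly` helper are hypothetical and only illustrate the idea of dropping repeated events and summing usage per tenant in fixed one-hour windows.

```javascript
// Hypothetical usage events; `id` serves as the deduplication key.
const events = [
  { id: 'e1', tenant: 'tenant-a', cpuSeconds: 1800, time: '2024-01-01T10:05:00Z' },
  { id: 'e2', tenant: 'tenant-a', cpuSeconds: 1200, time: '2024-01-01T10:40:00Z' },
  { id: 'e2', tenant: 'tenant-a', cpuSeconds: 1200, time: '2024-01-01T10:40:00Z' }, // retry duplicate
  { id: 'e3', tenant: 'tenant-a', cpuSeconds: 600, time: '2024-01-01T11:10:00Z' },
  { id: 'e4', tenant: 'tenant-b', cpuSeconds: 900, time: '2024-01-01T10:15:00Z' },
];

const HOUR_MS = 60 * 60 * 1000;

// Sum cpuSeconds per tenant per one-hour tumbling window,
// counting each event id at most once.
function aggregateHourly(input) {
  const seen = new Set();
  const sums = new Map();
  for (const ev of input) {
    if (seen.has(ev.id)) continue; // deduplicate on event id
    seen.add(ev.id);
    // Round the timestamp down to the start of its UTC hour.
    const windowStart = Math.floor(Date.parse(ev.time) / HOUR_MS) * HOUR_MS;
    const key = `${ev.tenant}|${new Date(windowStart).toISOString()}`;
    sums.set(key, (sums.get(key) || 0) + ev.cpuSeconds);
  }
  return sums;
}

const totals = aggregateHourly(events);
console.log(totals.get('tenant-a|2024-01-01T10:00:00.000Z')); // prints 3000
```

Keeping the set of seen ids in memory is only workable for a small sketch; a production meter would need to bound that state, which is part of why the entry recommends a purpose-built tool over custom metering.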