Add AWS production architecture planning doc by amalbet · Pull Request #70 · energy-iot/docker-openems

amalbet · 2026-04-16T17:29:34Z

Summary

Planning document for the target AWS production deployment architecture. Covers OpenEMS Backend + MBE on AWS with OpenVPN for Raspberry Pi edge devices at microgrid sites.

This is a draft for team review — we're iterating on the architecture over the next few days before committing to specific tickets. Please leave line-by-line comments on anything that needs refinement.

What's covered

Current state: existing Terraform IaC, CI/CD, gaps to address
Target architecture: VPC layout, OpenVPN server (EC2), ECS Fargate for Backend + MBE
OpenVPN details: PKI structure, VPN subnet, certificate per edge site
Two edge types: simulation edges (Docker on ECS, testing) vs production edges (Raspberry Pi, OpenVPN tunnel)
Security model: network segmentation table, credential management, TLS coverage
Environments: dev / stage / prod isolation, promotion flow, Terraform workspaces
Implementation phases
8 open questions for team input
Cost estimate (~$280/mo per environment)

Open questions (need team input)

ECS Fargate vs EC2 for the backend stack (Fargate can't run OpenVPN and has no persistent local storage for InfluxDB)
InfluxDB persistence strategy (EFS vs EC2 vs Timestream)
Domain name strategy
Pi provisioning process (manual vs SD card image)
Monitoring (CloudWatch vs Grafana)
Cost optimization (Reserved Instances?)
OpenVPN HA (single point of failure acceptable for MVP?)
MBE deployment target (ECS vs keep on Vercel with API Gateway)

Test plan

Doc renders correctly on GitHub (tables, code blocks, ASCII diagrams)
Team review — collect comments and iterate
Once aligned, create implementation tickets (or refine existing ones: Deploy OpenEMS stack to AWS EC2 with VPC networking #67, Add TLS reverse proxy for OpenEMS B2B REST endpoint #68, Add WireGuard VPN for edge device connectivity #69)

🤖 Generated with Claude Code

Documents the target deployment architecture for OpenEMS + MBE on AWS with OpenVPN for Raspberry Pi edge devices. Covers VPC layout, security model, environment strategy (dev/stage/prod), data flow, cost estimates, and open questions for team iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alejandro Malbet <amalbet@gmail.com>

Guidelines for the IT admin to provision a dedicated dev AWS account isolated from production, with safe scoping for Claude Code agent sessions. Covers account structure, IAM strategy, Service Control Policies, cost controls, and a provisioning checklist. Complements the production architecture doc by establishing the foundation needed before we can start provisioning dev infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alejandro Malbet <amalbet@gmail.com>

Split out to #72 so each doc can be reviewed independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alejandro Malbet <amalbet@gmail.com>

tushabe

Thanks for documenting this; it gives me some visibility into the decisions made here and why they were made. I especially appreciate the cost estimate for running the entire stack because it helps facilitate conversations about business feasibility after the pilot.

amalbet · 2026-04-21T01:59:38Z

Discussion: Do we need MBE in the same VPC as OpenEMS?

When we drafted this architecture, we defaulted to putting MBE (billing engine) in the same VPC as OpenEMS so the B2B REST communication stays on a private network. On reflection, we missed an important tradeoff discussion — we should consider whether MBE can stay on Vercel/Supabase (where it already works) and connect to OpenEMS over a secured public endpoint instead.

Why this matters

MBE is already deployed and working on Vercel + cloud Supabase. Moving it to ECS means:

Migrating off Vercel (new deployment pipeline, ALB, health checks, log management)
Either self-hosting Supabase on AWS (significant complexity — we've seen this firsthand) or continuing to use cloud Supabase from ECS (which means we're moving to AWS only for the OpenEMS connection)
Ongoing ops burden that Vercel currently handles for us (auto-scaling, TLS, CDN, zero-downtime deploys)
Losing the free tier — both Vercel and Supabase free tiers are sufficient for MBE's current and near-term usage. Moving to ECS adds ~$40-60/mo in AWS costs for infrastructure that Vercel provides at $0.

The only reason to move MBE into the VPC is to keep the OpenEMS B2B endpoint private. So the real question is: can we secure that connection without migrating MBE?

How MBE uses the OpenEMS APIs

MBE connects to the OpenEMS Backend B2B REST endpoint (port 8082, /jsonrpc) from Next.js API routes — server-side only. The browser never talks to OpenEMS directly. The calls are:

getEdgesStatus — check if edges are online/offline
getEdgeConfig — discover meters on an edge
queryHistoricTimeseriesEnergy — pull energy readings for billing
getEdgesChannelsValues — read current channel values

The data flowing through is energy consumption readings — not financial data, not PII, not credentials.
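As a rough illustration of the calls listed above, here is a minimal sketch of how a server-side caller might build the JSON-RPC 2.0 envelope for the `/jsonrpc` endpoint. The method name comes from the list above; the helper name `build_jsonrpc_request` and the empty `params` object are assumptions for illustration, not MBE's actual code.

```python
import json
import uuid
from typing import Optional


def build_jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope for the given method."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),  # unique id lets the caller match the response
        "method": method,
        "params": params or {},  # the params shape per method is an assumption
    }


# Serialize the envelope into the bytes that would be POSTed to /jsonrpc.
request = build_jsonrpc_request("getEdgesStatus")
body = json.dumps(request).encode("utf-8")
```

The other three methods would presumably use the same envelope with different `params`.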

Three options

Option 1: MBE on Vercel + API Gateway (public endpoint)

Expose a single API Gateway endpoint in front of the OpenEMS B2B REST API.

MBE (Vercel, server-side) ──HTTPS + API key──▶ API Gateway (public)
                                                     │
                                              TLS termination
                                              API key validation
                                              Rate limiting
                                                     │
                                                     ▼
                                            OpenEMS Backend (private EC2)
                                              Basic auth on /jsonrpc

Security layers: TLS + API key + Basic auth + rate limiting + CloudWatch logging + optional WAF.
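To make the two credential layers concrete, here is a small sketch of the headers a caller might send under Option 1. `x-api-key` is the header API Gateway reads API keys from; the function name and the idea that the gateway forwards the `Authorization` header unchanged to the backend are assumptions.

```python
import base64


def build_auth_headers(api_key: str, username: str, password: str) -> dict:
    """Return headers carrying both the API Gateway key and Basic auth."""
    # Basic auth is base64("username:password"); it is encoding, not encryption,
    # so it relies on TLS for confidentiality.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {
        "x-api-key": api_key,              # validated by API Gateway
        "Authorization": f"Basic {token}",  # validated by OpenEMS on /jsonrpc
        "Content-Type": "application/json",
    }
```

Both values are static strings, which is why a leak of either one is usable by anyone until it is rotated.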

What stays private: OpenEMS UI (:4200), edge websocket (:8081), Odoo (:10016), InfluxDB (:8086), SSH/SSM — all IP-allowlisted or VPN-only. Only the API Gateway URL is public.

Risk profile:

The API Gateway endpoint is discoverable and scannable on the internet
If API key + Basic auth credentials leak, an attacker can query energy data
Mitigation: credentials rotatable, API Gateway logging flags unusual patterns, rate limiting prevents brute force
Data sensitivity is low (kWh readings, edge status — not financial/PII)
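The rate-limiting mitigation can be pictured as a token bucket, which is the model API Gateway documents for its throttling. The sketch below is a simplified stand-in to show the behavior, not API Gateway's implementation; the class name and the rate and burst values are hypothetical.

```python
class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, never exceeding the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token for this request
            return True
        return False            # bucket empty: the request would be throttled
```

With a low steady rate, a brute-force attempt exhausts the bucket almost immediately and is then limited to the refill rate.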

Option 2: Move MBE to ECS (same VPC)

Move MBE off Vercel into ECS Fargate within the same VPC. OpenEMS B2B stays fully private.

Browser ──HTTPS──▶ ALB ──▶ MBE (ECS Fargate) ──private──▶ OpenEMS Backend
                                  │
                                  │ internet (HTTPS)
                                  ▼
                           Cloud Supabase (auth + DB)

Security: Zero public surface for OpenEMS. MBE-to-backend communication is private network only.

Cost: ~$40-60/mo (ALB + Fargate) — replacing what Vercel and Supabase provide for free today. For a pilot-stage startup, this is meaningful spend on infrastructure that adds operational burden without adding user value.

Effort: ~3 days. No MBE code changes — Dockerfile already exists and works. All infrastructure: ECS task definition, ALB, ACM cert, CI/CD pipeline, CloudWatch logging, DNS cutover.

Ongoing ops: Every deploy, log investigation, scaling decision, and certificate renewal that Vercel handles today becomes our responsibility.

Option 3: Lambda VPC proxy (recommended middle ground)

Keep MBE on Vercel. Move only the OpenEMS API calls to a thin Lambda function inside the VPC. OpenEMS stays fully private — no public endpoint.

Browser ──HTTPS──▶ MBE (Vercel)
                      │
                      │ MBE API routes call the Lambda
                      ▼
              Lambda Function URL (HTTPS, IAM auth)
              ┌─────────────────────────────┐
              │  VPC private subnet         │
              │                             │
              │  Lambda (passthrough proxy) │
              │       │                     │
              │       │ private network     │
              │       ▼                     │
              │  OpenEMS Backend :8082      │
              └─────────────────────────────┘

How it works: The Lambda is a ~10-line function — no business logic, no dependencies, no state. It receives the JSON-RPC request from Vercel, forwards it to OpenEMS on the private IP, and returns the response. The billing logic, meter discovery, and kWh calculations stay in MBE on Vercel.

What changes in MBE: One env var — OPENEMS_B2B_URL points to the Lambda Function URL instead of directly to OpenEMS. Zero code logic changes.
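A minimal sketch of what such a passthrough Lambda could look like, assuming a Python runtime and the Function URL event format (`body`, `isBase64Encoded`). The environment variable names, the example private IP, and the injectable `send` parameter are hypothetical; this is not the code in the implementing PR.

```python
import base64
import os
import urllib.request

# Hypothetical configuration; the source does not name these variables.
BACKEND_URL = os.environ.get("OPENEMS_BACKEND_URL", "http://10.0.1.10:8082/jsonrpc")
BASIC_AUTH = os.environ.get("OPENEMS_BASIC_AUTH", "")  # "user:password"


def forward(body: bytes) -> tuple:
    """POST the raw JSON-RPC body to the backend; return (status, response text)."""
    token = base64.b64encode(BASIC_AUTH.encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        BACKEND_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    # Note: urlopen raises HTTPError on 4xx/5xx; a real proxy would map that too.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")


def handler(event, context, send=forward):
    """Lambda entry point: pass the request body through unchanged."""
    raw = event.get("body") or ""
    body = base64.b64decode(raw) if event.get("isBase64Encoded") else raw.encode("utf-8")
    status, text = send(body)
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": text,
    }
```

The `send` parameter exists only so the handler can be exercised without a live backend; Lambda itself calls `handler(event, context)` and the default is used.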

Security posture — better than Option 1:

| | Option 1 (API Gateway) | Option 3 (Lambda proxy) |
| --- | --- | --- |
| OpenEMS public surface | API Gateway URL (discoverable, scannable) | None: OpenEMS is fully private |
| Auth to reach OpenEMS | API key + Basic auth (secrets that can leak) | IAM auth: Lambda Function URL uses AWS SigV4. No shared secrets over the wire. Vercel calls with an IAM access key that only authorizes this specific Lambda. |
| Attack surface | Public HTTPS endpoint on the internet | Lambda Function URL can be scoped to IAM auth only, so it is not callable without valid AWS credentials. Not useful even if discovered. |
| Credential compromise impact | Attacker can query energy data from anywhere | Attacker needs AWS IAM credentials (not just an API key), and the Lambda still enforces Basic auth to OpenEMS internally |
| Network path | Internet → API Gateway → Backend | Internet → Lambda Function URL → private network → Backend |
| What's exposed if breached | B2B REST API (read-only energy data) | Same, but harder to breach (IAM >> API key) |

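Why this is more secure than Option 1: The key difference is authentication strength. Option 1 uses an API key (a static string that is sent with every request and that anyone can use if it leaks). Option 3 uses AWS IAM authentication (SigV4-signed requests), which is significantly harder to compromise: the secret key never travels over the wire, each signature is valid only for a short time window, calls are logged in CloudTrail, and access is revocable per credential. Note that the IAM access key Vercel holds is still a long-lived secret unless temporary credentials are used, so it needs the same rotation discipline as any other secret. The OpenEMS backend itself remains on a private network with no public-facing port.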
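To show why a SigV4 request differs from sending a static key, here is a stdlib-only sketch of the signing steps for a POST to a Function URL. In practice an AWS SDK signer would do this; the function names and the example host and credentials are hypothetical, and only `host` and `x-amz-date` are signed to keep the sketch short.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive a key scoped to one day, one region, and one service."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_post(access_key: str, secret_key: str, host: str, path: str, body: bytes,
              amz_date: str, region: str = "us-east-1", service: str = "lambda") -> dict:
    """Return the headers for a SigV4-signed POST (amz_date like 20260421T000000Z)."""
    date = amz_date[:8]
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        "POST",
        path,
        "",                                         # empty query string
        f"host:{host}\nx-amz-date:{amz_date}\n",    # canonical headers block
        signed_headers,
        hashlib.sha256(body).hexdigest(),           # payload hash binds the body
    ])
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signature = hmac.new(signing_key(secret_key, date, region, service),
                         string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders={signed_headers}, Signature={signature}"),
    }
```

Only the access key id and the signature appear in the request. The signature covers the timestamp and the body hash, so a captured request cannot be replayed later or reused with a different payload.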

Cost: Essentially $0 at MBE's request volume (Lambda free tier: 1M requests/month). VPC-attached Lambda has no additional cost. Combined with the Vercel and Supabase free tiers that MBE already runs within, the entire MBE stack costs $0/mo. Moving to ECS (Option 2) would replace this with ~$40-60/mo in AWS infrastructure costs — meaningful spend for a pilot-stage project that adds operational burden without adding user-facing value.

Effort: ~1 day. Write the Lambda (10 lines), Terraform for Lambda + VPC config + Function URL + IAM role, update one env var in Vercel.

Stability: The Lambda is a stateless passthrough — no business logic, no dependencies, nothing to update. Deploy once, forget about it.

Comparison

| | Option 1: API Gateway | Option 2: MBE to ECS | Option 3: Lambda proxy |
| --- | --- | --- | --- |
| OpenEMS stays private | ❌ (API Gateway is public) | ✅ | ✅ |
| MBE stays on Vercel | ✅ | ❌ | ✅ |
| Supabase stays cloud | ✅ | ✅ | ✅ |
| Auth strength | API key + Basic auth | N/A (private network) | IAM SigV4 + Basic auth |
| Effort | ~1 day | ~3 days | ~1 day |
| Ongoing ops | API Gateway only | ECS + ALB + CI/CD + logs | Lambda only (stateless) |
| MBE monthly cost | $0 (Vercel free) | ~$38-71/mo ¹ | ~$0 ² (Vercel free + Lambda free) |
| Public attack surface | 1 endpoint | 0 | 0 |

¹ Option 2 cost breakdown (us-east-1, on-demand pricing):

| Resource | Calculation | Monthly |
| --- | --- | --- |
| ALB (fixed) | $0.0225/hr × 730 hrs | ~$16 |
| ALB LCU (usage) | Minimal traffic | ~$2 |
| Fargate vCPU | 0.5 vCPU × $0.04048/hr × 730 hrs | ~$15 |
| Fargate memory | 1 GB × $0.004445/hr × 730 hrs | ~$3 |
| CloudWatch Logs | Minimal ingestion | ~$2 |
| Subtotal (public subnet) | | ~$38/mo |
| NAT Gateway (if private subnet) | $0.045/hr × 730 hrs + data processing | ~$33 |
| Subtotal (private subnet) | | ~$71/mo |

NAT Gateway is required if MBE runs in a private subnet — it needs outbound internet to reach cloud Supabase. Placing MBE in a public subnet avoids this cost but exposes the container to inbound internet (mitigated by security group rules).
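The Option 2 subtotals can be reproduced from the quoted hourly rates. This sketch uses only the figures in the breakdown above; the two "minimal" line items are taken as the flat ~$2 estimates from the table, and NAT data-processing charges are left out because the source gives no volume for them.

```python
HOURS_PER_MONTH = 730

line_items = {
    "alb_fixed": 0.0225 * HOURS_PER_MONTH,            # ~16.43
    "alb_lcu": 2.0,                                   # flat estimate from the table
    "fargate_vcpu": 0.5 * 0.04048 * HOURS_PER_MONTH,  # ~14.78
    "fargate_memory": 1 * 0.004445 * HOURS_PER_MONTH, # ~3.24
    "cloudwatch_logs": 2.0,                           # flat estimate from the table
}

public_subnet_total = sum(line_items.values())
nat_gateway_hourly = 0.045 * HOURS_PER_MONTH          # hourly charge only
private_subnet_total = public_subnet_total + nat_gateway_hourly

print(f"public subnet:  ~${public_subnet_total:.0f}/mo")
print(f"private subnet: ~${private_subnet_total:.0f}/mo")
```

The NAT Gateway alone nearly doubles the bill, which is why the subnet placement decision matters for this option.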

² Option 3 cost breakdown (us-east-1, on-demand pricing):

MBE makes ~100-500 OpenEMS API calls/month (billing generation, meter discovery, status checks). Even at 100× that volume the cost is negligible.

| Component | Rate | 500 req/mo | 50,000 req/mo |
| --- | --- | --- | --- |
| Lambda requests | $0.20 per 1M | $0.0001 | $0.01 |
| Lambda compute | $0.0000166667/GB-sec | $0.0005 (32 GB-sec) | $0.05 (3,200 GB-sec) |
| Data transfer (Lambda → OpenEMS) | $0 (intra-VPC) | $0 | $0 |
| Data transfer (Lambda → Vercel) | $0.09/GB after 100 GB free | $0 (~2.5 MB) | $0 (~250 MB) |
| Total | | < $0.01/mo | $0.06/mo |

Lambda free tier includes 1M requests/month and 400,000 GB-seconds/month — MBE won't approach either limit. Vercel free tier (100 GB bandwidth, 100K serverless function invocations) and Supabase free tier (500 MB database, 50K auth users) are similarly unconstrained at MBE's current and near-term scale.
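The Option 3 figures can be checked the same way. The per-request compute of 0.064 GB-seconds is not stated in the source; it is inferred from the table (32 GB-sec for 500 requests) and would correspond to roughly 128 MB for half a second. The result is the cost before the free tier is applied.

```python
REQUEST_PRICE = 0.20 / 1_000_000     # dollars per request
COMPUTE_PRICE = 0.0000166667         # dollars per GB-second
GB_SECONDS_PER_REQUEST = 0.064       # assumption inferred from the table


def lambda_monthly_cost(requests: int) -> float:
    """Request charge plus compute charge, ignoring the free tier."""
    request_cost = requests * REQUEST_PRICE
    compute_cost = requests * GB_SECONDS_PER_REQUEST * COMPUTE_PRICE
    return request_cost + compute_cost


for volume in (500, 50_000):
    print(f"{volume} req/mo: ${lambda_monthly_cost(volume):.4f}")
```

Because 50,000 requests use only 3,200 of the 400,000 free GB-seconds, the billed amount stays at zero at both volumes.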

What we'd like to hear from the team

Is Option 3 (Lambda proxy) acceptable? It keeps OpenEMS fully private while avoiding the ECS migration — and preserves the $0/mo cost of MBE running on Vercel/Supabase free tiers.
Is there a compliance or policy requirement that prevents any intermediary (even IAM-authenticated) from bridging Vercel to the VPC?
Does the team prefer the simplicity of Option 1 (accept the small risk of a public endpoint) over the zero-surface guarantee of Option 3?

If Option 3 works for the team, we can implement it in a day and move on to higher-value work.

cc @aidan-barnes-axm @tushabe @axmsoftware

amalbet · 2026-04-21T03:05:40Z

Note that the Claude Code effort estimates assume human execution. With the agent doing the work it is probably a few minutes, so don't take the estimates literally.

amalbet · 2026-04-21T03:14:04Z

Update: PR #75 implements Option 3 (Lambda VPC proxy). Ready for review alongside this architecture discussion. (Replaces #74 which was closed to remove infrastructure metadata from the public repo.)

tushabe · 2026-04-21T20:34:14Z

Limit Security by VPN: I am not sure that a VPN is a good long-term option to secure our data in transit, so I like the options that limit that security approach to OpenEMS for now.
No compliance requirement that I am aware of prevents any intermediaries, at least for the customers and the regulatory bodies in Uganda.
I like the simplicity of Option 1. In another thread, I would like to understand why OpenEMS endpoints are public and why we can't add IAM to them.

@amalbet I'd say, Godspeed with Option 3.

Captures the explicit PR #71 scope decision (UI/Odoo/B2B/WS over plain HTTP, IP-allowlisted) so it's not silently normalized, and opens the team discussion for HTTPS options (self-signed, Caddy+LE, ALB+ACM) before the dev env becomes persistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alejandro Malbet <amalbet@gmail.com>

amalbet mentioned this pull request Apr 16, 2026

Add iac/dev/ for single-EC2 dev environment #71

Open

8 tasks

amalbet mentioned this pull request Apr 16, 2026

Add AWS dev account setup guidelines #72

Open

3 tasks

Move dev account setup doc to its own PR

16e43b2


tushabe reviewed Apr 16, 2026


amalbet mentioned this pull request Apr 21, 2026

Add Lambda VPC proxy for MBE-to-OpenEMS B2B access #73

Closed

13 tasks

amalbet mentioned this pull request Apr 21, 2026

Add Lambda VPC proxy for MBE-to-OpenEMS B2B access #74

Closed

7 tasks

amalbet mentioned this pull request Apr 21, 2026

Add Lambda VPC proxy for MBE-to-OpenEMS B2B access #75

Open

9 tasks

Add AWS production architecture planning doc #70
amalbet wants to merge 4 commits into main from docs/aws-production-architecture

2 participants
