Add AWS production architecture planning doc #70
Documents the target deployment architecture for OpenEMS + MBE on AWS with OpenVPN for Raspberry Pi edge devices. Covers VPC layout, security model, environment strategy (dev/stage/prod), data flow, cost estimates, and open questions for team iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alejandro Malbet <amalbet@gmail.com>
Guidelines for the IT admin to provision a dedicated dev AWS account isolated from production, with safe scoping for Claude Code agent sessions. Covers account structure, IAM strategy, Service Control Policies, cost controls, and a provisioning checklist. Complements the production architecture doc by establishing the foundation needed before we can start provisioning dev infrastructure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alejandro Malbet <amalbet@gmail.com>
Split out to #72 so each doc can be reviewed independently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alejandro Malbet <amalbet@gmail.com>
tushabe left a comment
Thanks for documenting this, it gives me some visibility on decisions made here and why they were made. I especially appreciate the cost estimate of running the entire stack because it helps facilitate conversations around business feasibility post Pilot.
Discussion: Do we need MBE in the same VPC as OpenEMS?
When we drafted this architecture, we defaulted to putting MBE (billing engine) in the same VPC as OpenEMS so the B2B REST communication stays on a private network. On reflection, we missed an important tradeoff discussion: we should consider whether MBE can stay on Vercel/Supabase (where it already works) and connect to OpenEMS over a secured public endpoint instead.
Why this matters
MBE is already deployed and working on Vercel + cloud Supabase. Moving it to ECS means:
The only reason to move MBE into the VPC is to keep the OpenEMS B2B endpoint private. So the real question is: can we secure that connection without migrating MBE?
How MBE uses the OpenEMS APIs
MBE connects to the OpenEMS Backend B2B REST endpoint (port 8082,
The data flowing through is energy consumption readings: not financial data, not PII, not credentials.
Three options
Option 1: MBE on Vercel + API Gateway (public endpoint)
Expose a single API Gateway endpoint in front of the OpenEMS B2B REST API.
Security layers: TLS + API key + Basic auth + rate limiting + CloudWatch logging + optional WAF.
What stays private: OpenEMS UI (:4200), edge websocket (:8081), Odoo (:10016), InfluxDB (:8086), SSH/SSM. All are IP-allowlisted or VPN-only. Only the API Gateway URL is public.
Risk profile:
Option 2: Move MBE to ECS (same VPC)
Move MBE off Vercel into ECS Fargate within the same VPC. OpenEMS B2B stays fully private.
Security: Zero public surface for OpenEMS. MBE-to-backend communication is private network only.
Cost: ~$40-60/mo (ALB + Fargate), replacing what Vercel and Supabase provide for free today. For a pilot-stage startup, this is meaningful spend on infrastructure that adds operational burden without adding user value.
Effort: ~3 days. No MBE code changes; the Dockerfile already exists and works. The work is all infrastructure: ECS task definition, ALB, ACM cert, CI/CD pipeline, CloudWatch logging, DNS cutover.
Ongoing ops: Every deploy, log investigation, scaling decision, and certificate renewal that Vercel handles today becomes our responsibility.
Option 3: Lambda VPC proxy (recommended middle ground)
Keep MBE on Vercel. Move only the OpenEMS API calls to a thin Lambda function inside the VPC. OpenEMS stays fully private, with no public endpoint.
How it works: The Lambda is a ~10-line function with no business logic, no dependencies, and no state. It receives the JSON-RPC request from Vercel, forwards it to OpenEMS on the private IP, and returns the response. The billing logic, meter discovery, and kWh calculations stay in MBE on Vercel.
What changes in MBE: One env var.
Security posture (better than Option 1):
Why this is more secure than Option 1: The key difference is authentication strength. Option 1 uses an API key (a static string anyone can use if leaked). Option 3 uses AWS IAM authentication (SigV4 signed requests), which is significantly harder to compromise, automatically time-limited, logged in CloudTrail, and revocable per-credential. The OpenEMS backend itself also remains on a private network with no public-facing port.
Cost: Essentially $0 at MBE's request volume (Lambda free tier: 1M requests/month). VPC-attached Lambda has no additional cost. Combined with the Vercel and Supabase free tiers that MBE already runs within, the entire MBE stack costs $0/mo. Moving to ECS (Option 2) would replace this with ~$40-60/mo in AWS infrastructure costs, which is meaningful spend for a pilot-stage project and adds operational burden without adding user-facing value.
Effort: ~1 day. Write the Lambda (10 lines), write Terraform for Lambda + VPC config + Function URL + IAM role, and update one env var in Vercel.
Stability: The Lambda is a stateless passthrough with no business logic, no dependencies, and nothing to update. Deploy once, forget about it.
Comparison
¹ Option 2 cost breakdown (us-east-1, on-demand pricing):
NAT Gateway is required if MBE runs in a private subnet, because it needs outbound internet access to reach cloud Supabase. Placing MBE in a public subnet avoids this cost but exposes the container to inbound internet traffic (mitigated by security group rules).
² Option 3 cost breakdown (us-east-1, on-demand pricing):
MBE makes ~100-500 OpenEMS API calls/month (billing generation, meter discovery, status checks). Even at 100× that volume the cost is negligible.
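As a quick sanity check on those volumes, the arithmetic can be sketched in Python. The only inputs are figures quoted in this comment (100-500 calls/month, a 100× stress multiplier, the 1M requests/month Lambda free tier, and the ~$40-60/mo Option 2 estimate); the variable and function names are illustrative only.

```python
# Figures taken from this comment; names are illustrative.
LAMBDA_FREE_REQUESTS_PER_MONTH = 1_000_000
CALLS_PER_MONTH_HIGH = 500        # upper end of the ~100-500 estimate
STRESS_MULTIPLIER = 100           # "even at 100x that volume"
OPTION_2_MONTHLY_USD = (40, 60)   # ALB + Fargate estimate for Option 2


def free_tier_share(calls_per_month: int) -> float:
    """Fraction of the Lambda request free tier consumed per month."""
    return calls_per_month / LAMBDA_FREE_REQUESTS_PER_MONTH


worst_case_calls = CALLS_PER_MONTH_HIGH * STRESS_MULTIPLIER
option_2_yearly = tuple(12 * usd for usd in OPTION_2_MONTHLY_USD)

print(f"Worst-case calls/month: {worst_case_calls}")
print(f"Share of free tier used: {free_tier_share(worst_case_calls):.1%}")
print(f"Option 2 yearly cost (USD): {option_2_yearly[0]}-{option_2_yearly[1]}")
```

Under these numbers the stressed volume is 50,000 requests/month, about 5% of the request free tier, while Option 2 would come to roughly $480-720 per year.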
Lambda free tier includes 1M requests/month and 400,000 GB-seconds/month; MBE won't approach either limit. Vercel free tier (100 GB bandwidth, 100K serverless function invocations) and Supabase free tier (500 MB database, 50K auth users) are similarly unconstrained at MBE's current and near-term scale.
What we'd like to hear from the team
If Option 3 works for the team, we can implement it in a day and move on to higher-value work.
Note that Claude Code's effort estimates assume human execution. The actual work is probably a few minutes, so don't take them literally.
@amalbet I'd say, Godspeed with Option 3.
Captures the explicit PR #71 scope decision (UI/Odoo/B2B/WS over plain HTTP, IP-allowlisted) so it's not silently normalized, and opens the team discussion for HTTPS options (self-signed, Caddy+LE, ALB+ACM) before the dev env becomes persistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Alejandro Malbet <amalbet@gmail.com>
Summary
Planning document for the target AWS production deployment architecture. Covers OpenEMS Backend + MBE on AWS with OpenVPN for Raspberry Pi edge devices at microgrid sites.
This is a draft for team review — we're iterating on the architecture over the next few days before committing to specific tickets. Please leave line-by-line comments on anything that needs refinement.
What's covered
Open questions (need team input)
Test plan
🤖 Generated with Claude Code