K8s-friendly nginx: cluster DNS resolver, /matter route, MATTER_API_URL env#3
K8s-friendly nginx: cluster DNS resolver, /matter route, MATTER_API_URL env#3an9xyz wants to merge 1 commit into
Conversation
Three coupled changes to nginx.conf.template + docker-entrypoint.sh so that the prebuilt image works out of the box in both Kubernetes and docker-compose deployments: 1. Replace the hardcoded `resolver 127.0.0.11` (docker embedded DNS, only valid inside docker-compose) with the cluster's CoreDNS service - kube-dns.kube-system.svc.cluster.local. This is what production nginx configs already use and unblocks variable-form proxy_pass on Kubernetes (the previous setting returned 502 on /summary because nginx could not resolve the service hostname at request time). 2. Add a `location /matter/` block that proxies to $matter_api_url (env: MATTER_API_URL) with the same optional-503 + rewrite pattern already used for /summary. Closes a gap where the matter (task / timeline) service had no front-door route in octo-web. 3. Extend docker-entrypoint.sh to envsubst the new MATTER_API_URL variable alongside SUMMARY_API_URL and API_URL.
Jerry-Xin
left a comment
There was a problem hiding this comment.
This PR adds the right /matter/ route shape, but the resolver configuration is internally inconsistent and will break at least one supported deployment path.
🔴 Blocking
🟡 Warning — nginx.conf.template hardcodes resolver kube-dns.kube-system.svc.cluster.local even though docker-entrypoint.sh defines and exports NGINX_RESOLVER with a docker-compose default of 127.0.0.11. envsubst includes ${NGINX_RESOLVER}, but the template never references it, so users cannot actually override the resolver through the documented env var. This contradicts the PR goal of working out of the box in both Kubernetes and docker-compose, and can make /summary/ and /matter/ fail in compose because nginx will try to use the Kubernetes DNS service name as its runtime resolver. Use resolver ${NGINX_RESOLVER} valid=30s ipv6=off; or otherwise align the entrypoint/defaults/template.
💬 Non-blocking
🔵 Suggestion — Add a small config-rendering test for docker-entrypoint.sh/nginx.conf.template that asserts MATTER_API_URL and NGINX_RESOLVER are substituted into the generated config. The current security-header test would not catch this regression.
✅ Highlights
The /matter/ location mirrors the existing optional /summary/ pattern, returns JSON for the unconfigured case, and rewrites /matter/api/v1/... to /api/v1/..., which matches the existing frontend API base path.
Jerry-Xin
left a comment
There was a problem hiding this comment.
octo-web #3 Review — CHANGES_REQUESTED
/matter/ 路由 + resolver 提级到 server 块 + env var 化,方向完全正确。Jerry-Xin 抓到的 P0 我逐行确认了,确实存在。
🔴 P0 — NGINX_RESOLVER env var 有名无实(+1 Jerry-Xin)
entrypoint 侧链路完整:
: "${NGINX_RESOLVER:=127.0.0.11}"
export ... NGINX_RESOLVER
envsubst '... ${NGINX_RESOLVER}' < /nginx.conf.template > ...但 模板里硬编码了 K8s DNS 名:
resolver kube-dns.kube-system.svc.cluster.local valid=30s ipv6=off;应该是:
resolver ${NGINX_RESOLVER} valid=30s ipv6=off;现状:NGINX_RESOLVER 被 envsubst 处理但模板里没有占位符引用它,env var 写了等于没写。docker-compose 用户拿到的 resolver 是 kube-dns.kube-system.svc.cluster.local——容器里根本解析不到,/summary/* 和 /matter/* 全部 502。
修法一行:模板里 kube-dns.kube-system.svc.cluster.local → ${NGINX_RESOLVER}。
💬 Non-blocking
-
注释可更新:模板里的注释写的是 "K8s CoreDNS service",修完之后 resolver 值由 env 决定,注释应改成说明 env var 机制而非指向特定值,比如 "Runtime DNS resolver, default 127.0.0.11 (Docker embedded DNS), override via NGINX_RESOLVER env"
-
/matter/rewrite 比/summary/更宽:/summary/只匹配/summary/api/v1/*,/matter/匹配/matter/*全量。设计上没问题(matter 后端可能有/api/v1/以外的路径),只是留意两者风格不一致
✅ Highlights
/matter/location 完全复用了 #2 建立的 optional-503 + rewrite + variable proxy_pass 模式,一致性好- resolver 从 location 提到 server 级消除了重复,新 location 自动继承
resolver_timeout 5s是好补充- entrypoint 里 POSIX default + export + envsubst 链路设计是对的,模板占位符补上就通了
- 顺手修了
docker-entrypoint.sh末尾缺少的 newline 👍
lml2468
left a comment
There was a problem hiding this comment.
I agree the existing resolver blocker still needs a fix, and I did not find an additional PR-introduced blocker in the /matter path handling.
Blocking items:
- Existing P1, already raised by Jerry-Xin: I independently rendered the template with NGINX_RESOLVER=127.0.0.11 and the generated config still contains
resolver kube-dns.kube-system.svc.cluster.local valid=30s ipv6=off;, whiledocker-entrypoint.shincludes${NGINX_RESOLVER}in envsubst. That confirms the env var is currently not connected to the template. I am not opening a duplicate blocker for the same issue.
Non-blocking notes:
apps/web/vite.config.ts:118: the dev proxy rewrites/matter/api/v1whenVITE_MATTER_API_URLis set, but not when the fallbackVITE_TODO_API_URLis set. IfVITE_TODO_API_URLis also intended to point directly at the matter/todo backend, local dev will forward/matter/api/v1/...instead of/api/v1/.... If it is only a legacy alias for the gateway, the current behavior is fine but the comment/variable naming is ambiguous.nginx.conf.template:53: the existing/summarylocation keeps the PR #2 fix pattern: rewrite first, thenproxy_pass $summary_api_urlwithout a URI suffix, so I do not see the variable proxy_pass path-loss bug there.nginx.conf.template:78:/matter/api/v1/mattersrewrites to/api/v1/mattersbeforeproxy_pass $matter_api_url, so the new production route does strip the/matterprefix correctly.
Highlights:
- The new
/matter/production location mirrors the optional-service behavior from/summary: blank env returns JSON 503, configured env proxies with auth/forwarding headers, and query suffixes should remain on the rewritten URI. Dockerfilestill copies the entrypoint and template into the nginx image, so the entrypoint change is on the runtime path.
lml2468
left a comment
There was a problem hiding this comment.
Upgrading to CHANGES_REQUESTED — I independently verified the resolver blocker raised by Jerry-Xin.
P0 — NGINX_RESOLVER env var not wired into template (+1 Jerry-Xin, +1 Allen)
I rendered the template locally with envsubst — the generated config still has the hardcoded resolver kube-dns.kube-system.svc.cluster.local valid=30s ipv6=off; regardless of NGINX_RESOLVER value. The entrypoint defines/exports/envsubsts the var, but nginx.conf.template:27 never references ${NGINX_RESOLVER}.
Fix: resolver kube-dns.kube-system.svc.cluster.local valid=30s ipv6=off; → resolver ${NGINX_RESOLVER} valid=30s ipv6=off;
Impact: docker-compose users get a resolver that does not exist inside the container → all variable proxy_pass locations (/summary/*, /matter/*) return 502.
Non-blocking
vite.config.ts:118-122: dev proxy rewrites/matterprefix only whenVITE_MATTER_API_URLis set, not whenVITE_TODO_API_URLfallback is used. IfVITE_TODO_API_URLalso points directly at the matter service, dev requests will arrive with the/matterprefix intact. Minor — only affects local dev, not production nginx.
Highlights
/matter/location correctly mirrors the/summary/optional-503 + rewrite + variable proxy_pass pattern from PR #2- Resolver promoted to server level eliminates per-location duplication
resolver_timeout 5sis a good addition- Entrypoint POSIX defaults are clean; template placeholder is the only missing link
Jerry-Xin
left a comment
There was a problem hiding this comment.
Resolver configuration bug:
The entrypoint correctly exports NGINX_RESOLVER (default 127.0.0.11) and passes it to envsubst, but nginx.conf.template hardcodes the resolver to kube-dns.kube-system.svc.cluster.local instead of using ${NGINX_RESOLVER}.
This means:
- Docker-compose deployments will break —
kube-dns.kube-system.svc.cluster.localdoesn't resolve outside K8s - The
NGINX_RESOLVERenv var has no effect (exported but never referenced in template)
Fix: change the template line to:
resolver ${NGINX_RESOLVER} valid=30s ipv6=off;The rest of the PR (MATTER_API_URL route, graceful 503 fallback) looks good.
Addresses the new findings from the second round of review on commit 3cac5f1. Blocking - B1' (Jerry-Xin + yujiawei): the round-1 fix only handled known HTTP status codes on confirm/create, but mapBindError's top branch returns `terminal: false` for any non-OidcBindHttpError (network timeout, fetch abort, parseLoginResp JSON throw, applyLoginResp invariant throw). Same stuck-spinner failure mode as B1 since the confirming/creating loader doesn't render inlineError. Now: post-verify endpoints terminate on *every* failure shape, including non-HTTP errors and confirm 401. Non-blocking cleanups picked up - yujiawei #2: move `/oidc/bind` pathname guard above the `?invite=` short-circuit in apps/web/src/Layout/index.tsx. Defense in depth — no documented URL composes both today, but if one ever did, the bind token would otherwise be silently dropped. - yujiawei #4: merge the two `from '../oidc'` import lines in BindPage.tsx. - yujiawei #5: drop the now-contradictory "唯一例外 429 / 503 — 允许再试" comment in runConfirm (since round-1 made those terminal). Replace with a one-liner describing the actual contract. - yujiawei #7: remove `BindErrorDisplay.retryable`. The field was set on 503/500 of interactive endpoints but no caller read it (the "retry" UX it was meant to gate never materialised — users retry by re-submitting the form directly). Drop the field + the three test assertions on it. - yujiawei #9: replace the magic 200ms setTimeout with a comment explaining it's there to let the success state paint + Toast settle before navigation tears down the React tree. Skipped - yujiawei #3 (entryRef unmount cleanup): incompatible with the existing `initRanRef` guard under React 18 StrictMode dev — the spurious unmount fires the new cleanup, nulling entryRef, then the remount skips the init body because initRanRef.current is true (refs persist across strict-mode remount). Same trap as the OidcResume bug fixed earlier. Memory pressure of leaving entryRef on the closure is one object until the full-page navigation kills the JS context — not worth re-introducing the trap to save it. Tests - vitest 162/162 - playwright 19/19
Summary
Three coupled changes to `nginx.conf.template` + `docker-entrypoint.sh` so the prebuilt image works out of the box in both Kubernetes and docker-compose deployments:
Test plan