Skip to content
Merged
123 changes: 98 additions & 25 deletions docs/PROJECT_RULES.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,31 +16,32 @@ This document defines the specific standards for Flowker development, following
| 4 | [Frameworks & Libraries](#frameworks--libraries) | Required packages and versions |
| 5 | [Data Stores](#data-stores) | Dual-database architecture (MongoDB + PostgreSQL) |
| 6 | [Authentication](#authentication) | Plugin Auth + API Key dual model |
| 7 | [Executor Catalog & Providers](#executor-catalog--providers) | Static catalog vs dynamic configuration |
| 8 | [Configuration](#configuration) | Environment variable handling |
| 9 | [Observability](#observability) | OpenTelemetry integration |
| 10 | [Bootstrap](#bootstrap) | Application initialization |
| 11 | [HTTP Responses](#http-responses) | Response standardization |
| 12 | [Pagination](#pagination) | Cursor-based pagination |
| 13 | [Middleware Order](#middleware-order) | Critical middleware sequence |
| 14 | [Data Transformation](#data-transformation-toentityfromentity-mandatory) | ToEntity/FromEntity patterns |
| 15 | [UUID Fields in Models](#uuid-fields-in-models) | UUID type usage |
| 16 | [Error Handling](#error-handling) | Business vs Technical errors, wrapping rules |
| 17 | [Context Cancellation Checks](#context-cancellation-checks-mandatory) | Check ctx.Err() before processing |
| 18 | [Input Normalization Order](#input-normalization-order-mandatory) | Normalize-Validate-Store pattern |
| 19 | [Sentinel Errors for Constructors](#sentinel-errors-for-constructors-mandatory) | Return errors, never panic |
| 20 | [Function Design](#function-design-mandatory) | Single responsibility principle |
| 21 | [Whitespace Style](#whitespace-style-wsl_v5) | Code formatting conventions |
| 22 | [Testing](#testing) | Table-driven tests, deterministic data, edge cases |
| 23 | [Shared Test Helpers](#shared-test-helpers) | Centralized test utilities |
| 24 | [Logging](#logging) | Structured logging with libLog.Any |
| 25 | [Linting](#linting) | golangci-lint configuration |
| 26 | [API Documentation](#api-documentation) | Swagger/OpenAPI generation |
| 27 | [Architecture Patterns](#architecture-patterns) | Hexagonal architecture |
| 28 | [Directory Structure](#directory-structure) | Project layout (Lerian pattern) |
| 29 | [File Naming Conventions](#file-naming-conventions-mandatory) | Go snake_case file naming |
| 30 | [Domain Models](#domain-models-mandatory) | Rich Domain Models (NOT anemic) |
| 31 | [Forbidden Practices](#forbidden-practices) | What NOT to do |
| 7 | [SSRF Protection](#ssrf-protection-mandatory-for-tenant-supplied-urls) | Mandatory `pkg/safehttp` for tenant-supplied URLs |
| 8 | [Executor Catalog & Providers](#executor-catalog--providers) | Static catalog vs dynamic configuration |
| 9 | [Configuration](#configuration) | Environment variable handling |
| 10 | [Observability](#observability) | OpenTelemetry integration |
| 11 | [Bootstrap](#bootstrap) | Application initialization |
| 12 | [HTTP Responses](#http-responses) | Response standardization |
| 13 | [Pagination](#pagination) | Cursor-based pagination |
| 14 | [Middleware Order](#middleware-order) | Critical middleware sequence |
| 15 | [Data Transformation](#data-transformation-toentityfromentity-mandatory) | ToEntity/FromEntity patterns |
| 16 | [UUID Fields in Models](#uuid-fields-in-models) | UUID type usage |
| 17 | [Error Handling](#error-handling) | Business vs Technical errors, wrapping rules |
| 18 | [Context Cancellation Checks](#context-cancellation-checks-mandatory) | Check ctx.Err() before processing |
| 19 | [Input Normalization Order](#input-normalization-order-mandatory) | Normalize-Validate-Store pattern |
| 20 | [Sentinel Errors for Constructors](#sentinel-errors-for-constructors-mandatory) | Return errors, never panic |
| 21 | [Function Design](#function-design-mandatory) | Single responsibility principle |
| 22 | [Whitespace Style](#whitespace-style-wsl_v5) | Code formatting conventions |
| 23 | [Testing](#testing) | Table-driven tests, deterministic data, edge cases |
| 24 | [Shared Test Helpers](#shared-test-helpers) | Centralized test utilities |
| 25 | [Logging](#logging) | Structured logging with libLog.Any |
| 26 | [Linting](#linting) | golangci-lint configuration |
| 27 | [API Documentation](#api-documentation) | Swagger/OpenAPI generation |
| 28 | [Architecture Patterns](#architecture-patterns) | Hexagonal architecture |
| 29 | [Directory Structure](#directory-structure) | Project layout (Lerian pattern) |
| 30 | [File Naming Conventions](#file-naming-conventions-mandatory) | Go snake_case file naming |
| 31 | [Domain Models](#domain-models-mandatory) | Rich Domain Models (NOT anemic) |
| 32 | [Forbidden Practices](#forbidden-practices) | What NOT to do |

---

Expand Down Expand Up @@ -205,6 +206,78 @@ router.Post("/v1/webhooks/:id", authGuard.With("webhooks", "execute", true), han

---

## SSRF Protection (MANDATORY for tenant-supplied URLs)

Flowker's executors call URLs supplied by tenant operators (provider `base_url`, executor `baseURL`, HTTP runner `Config["url"]`). Without protection these are classic **SSRF** sinks (CWE-918): a malicious or careless tenant can reach cloud metadata endpoints (`169.254.169.254`), lateral-move into internal services, or use DNS rebinding to bypass naive host checks.

### Design: dial-time enforcement (NOT URL substitution)

Earlier iterations of `pkg/safehttp` returned a "pinned URL" — the original URL with the hostname substituted by a resolved IP literal — and instructed callers to assign the original authority to `req.Host`. **That approach is removed.** It works for plain HTTP but **breaks HTTPS**: Go derives TLS SNI and certificate verification from `req.URL.Host`, so substituting an IP literal makes the TLS handshake reject the certificate (`x509: cannot validate certificate for <ip> because it doesn't contain any IP SANs`). `req.Host` only influences the HTTP `Host` header at layer 7 — it never reaches TLS.

The current design keeps the hostname in `req.URL.Host` and moves SSRF IP enforcement to `Transport.DialContext`. The transport re-resolves DNS at dial time, validates every candidate IP against the policy, and dials the first safe one. TLS handshake uses the correct SNI because `req.URL.Host` is unchanged, and DNS rebinding is closed out by the atomic resolve-and-dial in the same call.

### Rules

- **Every outbound HTTP request that originates from tenant-supplied configuration MUST route through `pkg/safehttp`.** No exceptions: no `http.DefaultClient`, no bare `http.NewRequest` against a tenant URL.
- **Pre-flight write-time validation: `safehttp.Validate(ctx, rawURL, allowedHosts)`.** Performs scheme check, per-tenant allow-list match (when non-empty), and DNS+IP policy lookup. Returns an error wrapping one of `ErrInvalidURL`, `ErrHostNotAllowed`, `ErrBlocked`, `ErrDNSFailed` — use `errors.Is` to branch. Call this when persisting a tenant URL or producing a friendly error before a runtime request.
- **Do NOT rewrite the URL before issuing the request.** Issue the request against the ORIGINAL `rawURL` (hostname intact). The dial-time policy is authoritative — `Validate`'s DNS lookup is informational only.
- **Build the `*http.Client` via `safehttp.NewClient(base, timeout)`.** `NewClient` shallow-copies the base client (never mutates the caller's value), installs `SafeTransport()` as the Transport, and wraps any existing `CheckRedirect` via `safehttp.CheckRedirectGuard`. The guard re-validates every redirect target against the static SSRF policy before the dial happens.
- **`SafeTransport()` installs `dialSafe` as `DialContext`.** It re-resolves DNS via `net.DefaultResolver`, rejects any hostname whose resolution returns even one blocked IP (matches `libSSRF.ResolveAndValidate` semantics — mixed safe/unsafe IP sets are treated as hostile), and dials the first safe IP in resolver order.
- **Per-tenant allow-list: `ProviderConfiguration.AllowedHosts`.** OPT-IN — empty/nil means "no restriction" (permissive default). When populated:
- Entries are normalized: lowercase, trailing dot stripped.
- Plain entries (`acme.com`) are **exact match only**.
- Leading-dot entries (`.acme.com`) are **suffix-with-label-boundary** — match `api.acme.com` and `x.y.acme.com` but explicitly NOT `acme.com` itself and NOT `acme.com.attacker.io`.
- Validation rules in `validateAllowedHosts` reject IP literals, wildcards, ports, entries > 253 chars, and the slice is capped at 100 entries as defense in depth.
- Surfaced for cross-field checks via `safehttp.HostAllowed(host, allowed)` so command handlers can detect "this URL hostname is not permitted by this tenant's own allow-list" before persistence.

### Default block-list (static policy)

`libSSRF` default policy blocks: loopback, RFC1918 private ranges, link-local (including `169.254.169.254`), IPv6 ULA/link-local, IPv4-mapped IPv6, and non-`http`/`https` schemes. For local development set `SSRF_ALLOW_PRIVATE=true` (cached once per process via a sync.Once; re-arm in tests with `safehttp.ResetForTest`).

**`SSRF_ALLOW_PRIVATE=true` is NOT a full opt-out.** A `criticalBlockedRanges` carve-out is enforced unconditionally at dial time, even when allow-private is set:

| Range | Coverage |
|-------|----------|
| `169.254.0.0/16` | IPv4 link-local — includes AWS IMDS (`169.254.169.254`), Azure IMDS, GCP `metadata.google.internal` |
| `fe80::/10` | IPv6 link-local |
| `fd00:ec2::/64` | AWS IPv6 IMDS (`fd00:ec2::254`) |

This is the **Capital One canonical case**: a developer enabling allow-private for local Docker should never lose protection against metadata endpoints. Without this carve-out, `SSRF_ALLOW_PRIVATE=true` would silently lift the most important block.

### `HTTP_PROXY` / `HTTPS_PROXY` are intentionally ignored

`SafeTransport()` does NOT set `Transport.Proxy`. Honouring `http.ProxyFromEnvironment` would route outbound dials to whatever address `HTTP_PROXY` / `HTTPS_PROXY` resolves to — bypassing `dialSafe` entirely against the tenant-supplied destination (`dialSafe` would run against the proxy host, not the target). That defeats the SSRF guarantee.

If infrastructure requires egress through a corporate proxy, configure a transport-level CONNECT proxy (e.g. Smokescreen) at the network layer, NOT via Go HTTP client env vars. This boundary is enforced by a pinning test (`TestSafeTransport_DoesNotHonorHTTPProxyEnv`).

### Error sentinels (HTTP 422 in handlers)

| Sentinel | Code | Surface |
|----------|------|---------|
| `ErrProviderConfigSSRFBlocked` | `FLK-0341` | Provider base URL targets a restricted destination |
| `ErrProviderConfigInvalidAllowedHost` | `FLK-0317` | Allow-list entry rejected by `validateAllowedHosts` (IP literal, wildcard, port, > 253 chars, etc.) |
| `ErrProviderConfigHostNotInAllowlist` | `FLK-0320` | Cross-field check: a base URL hostname is not permitted by the configuration's own allow-list |
| `ErrExecutorConfigSSRFBlocked` | `FLK-0284` | Executor write-time SSRF validation (base URL targets restricted destination) |
| `ErrExecutorConfigHostNotAllowed` | `FLK-0285` | Executor write-time check: target host not in allow-list |

### Known gaps (accepted; tracked)

- **Per-tenant `AllowedHosts` is NOT re-evaluated on redirect.** `CheckRedirectGuard` re-runs the STATIC policy (scheme + IP-range check via `libSSRF.ValidateURL`, plus the dial-time check at the next hop) on every redirect target, but the hostname allow-list applied at `Validate()` is not. **Justification:** the dial-time IP policy still blocks metadata / private / link-local on redirect targets; the residual exposure is a one-shot response leak to an arbitrary public host (no internal-network compromise). Re-applying the allow-list on redirect requires context plumbing through `net/http.CheckRedirect` or stateful per-tenant clients — overengineering for the residual risk. Pinned by `TestSafeTransport_DialBlocksMetadataEvenOnRedirect`. See the godoc on `safehttp.CheckRedirectGuard` for the full threat-model discussion.
- **`ExecutorConfiguration` has no `providerConfigurationID` field today**, so the executor write-time / runtime path cannot enforce the per-tenant allow-list transitively. Only the static policy (private/loopback/metadata blocking) applies. Adding the FK is a follow-up; the sentinel `ErrExecutorConfigHostNotAllowed` is already wired so the change is localized when it lands.
- **HTTP Runner** (`pkg/executors/http/runner.go`) receives `Config["url"]` at runtime with no link to any ExecutorConfiguration record. Same gap as above; same mitigations apply.
- **Network-level defenses** (Kubernetes `NetworkPolicy`, IMDSv2 hop-limit, egress proxy like Smokescreen) are explicitly **SRE responsibility** — not enforced in code. Coordinate via the infra trilha.

### Forbidden patterns

- **Substituting a resolved IP literal into `req.URL.Host`** (the legacy "pinned URL" pattern). Breaks TLS for HTTPS targets — TLS uses `req.URL.Host` for SNI and certificate verification, and IP literals are not in production cert SANs. The dial-time approach already closes the TOCTOU/DNS-rebinding window without rewriting the URL.
- **Validating a hostname statically and trusting `net.Dialer`'s implicit resolution at connect time** (TOCTOU / DNS rebinding). Always go through `safehttp.NewClient` so the dial-time resolver in `SafeTransport.DialContext` is authoritative.
- **Substring matching on hostnames** (`strings.Contains(host, "internal")`) — trivially bypassed by `internal.attacker.com`. Use `safehttp.HostAllowed` with exact + suffix-with-label-boundary semantics.
- **Allow-list of CIDR blocks** *instead of* hostname matching for tenant-supplied URLs — the dial-time IP policy already covers IP-range blocking; the tenant allow-list is hostname-based by design.
- **Re-using `http.Client` instances** without going through `safehttp.NewClient` — defense-in-depth assumes every outbound call has `SafeTransport` and `CheckRedirectGuard` wired in. Bare `http.Client{}` ships none of that.
- **Setting `Transport.Proxy = http.ProxyFromEnvironment`** on a client that calls tenant URLs (see "`HTTP_PROXY` / `HTTPS_PROXY` are intentionally ignored" above).

---

## Executor Catalog & Providers

Flowker distinguishes **Providers** (static catalog, compiled into the binary) from **Executor Configurations** (dynamic records, stored in MongoDB).
Expand Down
4 changes: 2 additions & 2 deletions internal/adapters/http/in/catalog/handler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -379,10 +379,10 @@ func TestGetTemplate_EnrichesSchemaWithProviderConfigOptions(t *testing.T) {
})
require.NoError(t, err)

tracerConfig, err := model.NewProviderConfiguration("prod-tracer", nil, "tracer", map[string]any{"url": "https://tracer.example.com"})
tracerConfig, err := model.NewProviderConfiguration("prod-tracer", nil, "tracer", map[string]any{"url": "https://tracer.example.com"}, nil)
require.NoError(t, err)

midazConfig, err := model.NewProviderConfiguration("prod-midaz", nil, "midaz", map[string]any{"url": "https://midaz.example.com"})
midazConfig, err := model.NewProviderConfiguration("prod-midaz", nil, "midaz", map[string]any{"url": "https://midaz.example.com"}, nil)
require.NoError(t, err)

lister := &mockConfigLister{
Expand Down
6 changes: 6 additions & 0 deletions internal/adapters/http/in/executor_configuration/handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,12 @@ func (h *Handler) handleError(c *fiber.Ctx, err error) error {
case errors.Is(err, constant.ErrConflictStateChanged):
return libHTTP.Respond(c, fiber.StatusConflict, api.ErrorResponse{Code: constant.ErrConflictStateChanged.Error(), Title: "Conflict", Message: "resource state changed concurrently; retry with latest version"})

case errors.Is(err, constant.ErrExecutorConfigSSRFBlocked):
return libHTTP.Respond(c, fiber.StatusUnprocessableEntity, api.ErrorResponse{Code: constant.ErrExecutorConfigSSRFBlocked.Error(), Title: "Unprocessable Entity", Message: "base URL rejected by SSRF policy"})

case errors.Is(err, constant.ErrExecutorConfigHostNotAllowed):
return libHTTP.Respond(c, fiber.StatusUnprocessableEntity, api.ErrorResponse{Code: constant.ErrExecutorConfigHostNotAllowed.Error(), Title: "Unprocessable Entity", Message: "host not permitted by allowlist"})

default:
return libHTTP.Respond(c, fiber.StatusInternalServerError, api.ErrorResponse{Code: constant.ErrInternalServer.Error(), Title: "Internal Server Error", Message: "internal server error"})
}
Expand Down
6 changes: 6 additions & 0 deletions internal/adapters/http/in/provider_configuration/handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -370,6 +370,12 @@ func (h *Handler) handleError(c *fiber.Ctx, err error) error {
case errors.Is(err, constant.ErrProviderConfigMissingBaseURL):
return libHTTP.Respond(c, fiber.StatusUnprocessableEntity, api.ErrorResponse{Code: constant.ErrProviderConfigMissingBaseURL.Error(), Title: "Unprocessable Entity", Message: "provider configuration config missing required field 'base_url'"})

case errors.Is(err, constant.ErrProviderConfigInvalidAllowedHost):
return libHTTP.Respond(c, fiber.StatusUnprocessableEntity, api.ErrorResponse{Code: constant.ErrProviderConfigInvalidAllowedHost.Error(), Title: "Unprocessable Entity", Message: err.Error()})

case errors.Is(err, constant.ErrProviderConfigHostNotInAllowlist):
return libHTTP.Respond(c, fiber.StatusUnprocessableEntity, api.ErrorResponse{Code: constant.ErrProviderConfigHostNotInAllowlist.Error(), Title: "Unprocessable Entity", Message: err.Error()})

case isValidationErrorWithCode(err, constant.ErrProviderConfigCannotModify.Error()):
return libHTTP.Respond(c, fiber.StatusUnprocessableEntity, api.ErrorResponse{Code: constant.ErrProviderConfigCannotModify.Error(), Title: "Unprocessable Entity", Message: "cannot modify provider configuration in current status"})

Expand Down
8 changes: 4 additions & 4 deletions internal/adapters/http/in/readyz/checker_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,16 @@ type mockMongoPinger struct {
pingErr error
}

func (m *mockMongoPinger) IsConnected() bool { return m.connected }
func (m *mockMongoPinger) IsConnected() bool { return m.connected }
func (m *mockMongoPinger) Ping(ctx context.Context) error { return m.pingErr }

type mockMongoConfig struct {
uri string
tlsCACert string
}

func (m *mockMongoConfig) GetURI() string { return m.uri }
func (m *mockMongoConfig) GetTLSCACert() string { return m.tlsCACert }
func (m *mockMongoConfig) GetURI() string { return m.uri }
func (m *mockMongoConfig) GetTLSCACert() string { return m.tlsCACert }

func TestMongoDBChecker_Name(t *testing.T) {
checker := readyz.NewMongoDBChecker(nil, nil)
Expand Down Expand Up @@ -116,7 +116,7 @@ type mockPostgresPinger struct {
pingErr error
}

func (m *mockPostgresPinger) IsConnected() bool { return m.connected }
func (m *mockPostgresPinger) IsConnected() bool { return m.connected }
func (m *mockPostgresPinger) Ping(ctx context.Context) error { return m.pingErr }

type mockPostgresConfig struct {
Expand Down
6 changes: 3 additions & 3 deletions internal/adapters/http/in/readyz/sample_bodies_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,9 @@ type sampleChecker struct {
tlsEnabled bool
}

func (s *sampleChecker) Name() string { return s.name }
func (s *sampleChecker) Ping(_ context.Context) error { return s.pingErr }
func (s *sampleChecker) IsTLSEnabled() bool { return s.tlsEnabled }
func (s *sampleChecker) Name() string { return s.name }
func (s *sampleChecker) Ping(_ context.Context) error { return s.pingErr }
func (s *sampleChecker) IsTLSEnabled() bool { return s.tlsEnabled }

type sampleTenantChecker struct {
name string
Expand Down
Loading
Loading