fix(service): record real authenticated actor identity (Taura finding)#35
Conversation
…SQL round-trip and API key path
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughCaptures principals (API-key or JWT) into request context, resolves actor (type/id/name/ip) when recording audit events, persists actor_name in the DB (with migrations/hash updates), and adds unit/integration/e2e tests plus a .gitignore entry. ChangesAudit Actor Attribution
Git Worktree Configuration
Comment |
🔒 Security Scan Results —
|
| Policy | Status |
|---|---|
| Default non-root user | ✅ Passed |
| No fixable critical/high CVEs | ✅ Passed |
| No high-profile vulnerabilities | ✅ Passed |
| No AGPL v3 licenses | ✅ Passed |
Pre-release Version Check
✅ No unstable version pins found.
📊 Unit Test Coverage Report:
|
| Metric | Value |
|---|---|
| Overall Coverage | 46.3% |
| Threshold | 85% |
Coverage by Package
| Package | Coverage |
|---|---|
github.com/LerianStudio/flowker/api |
100.0% |
github.com/LerianStudio/flowker/internal/adapters/http/in/audit |
0.0% |
github.com/LerianStudio/flowker/internal/adapters/http/in/catalog |
60.9% |
github.com/LerianStudio/flowker/internal/adapters/http/in/dashboard |
91.5% |
github.com/LerianStudio/flowker/internal/adapters/http/in/execution |
81.0% |
github.com/LerianStudio/flowker/internal/adapters/http/in/executor_configuration |
70.8% |
github.com/LerianStudio/flowker/internal/adapters/http/in/health |
100.0% |
github.com/LerianStudio/flowker/internal/adapters/http/in/middleware/testutil |
0.0% |
github.com/LerianStudio/flowker/internal/adapters/http/in/middleware |
57.9% |
github.com/LerianStudio/flowker/internal/adapters/http/in/provider_configuration |
33.1% |
github.com/LerianStudio/flowker/internal/adapters/http/in/readyz |
81.3% |
github.com/LerianStudio/flowker/internal/adapters/http/in/webhook |
87.5% |
github.com/LerianStudio/flowker/internal/adapters/http/in/workflow |
69.7% |
github.com/LerianStudio/flowker/internal/adapters/http/in |
33.3% |
github.com/LerianStudio/flowker/internal/adapters/mongodb/dashboard |
43.0% |
github.com/LerianStudio/flowker/internal/adapters/mongodb/execution |
64.8% |
github.com/LerianStudio/flowker/internal/adapters/mongodb/executor_configuration |
36.6% |
github.com/LerianStudio/flowker/internal/adapters/mongodb/provider_configuration |
63.0% |
github.com/LerianStudio/flowker/internal/adapters/mongodb/workflow |
69.2% |
github.com/LerianStudio/flowker/internal/adapters/postgresql/audit |
41.9% |
github.com/LerianStudio/flowker/internal/bootstrap |
40.5% |
github.com/LerianStudio/flowker/internal/services/command |
42.1% |
github.com/LerianStudio/flowker/internal/services/query |
11.2% |
github.com/LerianStudio/flowker/internal/services |
3.2% |
github.com/LerianStudio/flowker/internal/testutil |
63.6% |
github.com/LerianStudio/flowker/pkg/circuitbreaker |
97.6% |
github.com/LerianStudio/flowker/pkg/clock |
0.0% |
github.com/LerianStudio/flowker/pkg/condition |
87.9% |
github.com/LerianStudio/flowker/pkg/contextutil |
100.0% |
github.com/LerianStudio/flowker/pkg/executor/base |
20.0% |
github.com/LerianStudio/flowker/pkg/executor |
39.5% |
github.com/LerianStudio/flowker/pkg/executors/http/auth |
58.2% |
github.com/LerianStudio/flowker/pkg/executors/http |
49.5% |
github.com/LerianStudio/flowker/pkg/executors/midaz |
80.8% |
github.com/LerianStudio/flowker/pkg/executors/s3 |
0.0% |
github.com/LerianStudio/flowker/pkg/executors/tracer |
88.0% |
github.com/LerianStudio/flowker/pkg/executors |
57.1% |
github.com/LerianStudio/flowker/pkg/model |
60.7% |
github.com/LerianStudio/flowker/pkg/net/http |
91.3% |
github.com/LerianStudio/flowker/pkg/pagination |
0.0% |
github.com/LerianStudio/flowker/pkg/safehttp |
90.3% |
github.com/LerianStudio/flowker/pkg/ssrfopt |
100.0% |
github.com/LerianStudio/flowker/pkg/templates/tracer_midaz |
81.5% |
github.com/LerianStudio/flowker/pkg/templates |
0.0% |
github.com/LerianStudio/flowker/pkg/transformation |
67.2% |
github.com/LerianStudio/flowker/pkg/triggers/webhook |
0.0% |
github.com/LerianStudio/flowker/pkg/triggers |
0.0% |
github.com/LerianStudio/flowker/pkg/webhook |
100.0% |
github.com/LerianStudio/flowker/pkg |
90.3% |
Generated by Go PR Analysis workflow
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/adapters/http/in/middleware/apikey.go`:
- Around line 45-48: Trim cfg.Label before checking for empty so whitespace-only
labels aren't treated as valid; replace the current label assignment (label :=
cfg.Label ... if label == "" { label = defaultAPIKeyLabel }) with logic that
does label = strings.TrimSpace(cfg.Label) and then fall back to
defaultAPIKeyLabel when the trimmed label is empty, updating any import/use in
the same file (apikey.go) to include strings where needed and keeping the
variable name label unchanged for downstream audit attribution.
In `@internal/services/command/record_audit_event.go`:
- Around line 170-176: The current logic in the contextutil.GetPrincipal
handling keeps p.ID/p.Name when Principal.Type is unrecognized, causing
misattribution; change the branch in the record-audit-actor logic so that if
model.AuditActorType(p.Type) is not valid you do not reuse p.ID or p.Name but
instead return the canonical system actor (use model.AuditActorTypeSystem) —
i.e., detect invalid actorType and directly return the canonical system/flowker
actor via model.NewAuditActor(...) rather than rewriting only the type while
keeping p.ID/p.Name; update the code around contextutil.GetPrincipal(...) and
model.NewAuditActor(...) accordingly.
In `@migrations/000002_audit_actor_name.down.sql`:
- Around line 44-119: The DOWN migration drops actor_name but the restored
verify_audit_hash_chain() function recomputes hashes without actor_name, which
will invalidate rows that were created when actor_name was included; before
dropping actor_name in the ALTER TABLE, either (A) detect any audit_events rows
that were created/hashed with actor_name (e.g. non-null actor_name or created_at
>= the 000002 migration activation time) and abort the downgrade by RAISE
EXCEPTION with a clear message, or (B) recompute and rewrite the chain hashes
and previous_hash values for the affected range so hashes match the legacy
algorithm (use the same hash_input used in the restored verify_audit_hash_chain
to generate expected_hash and update hash and previous_hash in audit_events,
walking the chain in order). Ensure changes reference verify_audit_hash_chain,
audit_events, actor_name, hash and previous_hash so the migration either blocks
when affected rows exist or remaps stored hashes before dropping actor_name.
In `@migrations/000002_audit_actor_name.up.sql`:
- Around line 35-57: The verifier currently recomputes every row with the new
actor_name term which will invalidate pre-migration rows; either (A) preserve
the legacy formula for historical rows by changing the verifier
(verify_audit_hash_chain / the internal compute hash function used by it) to
branch: if actor_name IS NULL (or id <= N if you capture the migration cutoff)
compute the legacy hash, else compute the new hash that includes actor_name, or
(B) perform an explicit re-baseline in this migration: add a data-migration step
in the .up.sql that recomputes and updates stored hashes in audit_events for all
existing rows using the new formula (and mark them as rebaselined), and ensure
.down.sql restores the legacy verifier consistently; pick one of these fixes and
implement it in the verifier function or the migration data-update so historical
rows remain verifiable.
In `@tests/e2e/audit_actor_e2e_test.go`:
- Around line 235-251: The polling loop in the test currently breaks only when
hasEventType(entries, "EXECUTION_COMPLETED") is true, so executions that
terminate with "EXECUTION_FAILED" wait until timeout; update the loop condition
inside the for loop that checks entries (the block that calls
findExecutionAuditEntries(t, client, execID)) to break when either
hasEventType(entries, "EXECUTION_COMPLETED") OR hasEventType(entries,
"EXECUTION_FAILED") is true, leaving the subsequent assertions (require.NotEmpty
and assert.True checks) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c236ea50-03b4-4e84-8c46-ed6d1a8f3d93
📒 Files selected for processing (23)
.gitignoreinternal/adapters/http/in/middleware/apikey.gointernal/adapters/http/in/middleware/apikey_test.gointernal/adapters/http/in/middleware/auth_guard.gointernal/adapters/http/in/middleware/auth_guard_test.gointernal/adapters/http/in/middleware/jwt_claims.gointernal/adapters/postgresql/audit/repository.gointernal/adapters/postgresql/audit/repository_test.gointernal/bootstrap/config.gointernal/services/command/record_audit_event.gointernal/services/command/record_audit_event_test.gomigrations/000002_audit_actor_name.down.sqlmigrations/000002_audit_actor_name.up.sqlpkg/constant/errors.gopkg/contextutil/context.gopkg/contextutil/context_test.gopkg/model/audit_entry.gopkg/model/audit_entry_output.gopkg/model/audit_entry_test.gotests/e2e/audit_actor_e2e_test.gotests/integration/audit_actor_test.gotests/integration/audit_repository_sql_test.gotests/integration/main_test.go
🔍 Bug-Fix Evidence — Audit actor identity (Taura finding)Evidence captured from a real end-to-end execution against Postgres + MongoDB testcontainers booted by this PR's integration harness. Reproduces the exact scenario described in the Taura security finding ( Path A — Unauthenticated request (system fallback, preserved)Request: POST /v1/workflows HTTP/1.1
Host: 127.0.0.1:59103
Content-Type: application/jsonResulting audit row: {
"eventType": "WORKFLOW_CREATED",
"action": "CREATE",
"result": "SUCCESS",
"resourceId": "a7d432d7-ddf8-4866-9b27-67bf1585f90e",
"resourceType": "workflow",
"actor": {
"type": "system",
"id": "flowker",
"name": "",
"ipAddress": "127.0.0.1"
}
}
Path B — X-API-Key authenticated request (the bug fix in action)Request: POST /v1/workflows HTTP/1.1
Host: 127.0.0.1:59104
X-API-Key: test-api-key-fixed-value
Content-Type: application/jsonResulting audit row: {
"eventType": "WORKFLOW_CREATED",
"action": "CREATE",
"result": "SUCCESS",
"resourceId": "f8dcddab-ba2c-4cde-af3f-d652774ea037",
"resourceType": "workflow",
"actor": {
"type": "api_key",
"id": "test-label",
"name": "",
"ipAddress": "127.0.0.1"
}
}Side-by-side comparison
Coverage of the third path (JWT user)The JWT path (
Persistent regression coverageThe same scenarios above are run automatically on every CI execution via:
Reproducing locallymake up # boots Postgres + Mongo + Flowker via docker-compose
curl -X POST http://localhost:3000/v1/workflows \
-H "X-API-Key: ${API_KEY}" \
-H "Content-Type: application/json" \
-d '{...}'
curl http://localhost:3000/v1/audit-events?limit=1 | jq '.items[0].actor'
# Expected: { "type": "api_key", "id": "<API_KEY_LABEL>", "ipAddress": "<real IP>", ... } |
🔍 Bug-Fix Evidence — JWT user actions (CREATE + UPDATE workflow)Complementing the previous evidence comment which covered the system-fallback and X-API-Key paths, this captures real user actions authenticated via a Bearer JWT — the path Taura's finding most directly targets ("ações realizadas por usuários privilegiados"). The integration harness was extended in-test with a third Flowker server ( Action 1 — Workflow CREATERequest: POST /v1/workflows HTTP/1.1
Host: 127.0.0.1:62041
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJlbWFpbCI6Imdh.....Zf8VffHIGtkj8ZNKja9m...
Content-Type: application/json
# JWT claims (decoded):
# sub: f0a8c1d2-3e4b-4f5a-9c6d-7e8f9a0b1c2d
# preferred_username: alex.garzao
# email: garzao@lerian.studioResulting audit row ( {
"eventType": "WORKFLOW_CREATED",
"action": "CREATE",
"result": "SUCCESS",
"resourceId": "7065e468-18d0-415c-a490-229718c3526d",
"resourceType": "workflow",
"actor": {
"type": "user",
"id": "f0a8c1d2-3e4b-4f5a-9c6d-7e8f9a0b1c2d",
"name": "alex.garzao",
"ipAddress": "127.0.0.1"
}
}Action 2 — Workflow UPDATE (same Bearer token)Request: PUT /v1/workflows/7065e468-18d0-415c-a490-229718c3526d HTTP/1.1
Host: 127.0.0.1:62041
Authorization: Bearer <same JWT as above>
Content-Type: application/jsonResulting audit row: {
"eventType": "WORKFLOW_UPDATED",
"action": "UPDATE",
"result": "SUCCESS",
"resourceId": "7065e468-18d0-415c-a490-229718c3526d",
"resourceType": "workflow",
"actor": {
"type": "user",
"id": "f0a8c1d2-3e4b-4f5a-9c6d-7e8f9a0b1c2d",
"name": "alex.garzao",
"ipAddress": "127.0.0.1"
}
}Verdict
Scope of this evidence
Closing the Taura findingBefore this PR, every audit row showed
All three paths capture the real client IP via |
Per OIDC Core 1.0 sec.2 and RFC 9068 sec.2.2, 'sub' is REQUIRED on ID Tokens and JWT access tokens (including client_credentials). A token that parses but lacks 'sub' cannot be attributed to a principal, so accepting it would silently record the audit row with the generic system/flowker actor — the structural failure mode the Taura audit originally flagged. extractPrincipalFromBearer now returns (rejected, err) so the outer guard can short-circuit after the 401 response is written, instead of falling through to lib-auth.Authorize. Behavior preserved for the 'no Bearer header' and 'malformed token' paths — lib-auth still handles those.
🛡️ Closing scenario 2 (JWT sem `sub`) — strict rejectApós investigar a recomendação da spec OIDC/RFC 9068 e a postura de IdPs maduros (Keycloak, Auth0, Okta, Entra), o comportamento permissivo anterior (aceita token + log warning + fallback para `system/flowker`) recriava o exato anti-pattern que a Taura reportou. Endereçado neste mesmo PR antes do merge. Commit: 8220710 — `fix(auth): reject JWT without 'sub' claim with 401` Mudança de comportamento
Justificativa (fontes autoritativas)
Cobertura de testeSubstituiu `TestPrincipal_FromJWT_NoSub_NoPrincipal` (que asseverava o comportamento antigo permissivo) por `TestPrincipal_FromJWT_NoSub_Rejects401`:
`make pre-merge` continua verde: 1589 unit tests, lint 0, sec 0. Risco de availability (mitigado pelo contexto)A pesquisa apontou risco de availability: se o IdP em produção parar de emitir `sub` em alguma classe de token (precedente real: Keycloak issue #31082), a aplicação rejeitaria em massa. Como o Flowker não tem dados em produção ainda, esse risco está mitigado por design — qualquer regressão futura do IdP seria caught antes do deploy. Follow-up (escopo separado)O padrão "service principal nomeado" via allow-list de `client_id` (alinhado ao que o Tracer já usa: `svc_tracer` / "Tracer Limit Manager") fica como melhoria futura quando/se houver M2M legítimo emitindo token sem `sub`. Não é necessário para fechar a finding da Taura — a rejeição estrita já elimina a ambiguidade no audit log. |
Adds two persistent integration tests against a third Flowker server booted in-test with PLUGIN_AUTH_ENABLED=true + mock Access Manager: - TestAuditActor_JWT_WithSub_StampsUserActor — positive path proving a Bearer token with sub/preferred_username produces actor.type=user with the JWT identity in the audit row. - TestAuditActor_JWT_WithoutSub_Returns401_NoAuditRow — strict reject path proving a Bearer without sub returns 401 UNAUTHORIZED_MISSING_SUB AND writes no audit row (no silent fallback to system/flowker). Server is bootstrapped via sync.Once so multiple tests share the same mock Access Manager + Flowker listener. Mongo + Postgres are reused from the primary harness. E2E is intentionally not extended: the JWT auth boundary is exercised at the same layer (Fiber listener + lib-auth round-trip + audit DB) that the new integration tests already cover. Adding E2E would duplicate coverage without exercising a new layer.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/audit_actor_jwt_test.go`:
- Around line 73-111: Snapshotting of environment into the originals map is done
but restoration only occurs on the success path; to ensure env is always
restored even if bootstrap.InitServers() or waitForServer(...) fail, immediately
call a deferred restore right after creating the originals map (use defer to
iterate over originals and call os.Setenv for each key/value) so that regardless
of returns or errors in setEnvForApp, bootstrap.InitServers, or waitForServer
the environment is reverted; ensure the defer runs before any early returns and
keep am.Close() calls as needed in error paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5fb3d62d-c10c-4c17-8fbc-88233fd93b26
📒 Files selected for processing (1)
tests/integration/audit_actor_jwt_test.go
…return paths The JWT-server bootstrap helper snapshotted PLUGIN_AUTH_* and friends before mutating them, but restored only on the success path. If bootstrap.InitServers() or waitForServer() failed, the env stayed mutated and subsequent integration tests would read a PLUGIN_AUTH_ADDRESS pointing at a closed httptest.Server — silent test-suite poisoning. Move the restore into a defer right after the snapshot so it runs on every exit path, including the early returns on bootstrap/readiness failure.
Pull Request Checklist
Pull Request Type
Checklist
Additional Notes
Context — Taura security finding
The security firm Taura identified a structural flaw shared by Flowker (this PR) and Tracer (out of scope): every audit event was recorded with a hardcoded generic actor (
type="system",id="flowker",ipAddress="0.0.0.0"), regardless of which authenticated user or application triggered the action. Per the finding: "ações realizadas por usuários privilegiados tornam-se indistinguíveis de atividades legítimas esperadas da plataforma, inviabilizando a responsabilização individual e reduzindo significativamente a efetividade de processos de investigação e auditoria pós-incidente."Root cause:
internal/services/command/record_audit_event.go:85(pre-fix), whereNewAuditActor(System, "flowker", "")was unconditional.What changed (per commit)
build(gitignore): ignore .worktrees/ for parallel branches— tooling.feature(contextutil): add Principal for authenticated identity— newPrincipal{Type, ID, Name}carried viacontext.Context.Typekept asstringto avoid an import cycle intopkg/model.fix(audit): record real authenticated actor identity in events— extendsAuditActorwithname, new errorFLK-0613, replaces hardcoded system actor withresolveActor(ctx)capturing the Principal before the fire-and-forget goroutine. Includes migration000002_audit_actor_namethat adds the column and updates bothcalculate_audit_event_hash()andverify_audit_hash_chain()to includeactor_namein the canonical hash input (see migration header for the re-baseline trade-off — applies cleanly to environments without pre-existing audit rows).feature(auth): propagate JWT and API key identity to request context— newjwt_claims.goparses Bearer viajwt.ParseUnverified(same primitive lib-auth uses internally; trust still gated by lib-auth'sAuthorize); API key middleware stampsPrincipal{api_key, <API_KEY_LABEL>}. New env varAPI_KEY_LABEL(defaultflowker-default).test(audit): integration test for actor identity end-to-end— system-fallback integration test.test(integration): expand audit actor coverage— table-driven coverage of every workflow event type (CREATE/UPDATE/ACTIVATE/DEACTIVATE/DRAFT/DELETE), SQL round-trip foractor_name, and dual-server harness exercising the realapi_keypath withX-API-Key.test(e2e): audit assertion in webhook execution flow— full pipeline: webhook → execution → audit row with the expected actor.fix(security): suppress gosec G101 false positives on audit actor label— narrow// #nosec G101on theflowker-defaultconstant (audit identifier, not a credential), matching the existing pattern used in this repo.Resulting behavior
type=user,id=<sub>,name=<preferred_username or email>,ipAddress=<real client IP>type=api_key,id=<API_KEY_LABEL>,ipAddress=<real client IP>type=system,id=flowker(preserved fallback for genuine system events)Test results (local)
go build ./...— cleango vet ./...— cleanmake lint— 0 issuesmake sec— 0 issuesmake pre-merge— greenConscious deviations / follow-ups
docs/PROJECT_RULES.mdnot updated with the new actor semantics — acceptance criterion in the plan, deferred to a follow-up PR to keep this one focused on code + tests.actor_name. Pre-migration rows (hashed under the old formula) will failverify_audit_hash_chainafter the migration. The migration's.up.sqldocuments this explicitly; recommended in environments without production audit data (Flowker's current posture).lib-auth/v2claims: lib-auth v2.7.0 validates JWTs but does not expose claims to the consumer. The middleware usesParseUnverifiedto extract identity after lib-auth's validation gate runs — same primitive lib-auth uses internally for its own authorization round-trip, so the security envelope is unchanged.Obs: Please, always remember to target your PR to develop branch instead of main.