feat(authz): add OTel tracing to authorization decision path and DB queries#3307
feat(authz): add OTel tracing to authorization decision path and DB queries#3307pflynn-virtru wants to merge 3 commits intomainfrom
Conversation
… queries Add internal span instrumentation to break down authorization decision latency into visible phases (policy fetch, entity resolution, evaluation) and add automatic database query tracing via otelpgx. Authorization spans: NewJustInTimePDP (with cache.hit attribute), JustInTimePDP.GetDecision, EvaluateObligations, ResolveEntities (with entity_identifier.type), and EvaluateDecision — applied to both GetDecision and GetEntitlements flows. DB tracing: set otelpgx.NewTracer() on pgx ConnConfig to instrument all SQL queries across all services with zero callsite changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Flynn <pflynn-virtru@users.noreply.github.com>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 48 minutes and 53 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughOpenTelemetry tracing is introduced across the access control subsystem. The Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances observability within the authorization service by implementing granular OpenTelemetry tracing. By breaking down complex authorization decision processes into distinct, traceable phases and adding automatic database query instrumentation, it significantly improves the ability to diagnose latency and performance bottlenecks in the authorization decision path. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. The spans now trace the path we tread, / From policy fetch to DB spread. / With latency clear and errors caught, / The hidden bugs are finally sought. Footnotes
|
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Flynn <pflynn-virtru@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request integrates OpenTelemetry tracing into the JustInTimePDP service and database operations using otelpgx. Spans have been added to track the lifecycle of policy decisions, entity resolution, and entitlement retrieval. The review feedback highlights several instances where spans are closed without recording errors or setting error statuses, which could lead to incomplete observability in the tracing backend. Specifically, parent spans in GetDecision and GetEntitlements, as well as specific error paths in entity resolution and decision evaluation, should be updated to include RecordError and SetStatus calls to ensure failures are correctly captured in traces.
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Benchmark Statistics
Bulk Benchmark Results
TDF3 Benchmark Results:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@service/internal/access/v2/just_in_time_pdp.go`:
- Around line 271-278: The early-return branches around evalSpan ending (where
entityRepresentationDecision == nil and the other similar branch) end the span
without setting error status or recording the error; update both branches in
just_in_time_pdp.go to call evalSpan.RecordError(err) (or
RecordError(fmt.Errorf("nil decision") for the nil case),
evalSpan.SetStatus(codes.Error, err.Error() or a descriptive message) before
calling evalSpan.End(), then return the formatted error; ensure the same fix is
applied to the other occurrence that mirrors this logic (the branch around
entityRepresentationDecision nil at the later block).
- Around line 81-83: The code calls store.IsEnabled() and store.IsReady(ctx)
without checking for nil which can panic when a nil store is allowed; update the
cache probe to guard against nil by evaluating store != nil first (e.g., compute
cacheUsed as store != nil && store.IsEnabled() && store.IsReady(ctx)), then call
span.SetAttributes(attribute.Bool("cache.hit", cacheUsed)); ensure any
subsequent fallback initialization path (the branch after if !cacheUsed) still
handles nil store correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8cd8ba8c-4e77-4d5a-8158-4e2563673a01
⛔ Files ignored due to path filters (1)
service/go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
service/go.modservice/internal/access/v2/just_in_time_pdp.goservice/pkg/db/db.go
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Benchmark Statistics
Bulk Benchmark Results
TDF3 Benchmark Results:
|
Ensure all error return paths in GetDecision and GetEntitlements record errors on both the child span (resolveSpan/evalSpan/oblSpan) and the parent span, so failed decisions are correctly surfaced in trace search and filtering without requiring drill-down into child spans. Also fixes the nil-decision early return to use a descriptive error message instead of wrapping a nil error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Flynn <pflynn-virtru@users.noreply.github.com>
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Benchmark Statistics
Bulk Benchmark Results
TDF3 Benchmark Results:
|
|
Summary
GetDecisionMultiResourcetrace into visible phases: policy fetch, obligation evaluation, entity resolution, and decision evaluationotelpgx(pgx v5 native tracer interface)Context
Companion to #3295 (trace context propagation across Connect RPC boundaries). That PR ensures trace context flows between services; this PR adds the internal spans that make the latency breakdown visible within the authorization service.
Without this change,
GetDecisionMultiResourceshows a single 2.3s span with no children. With it, the trace hierarchy becomes:DB queries appear as spans with standard semantic conventions (
db.system,db.statement) viaotelpgx.Changes
service/internal/access/v2/just_in_time_pdp.go— Addtracerfield (from global provider), instrumentNewJustInTimePDP,GetDecision, andGetEntitlementswith spans and attributesservice/pkg/db/db.go— Setotelpgx.NewTracer()on pgxConnConfig.TracerinbuildConfig()service/go.mod— Addgithub.com/exaring/otelpgx v0.10.0Test plan
go build ./...compilesgo test ./internal/access/v2/...— all tests pass (noop tracer, zero-cost)go test ./authorization/v2/...— all tests passgo test ./pkg/db/...— all tests passgolangci-lint runon all affected packages — 0 issues🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes