
feat(authz): add OTel tracing to authorization decision path and DB queries #3307

Open: pflynn-virtru wants to merge 3 commits into main from feat/otel-authorization-db-tracing


@pflynn-virtru pflynn-virtru commented Apr 15, 2026

Summary

  • Adds internal OTel spans to the authorization decision path, breaking down the opaque GetDecisionMultiResource trace into visible phases: policy fetch, obligation evaluation, entity resolution, and decision evaluation
  • Adds automatic database query tracing across all services via otelpgx (pgx v5 native tracer interface)

Context

Companion to #3295 (trace context propagation across Connect RPC boundaries). That PR ensures trace context flows between services; this PR adds the internal spans that make the latency breakdown visible within the authorization service.

Without this change, GetDecisionMultiResource shows a single 2.3s span with no children. With it, the trace hierarchy becomes:

GetDecisionMultiResource
├── NewJustInTimePDP           [cache.hit=bool]
│   ├── ListAttributes         (otelconnect, on cache miss)
│   ├── ListSubjectMappings    (otelconnect, on cache miss)
│   ├── ListRegisteredResources (otelconnect, on cache miss)
│   └── ListObligations        (otelconnect, on cache miss)
└── JustInTimePDP.GetDecision  [resource.count=N]
    ├── EvaluateObligations
    ├── ResolveEntities        [entity_identifier.type=token|entity_chain|...]
    │   ├── CreateEntityChainsFromTokens (otelconnect)
    │   └── ResolveEntities RPC          (otelconnect)
    └── EvaluateDecision       [entity_representation.count=N]

DB queries appear as spans with standard semantic conventions (db.system, db.statement) via otelpgx.

Changes

  • service/internal/access/v2/just_in_time_pdp.go — Add tracer field (from global provider), instrument NewJustInTimePDP, GetDecision, and GetEntitlements with spans and attributes
  • service/pkg/db/db.go — Set otelpgx.NewTracer() on pgx ConnConfig.Tracer in buildConfig()
  • service/go.mod — Add github.com/exaring/otelpgx v0.10.0

Test plan

  • go build ./... compiles
  • go test ./internal/access/v2/... — all tests pass (noop tracer, zero-cost)
  • go test ./authorization/v2/... — all tests pass
  • go test ./pkg/db/... — all tests pass
  • golangci-lint run on all affected packages — 0 issues
  • Manual: run with OTLP exporter, verify span hierarchy in trace viewer

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added OpenTelemetry tracing to authorization decisions and database operations, with span attributes that record cache usage, entity resolution type, and resource and entity counts.

… queries

Add internal span instrumentation to break down authorization decision
latency into visible phases (policy fetch, entity resolution, evaluation)
and add automatic database query tracing via otelpgx.

Authorization spans: NewJustInTimePDP (with cache.hit attribute),
JustInTimePDP.GetDecision, EvaluateObligations, ResolveEntities
(with entity_identifier.type), and EvaluateDecision — applied to both
GetDecision and GetEntitlements flows.

DB tracing: set otelpgx.NewTracer() on pgx ConnConfig to instrument all
SQL queries across all services with zero callsite changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Flynn <pflynn-virtru@users.noreply.github.com>
@pflynn-virtru pflynn-virtru requested review from a team as code owners April 15, 2026 16:29

coderabbitai Bot commented Apr 15, 2026


📥 Commits

Reviewing files that changed from the base of the PR and between 60f7146 and d01127e.

📒 Files selected for processing (1)
  • service/internal/access/v2/just_in_time_pdp.go
📝 Walkthrough


OpenTelemetry tracing is introduced across the access control subsystem. service/go.mod adds the github.com/exaring/otelpgx dependency. JustInTimePDP gains a tracer field and span instrumentation around each stage of its decision-making. The database connection pool is configured with the otelpgx tracer, which plugs into pgx's native tracer interface for query-level observability.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependencies: service/go.mod | Added github.com/exaring/otelpgx v0.10.0 as a direct module requirement. |
| Tracing Instrumentation: service/internal/access/v2/just_in_time_pdp.go | Added tracer field to JustInTimePDP and wrapped major operations (NewJustInTimePDP, GetDecision, obligation evaluation, entity resolution, decision consolidation, GetEntitlements) with OpenTelemetry spans. Included error recording, status codes, and attributes (cache hits, resource/entity counts, resolution types). |
| Database Configuration: service/pkg/db/db.go | Imported github.com/exaring/otelpgx and configured pgx connection pool to use otelpgx.NewTracer() for native query instrumentation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

size/s

Suggested reviewers

  • jakedoublev
  • elizabethhealy


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding OpenTelemetry tracing to the authorization decision path and database queries, which aligns with all three modified files. |


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances observability within the authorization service by implementing granular OpenTelemetry tracing. By breaking down complex authorization decision processes into distinct, traceable phases and adding automatic database query instrumentation, it significantly improves the ability to diagnose latency and performance bottlenecks in the authorization decision path.

Highlights

  • Authorization Tracing: Instrumented the authorization decision path in JustInTimePDP with OpenTelemetry spans to provide visibility into policy fetching, obligation evaluation, entity resolution, and decision evaluation phases.
  • Database Tracing: Integrated otelpgx to enable automatic OpenTelemetry tracing for all database queries performed via pgx.
  • Dependency Update: Added github.com/exaring/otelpgx v0.10.0 to go.mod to support native database query instrumentation.


@pflynn-virtru pflynn-virtru changed the title from "feat(service): add OTel tracing to authorization decision path and DB queries" to "feat(authz): add OTel tracing to authorization decision path and DB queries" Apr 15, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Flynn <pflynn-virtru@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request integrates OpenTelemetry tracing into the JustInTimePDP service and database operations using otelpgx. Spans have been added to track the lifecycle of policy decisions, entity resolution, and entitlement retrieval. The review feedback highlights several instances where spans are closed without recording errors or setting error statuses, which could lead to incomplete observability in the tracing backend. Specifically, parent spans in GetDecision and GetEntitlements, as well as specific error paths in entity resolution and decision evaluation, should be updated to include RecordError and SetStatus calls to ensure failures are correctly captured in traces.

Comment threads (6): service/internal/access/v2/just_in_time_pdp.go
@github-actions

Benchmark results

Benchmark authorization.GetDecisions Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 1000 |
| Denied Decision Requests | 0 |
| Total Time | 188.821749ms |

Benchmark authorization.v2.GetMultiResourceDecision Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 1000 |
| Denied Decision Requests | 0 |
| Total Time | 95.107358ms |

Benchmark Statistics

| Name | № Requests | Avg Duration | Min Duration | Max Duration |
| --- | --- | --- | --- | --- |

Bulk Benchmark Results

| Metric | Value |
| --- | --- |
| Total Decrypts | 100 |
| Successful Decrypts | 100 |
| Failed Decrypts | 0 |
| Total Time | 411.722019ms |
| Throughput | 242.88 requests/second |

TDF3 Benchmark Results:

| Metric | Value |
| --- | --- |
| Total Requests | 5000 |
| Successful Requests | 5000 |
| Failed Requests | 0 |
| Concurrent Requests | 50 |
| Total Time | 42.085368253s |
| Average Latency | 418.630257ms |
| Throughput | 118.81 requests/second |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/internal/access/v2/just_in_time_pdp.go`:
- Around line 271-278: The early-return branches that end evalSpan (the
entityRepresentationDecision == nil branch and the similar error branch) end the
span without recording the error or setting an error status. In both branches of
just_in_time_pdp.go, call evalSpan.RecordError(err) (for the nil case,
evalSpan.RecordError(fmt.Errorf("nil decision"))) and
evalSpan.SetStatus(codes.Error, msg), where msg is err.Error() or a descriptive
message, before calling evalSpan.End(), then return the formatted error. Apply
the same fix to the later block that mirrors this logic (the other
entityRepresentationDecision nil branch).
- Around line 81-83: The code calls store.IsEnabled() and store.IsReady(ctx)
without checking for nil which can panic when a nil store is allowed; update the
cache probe to guard against nil by evaluating store != nil first (e.g., compute
cacheUsed as store != nil && store.IsEnabled() && store.IsReady(ctx)), then call
span.SetAttributes(attribute.Bool("cache.hit", cacheUsed)); ensure any
subsequent fallback initialization path (the branch after if !cacheUsed) still
handles nil store correctly.

📥 Commits

Reviewing files that changed from the base of the PR and between 63aac72 and 60f7146.

⛔ Files ignored due to path filters (1)
  • service/go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • service/go.mod
  • service/internal/access/v2/just_in_time_pdp.go
  • service/pkg/db/db.go

Comment thread service/internal/access/v2/just_in_time_pdp.go
Comment thread service/internal/access/v2/just_in_time_pdp.go Outdated
@github-actions

Benchmark results

Benchmark authorization.GetDecisions Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 1000 |
| Denied Decision Requests | 0 |
| Total Time | 183.194537ms |

Benchmark authorization.v2.GetMultiResourceDecision Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 1000 |
| Denied Decision Requests | 0 |
| Total Time | 87.867222ms |

Benchmark Statistics

| Name | № Requests | Avg Duration | Min Duration | Max Duration |
| --- | --- | --- | --- | --- |

Bulk Benchmark Results

| Metric | Value |
| --- | --- |
| Total Decrypts | 100 |
| Successful Decrypts | 100 |
| Failed Decrypts | 0 |
| Total Time | 439.615467ms |
| Throughput | 227.47 requests/second |

TDF3 Benchmark Results:

| Metric | Value |
| --- | --- |
| Total Requests | 5000 |
| Successful Requests | 5000 |
| Failed Requests | 0 |
| Concurrent Requests | 50 |
| Total Time | 41.216708339s |
| Average Latency | 410.922338ms |
| Throughput | 121.31 requests/second |

Ensure all error return paths in GetDecision and GetEntitlements record
errors on both the child span (resolveSpan/evalSpan/oblSpan) and the
parent span, so failed decisions are correctly surfaced in trace search
and filtering without requiring drill-down into child spans.

Also fixes the nil-decision early return to use a descriptive error
message instead of wrapping a nil error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Flynn <pflynn-virtru@users.noreply.github.com>
@github-actions

Benchmark results

Benchmark authorization.GetDecisions Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 1000 |
| Denied Decision Requests | 0 |
| Total Time | 204.024317ms |

Benchmark authorization.v2.GetMultiResourceDecision Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 1000 |
| Denied Decision Requests | 0 |
| Total Time | 98.944994ms |

Benchmark Statistics

| Name | № Requests | Avg Duration | Min Duration | Max Duration |
| --- | --- | --- | --- | --- |

Bulk Benchmark Results

| Metric | Value |
| --- | --- |
| Total Decrypts | 100 |
| Successful Decrypts | 100 |
| Failed Decrypts | 0 |
| Total Time | 387.588701ms |
| Throughput | 258.01 requests/second |

TDF3 Benchmark Results:

| Metric | Value |
| --- | --- |
| Total Requests | 5000 |
| Successful Requests | 5000 |
| Failed Requests | 0 |
| Concurrent Requests | 50 |
| Total Time | 41.248721543s |
| Average Latency | 410.97311ms |
| Throughput | 121.22 requests/second |

@github-actions

⚠️ Govulncheck found vulnerabilities ⚠️

The following modules have known vulnerabilities:

  • examples
  • sdk
  • service
  • lib/fixtures
  • tests-bdd

See the workflow run for details.

@pflynn-virtru pflynn-virtru enabled auto-merge April 16, 2026 16:09