Skip to content

Add support for dynamically querying multiple tenants in trace domain#229

Draft
shwetaap wants to merge 1 commit intoopenshift:mainfrom
shwetaap:trace-tenant
Draft

Add support for dynamically querying multiple tenants in trace domain#229
shwetaap wants to merge 1 commit intoopenshift:mainfrom
shwetaap:trace-tenant

Conversation

@shwetaap
Copy link
Copy Markdown
Contributor

No description provided.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 20, 2026

Walkthrough

This pull request introduces trace store configuration management for Korrel8r. It adds a new StoreConfig interface and replaceTraceStore function to the client, creates utility functions to extract trace context from URL parameters and build store configurations, introduces Redux state management for trace context, and integrates trace store updates into the Korrel8rPanel component.

Changes

Cohort / File(s) Summary
Korrel8r Client API
web/src/korrel8r-client.ts
Introduces StoreConfig interface with domain, optional tempoStack and certificateAuthority properties. Adds replaceTraceStore() async function that sends HTTP PUT to /stores/trace endpoint with CSRF token header and error handling.
Trace Utilities
web/src/utils/traceStoreUtils.ts
Creates new module exporting TraceContext interface and two utility functions: extractTraceContext() parses URL query parameters (namespace, name, tenant), and buildTraceStoreConfig() constructs a store config with computed Tempo endpoint URL and Kubernetes CA certificate path.
Redux State Management
web/src/redux-actions.ts, web/src/redux-reducers.ts
Adds SetTraceContext action type and corresponding action creator setTraceContext(). Extends reducer to initialize and handle traceContext state mutations.
Component Integration
web/src/components/Korrel8rPanel.tsx
Rewires correlation-fetch effect to conditionally update trace store before executing graph request. Refactors effect logic to extract performFetch() helper, adds Promise.race with 3s timeout for store replacement, and updates dependency list to include dispatch and stored trace context.
Documentation
web/src/korrel8r/trace.ts
Replaces TODO comments with NOTE explaining Tempo constants and their relationship to trace context sourced from URL parameters and store configuration updates.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from alanconway and zhuje March 20, 2026 18:02
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shwetaap

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/hooks/useTroubleshootingPanel.tsx`:
- Around line 16-35: The open callback currently awaits replaceTraceStore(...)
which can stall and prevent dispatch(openTP()) from running; update
useTroubleshootingPanel's open function so the replaceTraceStore call is bounded
by a timeout/fallback (e.g., Promise.race with a short timeout or a cancellable
promise) and ensure dispatch(openTP()) always runs after the timeout/fallback
rather than waiting indefinitely; locate the open function and adjust the logic
around extractTraceContext(), buildTraceStoreConfig(), replaceTraceStore(), and
dispatch(openTP()) so failures or timeouts in replaceTraceStore do not block
opening the panel.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0c473d50-d4ce-48e4-a5ee-e449dc6660de

📥 Commits

Reviewing files that changed from the base of the PR and between 877f363 and 09fb0ee.

📒 Files selected for processing (3)
  • web/src/hooks/useTroubleshootingPanel.tsx
  • web/src/korrel8r-client.ts
  • web/src/korrel8r/trace.ts

Comment on lines 16 to 35
const open = React.useCallback(async () => {
// Check if we're on a traces page with tenant information
const traceContext = extractTraceContext();

if (traceContext) {
try {
// Update korrel8r's trace store to match the selected tenant
const storeConfig = buildTraceStoreConfig(traceContext);
await replaceTraceStore(storeConfig);
// eslint-disable-next-line no-console
console.log('Trace store updated for tenant:', traceContext.tenant);
} catch (error) {
// Log error but don't block panel from opening
// The panel will still work with other domains even if trace store update fails
// eslint-disable-next-line no-console
console.error('Failed to update trace store:', error);
}
}

dispatch(openTP());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Panel opening can hang indefinitely on a stalled network call.

At Line 24, await replaceTraceStore(...) runs before dispatch(openTP()), and there is no timeout. If the request never resolves, the panel never opens.

Suggested fix (bound wait time before fallback)
   const open = React.useCallback(async () => {
@@
     if (traceContext) {
       try {
         // Update korrel8r's trace store to match the selected tenant
         const storeConfig = buildTraceStoreConfig(traceContext);
-        await replaceTraceStore(storeConfig);
+        await Promise.race([
+          replaceTraceStore(storeConfig),
+          new Promise((_, reject) =>
+            setTimeout(() => reject(new Error('Timed out replacing trace store')), 3000),
+          ),
+        ]);
         // eslint-disable-next-line no-console
         console.log('Trace store updated for tenant:', traceContext.tenant);
       } catch (error) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const open = React.useCallback(async () => {
// Check if we're on a traces page with tenant information
const traceContext = extractTraceContext();
if (traceContext) {
try {
// Update korrel8r's trace store to match the selected tenant
const storeConfig = buildTraceStoreConfig(traceContext);
await replaceTraceStore(storeConfig);
// eslint-disable-next-line no-console
console.log('Trace store updated for tenant:', traceContext.tenant);
} catch (error) {
// Log error but don't block panel from opening
// The panel will still work with other domains even if trace store update fails
// eslint-disable-next-line no-console
console.error('Failed to update trace store:', error);
}
}
dispatch(openTP());
const open = React.useCallback(async () => {
// Check if we're on a traces page with tenant information
const traceContext = extractTraceContext();
if (traceContext) {
try {
// Update korrel8r's trace store to match the selected tenant
const storeConfig = buildTraceStoreConfig(traceContext);
await Promise.race([
replaceTraceStore(storeConfig),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Timed out replacing trace store')), 3000),
),
]);
// eslint-disable-next-line no-console
console.log('Trace store updated for tenant:', traceContext.tenant);
} catch (error) {
// Log error but don't block panel from opening
// The panel will still work with other domains even if trace store update fails
// eslint-disable-next-line no-console
console.error('Failed to update trace store:', error);
}
}
dispatch(openTP());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/hooks/useTroubleshootingPanel.tsx` around lines 16 - 35, The open
callback currently awaits replaceTraceStore(...) which can stall and prevent
dispatch(openTP()) from running; update useTroubleshootingPanel's open function
so the replaceTraceStore call is bounded by a timeout/fallback (e.g.,
Promise.race with a short timeout or a cancellable promise) and ensure
dispatch(openTP()) always runs after the timeout/fallback rather than waiting
indefinitely; locate the open function and adjust the logic around
extractTraceContext(), buildTraceStoreConfig(), replaceTraceStore(), and
dispatch(openTP()) so failures or timeouts in replaceTraceStore do not block
opening the panel.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
web/src/hooks/useTroubleshootingPanel.tsx (1)

16-35: ⚠️ Potential issue | 🟠 Major

Panel open is still vulnerable to indefinite wait on network stall.

dispatch(openTP()) is gated behind await replaceTraceStore(...) (Line 24). If that request hangs, the panel open action can hang too.

Suggested fix (bounded wait before fallback)
   const open = React.useCallback(async () => {
@@
     if (traceContext) {
       try {
         // Update korrel8r's trace store to match the selected tenant
         const storeConfig = buildTraceStoreConfig(traceContext);
-        await replaceTraceStore(storeConfig);
+        await Promise.race([
+          replaceTraceStore(storeConfig),
+          new Promise((_, reject) =>
+            setTimeout(() => reject(new Error('Timed out replacing trace store')), 3000),
+          ),
+        ]);
         // eslint-disable-next-line no-console
         console.log('Trace store updated for tenant:', traceContext.tenant);
       } catch (error) {

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/hooks/useTroubleshootingPanel.tsx` around lines 16 - 35, The open
callback can hang because replaceTraceStore is awaited with no timeout; change
open (the async function using extractTraceContext, buildTraceStoreConfig and
replaceTraceStore) to bound the wait by wrapping replaceTraceStore in a timeout
(e.g., Promise.race between replaceTraceStore(...) and a short reject/resolve
timeout) or run it without awaiting by letting it proceed in background, but
ensure dispatch(openTP()) is always invoked after the bounded wait or in a
finally block so the panel opens even if replaceTraceStore stalls; locate the
replaceTraceStore call inside open and implement the timeout/fallback logic
there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/utils/traceStoreUtils.ts`:
- Around line 17-23: The query params namespace, name and tenant are used raw
when building the Tempo endpoint (symbols: namespace, name, tenant) which can
create malformed hosts/paths; validate that namespace and name conform to
DNS/host-label rules (e.g., allowed chars and length) and return null on invalid
input, and encode the tenant when interpolating into the path using
path-encoding (e.g., encodeURIComponent) so special chars are safe; update the
code that builds the host/path to perform these checks and encoding before
returning the endpoint (fail closed by returning null if validation fails).

---

Duplicate comments:
In `@web/src/hooks/useTroubleshootingPanel.tsx`:
- Around line 16-35: The open callback can hang because replaceTraceStore is
awaited with no timeout; change open (the async function using
extractTraceContext, buildTraceStoreConfig and replaceTraceStore) to bound the
wait by wrapping replaceTraceStore in a timeout (e.g., Promise.race between
replaceTraceStore(...) and a short reject/resolve timeout) or run it without
awaiting by letting it proceed in background, but ensure dispatch(openTP()) is
always invoked after the bounded wait or in a finally block so the panel opens
even if replaceTraceStore stalls; locate the replaceTraceStore call inside open
and implement the timeout/fallback logic there.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: df6ff5bf-7c89-4f8d-85b1-222702475269

📥 Commits

Reviewing files that changed from the base of the PR and between 09fb0ee and 98032c9.

📒 Files selected for processing (4)
  • web/src/hooks/useTroubleshootingPanel.tsx
  • web/src/korrel8r-client.ts
  • web/src/korrel8r/trace.ts
  • web/src/utils/traceStoreUtils.ts
✅ Files skipped from review due to trivial changes (1)
  • web/src/korrel8r/trace.ts

Comment on lines +17 to +23
const namespace = searchParams.get('namespace');
const name = searchParams.get('name');
const tenant = searchParams.get('tenant');

// If any of these are missing, return null (not on traces page or no tenant selected)
if (!namespace || !name || !tenant) {
return null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate/encode URL-derived context before building the Tempo endpoint.

namespace, name, and tenant come from query params, but Line 37 interpolates them raw into host/path. This can produce malformed targets; tenant should be path-encoded, and host labels should be validated.

Suggested fix (input guard + path encoding)
 export const extractTraceContext = (): TraceContext | null => {
   const searchParams = new URLSearchParams(window.location.search);
+  const isDnsLabel = (value: string): boolean =>
+    /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/.test(value);

@@
-  if (!namespace || !name || !tenant) {
+  if (!namespace || !name || !tenant || !isDnsLabel(namespace) || !isDnsLabel(name)) {
     return null;
   }

   return { namespace, name, tenant };
 };
@@
 export const buildTraceStoreConfig = (context: TraceContext): StoreConfig => {
   const { namespace, name, tenant } = context;
+  const encodedTenant = encodeURIComponent(tenant);

@@
-  const tempoStackURL = `https://tempo-${name}-gateway.${namespace}.svc.cluster.local:8080/api/traces/v1/${tenant}/tempo/api/search`;
+  const tempoStackURL = `https://tempo-${name}-gateway.${namespace}.svc.cluster.local:8080/api/traces/v1/${encodedTenant}/tempo/api/search`;

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 37-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/utils/traceStoreUtils.ts` around lines 17 - 23, The query params
namespace, name and tenant are used raw when building the Tempo endpoint
(symbols: namespace, name, tenant) which can create malformed hosts/paths;
validate that namespace and name conform to DNS/host-label rules (e.g., allowed
chars and length) and return null on invalid input, and encode the tenant when
interpolating into the path using path-encoding (e.g., encodeURIComponent) so
special chars are safe; update the code that builds the host/path to perform
these checks and encoding before returning the endpoint (fail closed by
returning null if validation fails).

Copy link
Copy Markdown
Contributor

@alanconway alanconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming there was a persistent notion of the "current" tempo instance, but it looks like the user has to pick the instance every time they open the panel.
Maybe korrel8r should create a store for all the tempo instances from the tracing UIPlugin instead. In that case, maybe its a job for the COO, not the panel.
I'm getting a bit nervous about making korrel8r responsible for turning transient user actions as persistent state.

Signed-off-by: Shweta Padubidri <spadubid@redhat.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
web/src/components/Korrel8rPanel.tsx (1)

178-185: Consider whether failed store updates should be retried.

Currently, dispatch(setTraceContext(currentTraceContext)) is called in the catch block, which marks the trace context as "processed" even when the backend update fails. This prevents retries on subsequent effect runs, which may be intentional to avoid retry loops, but also means transient failures won't auto-recover.

If the intent is to retry on next user action (e.g., refresh), the current approach is fine. If automatic retry for transient failures is desired, consider keeping storedTraceContext unchanged on failure so the next effect run retries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/Korrel8rPanel.tsx` around lines 178 - 185, The catch block
currently dispatches setTraceContext(currentTraceContext) which marks the trace
as processed even when the backend update fails; change the flow so that
dispatch(setTraceContext(...)) is only called on successful store updates (i.e.,
move/remove the dispatch from the .catch handler in Korrel8rPanel.tsx), leaving
storedTraceContext unchanged on failure so the next effect run will retry; if
you want explicit retries instead, implement a limited retry mechanism around
performFetch or the update call rather than marking the context as processed on
error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@web/src/components/Korrel8rPanel.tsx`:
- Around line 178-185: The catch block currently dispatches
setTraceContext(currentTraceContext) which marks the trace as processed even
when the backend update fails; change the flow so that
dispatch(setTraceContext(...)) is only called on successful store updates (i.e.,
move/remove the dispatch from the .catch handler in Korrel8rPanel.tsx), leaving
storedTraceContext unchanged on failure so the next effect run will retry; if
you want explicit retries instead, implement a limited retry mechanism around
performFetch or the update call rather than marking the context as processed on
error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8898c933-2a00-46d6-a922-65c3ba45ca65

📥 Commits

Reviewing files that changed from the base of the PR and between 98032c9 and 47a6d61.

📒 Files selected for processing (6)
  • web/src/components/Korrel8rPanel.tsx
  • web/src/korrel8r-client.ts
  • web/src/korrel8r/trace.ts
  • web/src/redux-actions.ts
  • web/src/redux-reducers.ts
  • web/src/utils/traceStoreUtils.ts
✅ Files skipped from review due to trivial changes (1)
  • web/src/korrel8r/trace.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • web/src/utils/traceStoreUtils.ts
  • web/src/korrel8r-client.ts

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 23, 2026

@shwetaap: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@PeterYurkovich
Copy link
Copy Markdown
Contributor

I agree, before moving forward with this PR lets figure out how to handle the backend and the selection of the tempo instance

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 24, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@alanconway
Copy link
Copy Markdown
Contributor

@PeterYurkovich can you make this PR a draft? Apparently I have "asked for permission" to use this feature via some automated github something but I can't do it.

/draft

@PeterYurkovich PeterYurkovich marked this pull request as draft March 25, 2026 14:25
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 25, 2026
@PeterYurkovich
Copy link
Copy Markdown
Contributor

Moved it to draft

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants