Logging failures to refresh metadata by cbb330 · Pull Request #385 · linkedin/openhouse

cbb330 · 2025-10-27T04:29:47Z

Summary

problem - when an exception is thrown under super.refreshFromMetadataLocation, no failure log shows how long the call took to fail. Logging the total duration here lets us measure how long a request takes when it fails to retrieve metadata from HDFS, which is a common cause of 504 gateway timeouts. Without this log, we can only guess the total duration by correlating it with metrics. The metrics have no failure/success dimension and cannot be isolated to a single request or table metadata file.

solution - wrap the call in try/catch and log before rethrowing the existing error. A separate PR, #386, aims to reduce the total duration through better-configured timeouts and retries at the HDFS layer.

At the same time, this PR adds a missing log message for catalog update, for future operations.
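
The try/catch approach described in the solution can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: the class `RefreshTimingSketch`, the `MetadataRefresh` interface, the in-memory `LOG` list (standing in for a real logger), the log message wording, and the example metadata path are all assumptions made for the sketch. Only the idea comes from the description above: time the call, log the elapsed duration on failure, then rethrow the original exception unchanged.

```java
import java.util.ArrayList;
import java.util.List;

final class RefreshTimingSketch {
    // Stand-in for a real logger so the sketch runs without extra dependencies (assumption).
    static final List<String> LOG = new ArrayList<>();

    /** Hypothetical stand-in for the call to super.refreshFromMetadataLocation. */
    interface MetadataRefresh {
        void run(String metadataLocation);
    }

    /** Runs the refresh; on failure, records the elapsed time and rethrows the same exception. */
    static void refreshWithFailureLog(String metadataLocation, MetadataRefresh refresh) {
        long startNanos = System.nanoTime();
        try {
            refresh.run(metadataLocation);
        } catch (RuntimeException e) {
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
            // Message format is illustrative only.
            LOG.add(String.format(
                "Failed to refresh metadata from %s after %d ms: %s",
                metadataLocation, elapsedMs, e));
            throw e; // the caller still sees the existing error
        }
    }

    public static void main(String[] args) {
        try {
            // Hypothetical path and a simulated failure.
            refreshWithFailureLog("hdfs://example/db/table/metadata.json", loc -> {
                throw new IllegalStateException("simulated HDFS timeout");
            });
        } catch (IllegalStateException e) {
            // Expected: the original exception propagates after being logged.
        }
        LOG.forEach(System.out::println);
    }
}
```

Because the exception is rethrown as-is, callers behave as before. The only new effect is a log line that ties the failure duration to a specific metadata location, which the metrics alone cannot provide.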

Changes

Testing Done

existing tests should cover this

Manually Tested on local docker setup. Please include commands ran, and their output.
Added new tests for the changes made.
Updated existing tests to reflect the changes made.
No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
Some other form of testing like staging or soak time in production. Please explain.

Additional Information

Breaking Changes
Deprecations
Large PR broken into smaller PRs, and PR plan linked in the description.

abhisheknath2011

Thanks @cbb330 for adding logs.

adding logs

11c9543

cbb330 force-pushed the fix-timeouts branch from 041df2b to 11c9543 on October 27, 2025 05:52

cbb330 changed the title ~~WIP: addressing frequent storage timeouts~~ logging failures for catalog refresh Oct 27, 2025

cbb330 changed the title ~~logging failures for catalog refresh~~ Logging failures to refresh metadata Oct 27, 2025

abhisheknath2011 approved these changes Oct 27, 2025

cbb330 merged commit daed157 into linkedin:main Oct 27, 2025
1 check passed
