
Reduce redundant refreshMetadata and HTS lookups in PUT request #509

Draft
cbb330 wants to merge 1 commit into linkedin:main from cbb330:chbush/reduce-refresh-metadata

@cbb330 (Collaborator) commented Mar 21, 2026

Summary

A single PUT /tables request for an existing table was triggering 4 doRefresh calls (each hitting HTS and HDFS) and ~7 HTS GET lookups, adding roughly 200 ms or more of unnecessary latency per request (the Jaeger trace showed ~451 ms total across 42 spans).

This PR reduces redundant IO in three ways:

  • Pass pre-loaded Table into save(): The service layer already loads the table via findById, but save() then called existsById and catalog.loadTable() again. The raw Iceberg Table from the initial lookup is now passed through and reused (addresses the FIXME in TablesServiceImpl).
  • Cache FileIO resolution: resolveFileIO() in OpenHouseInternalCatalog performed an HTS lookup on every call. It is now backed by a Guava cache with a 30 s TTL, which is safe because a table's storage type never changes.
  • Cache committed metadata location: After doCommit saves to HTS, the post-commit doRefresh performed another HTS lookup just to fetch the metadata location that had just been written. That location is now cached, so the lookup is skipped.
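A minimal sketch of the first change, assuming a simplified repository in which every catalog call costs one HTS GET plus one HDFS read. `LoadedTable`, `TablesRepositorySketch`, `TablesServiceSketch`, and `putTable` are hypothetical names, and the real `save()` signature in OpenHouse may differ; only `findById`, `existsById`, and `loadTable` come from the description above.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the raw Iceberg Table found by the initial lookup. */
final class LoadedTable {
  final String id;
  final String metadataLocation;

  LoadedTable(String id, String metadataLocation) {
    this.id = id;
    this.metadataLocation = metadataLocation;
  }
}

/** Hypothetical repository that counts catalog round trips (each one an HTS GET plus an HDFS read). */
final class TablesRepositorySketch {
  final AtomicInteger catalogRoundTrips = new AtomicInteger();

  Optional<LoadedTable> findById(String id) {
    return Optional.of(loadTable(id));
  }

  /** Before: re-checks existence and reloads the table the caller already had. */
  String save(String id) {
    LoadedTable base = existsById(id) ? loadTable(id) : null;
    return commit(id, base);
  }

  /** After: reuses the table the service layer already loaded (null means "create"). */
  String save(String id, LoadedTable preloaded) {
    return commit(id, preloaded);
  }

  private boolean existsById(String id) {
    catalogRoundTrips.incrementAndGet();
    return true; // the sketch only models the existing-table case
  }

  private LoadedTable loadTable(String id) {
    catalogRoundTrips.incrementAndGet();
    return new LoadedTable(id, "/tables/" + id + "/v1.metadata.json");
  }

  private String commit(String id, LoadedTable base) {
    return base == null ? "created " + id : "updated " + id + " from " + base.metadataLocation;
  }
}

/** Hypothetical service method handling PUT /tables for one table. */
final class TablesServiceSketch {
  private final TablesRepositorySketch repo;

  TablesServiceSketch(TablesRepositorySketch repo) {
    this.repo = repo;
  }

  String putTable(String id) {
    // Single lookup; its result is passed straight into save() instead of being re-fetched.
    LoadedTable existing = repo.findById(id).orElse(null);
    return repo.save(id, existing);
  }
}
```

Keeping the one-argument `save(id)` overload leaves the old path available to callers that have not loaded the table; in this model, a PUT on an existing table drops from three catalog round trips to one.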
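The FileIO caching change can be sketched as a small time-based cache. The PR uses Guava, where the equivalent cache would be built with `CacheBuilder.newBuilder().expireAfterWrite(30, TimeUnit.SECONDS).build()`. To stay dependency-free, this sketch instead uses a hypothetical `TtlCache` on top of `ConcurrentHashMap`, and `FileIOResolverSketch` and `storageTypeFor` are illustrative stand-ins for `resolveFileIO()`, not its actual signature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Stdlib stand-in for a Guava cache with expireAfterWrite(ttl, unit). */
final class TtlCache<K, V> {
  private static final class Entry<T> {
    final T value;
    final long writtenAtNanos;

    Entry(T value, long writtenAtNanos) {
      this.value = value;
      this.writtenAtNanos = writtenAtNanos;
    }
  }

  private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long ttlNanos;
  private final LongSupplier nanoClock; // System::nanoTime in production; injectable for tests

  TtlCache(long ttl, TimeUnit unit, LongSupplier nanoClock) {
    this.ttlNanos = unit.toNanos(ttl);
    this.nanoClock = nanoClock;
  }

  V get(K key, Function<? super K, ? extends V> loader) {
    long now = nanoClock.getAsLong();
    Entry<V> cached = entries.get(key);
    if (cached != null && now - cached.writtenAtNanos < ttlNanos) {
      return cached.value; // fresh hit: no remote lookup
    }
    V value = loader.apply(key); // miss or expired entry: perform the expensive lookup
    entries.put(key, new Entry<>(value, now));
    return value;
  }
}

/** Hypothetical resolver; the real resolveFileIO() in OpenHouseInternalCatalog may differ. */
final class FileIOResolverSketch {
  final AtomicInteger htsLookups = new AtomicInteger();
  private final TtlCache<String, String> storageTypeByTable;

  FileIOResolverSketch(LongSupplier nanoClock) {
    this.storageTypeByTable = new TtlCache<>(30, TimeUnit.SECONDS, nanoClock);
  }

  // Stand-in for the HTS GET that returns the table's storage type.
  private String fetchStorageTypeFromHts(String tableId) {
    htsLookups.incrementAndGet();
    return "hdfs";
  }

  String storageTypeFor(String tableId) {
    return storageTypeByTable.get(tableId, this::fetchStorageTypeFromHts);
  }
}
```

One design difference from Guava: `Cache.get(key, Callable)` makes concurrent callers for the same key wait on a single load, whereas this stand-in may run the loader twice under a race. That is harmless here because the storage type for a table does not change.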

Expected reduction: ~7 HTS GETs → ~2-3, ~4 HDFS reads → ~1-2 (~50% latency reduction)

Test plan

  • Compilation passes for the internalcatalog and tables modules
  • Existing integration tests pass (TablesControllerTest, SnapshotsControllerTest)
  • Capture Jaeger trace of PUT request and verify doRefresh count drops from 4 to 2
  • Verify the total span count drops well below the original 42

During a single PUT /tables request for an existing table, there were 4
doRefresh calls (each hitting HTS + HDFS) and ~7 HTS GET lookups. This
was adding ~200ms+ of unnecessary latency per request.

Changes:
- Pass pre-loaded Iceberg Table from service layer into save() to skip
  redundant existsById and catalog.loadTable calls (addresses the FIXME
  at TablesServiceImpl line 136)
- Cache FileIO resolution in OpenHouseInternalCatalog with a 30s TTL
  Guava cache to avoid repeated HTS lookups for storage type
- Cache committed metadata location in OpenHouseInternalTableOperations
  so post-commit doRefresh skips the HTS lookup
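The committed-location cache can be sketched as a one-shot field that `doCommit` sets and the next `doRefresh` consumes. `TableOperationsSketch` and its fields are hypothetical. In particular, clearing the cached value after a single use (so later refreshes still query HTS and can observe commits from other writers) is an assumption of this sketch, not necessarily what the PR does.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the actual OpenHouseInternalTableOperations code. */
final class TableOperationsSketch {
  final AtomicInteger htsLookups = new AtomicInteger();
  private String htsMetadataLocation;   // stands in for the record stored in HTS
  private String lastCommittedLocation; // written by doCommit, consumed by the next doRefresh
  private String currentMetadataLocation;

  void doCommit(String newMetadataLocation) {
    htsMetadataLocation = newMetadataLocation;   // "save to HTS"
    lastCommittedLocation = newMetadataLocation; // remember the location we just wrote
  }

  void doRefresh() {
    String location;
    if (lastCommittedLocation != null) {
      location = lastCommittedLocation; // post-commit refresh: no HTS GET needed
      lastCommittedLocation = null;     // one-shot: later refreshes go back to HTS
    } else {
      htsLookups.incrementAndGet();     // normal refresh: ask HTS for the latest location
      location = htsMetadataLocation;
    }
    currentMetadataLocation = location; // real code would now read the metadata file from HDFS
  }

  String metadataLocation() {
    return currentMetadataLocation;
  }
}
```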

Expected reduction: ~7 HTS GETs -> ~2-3, ~4 HDFS reads -> ~1-2

cbb330 force-pushed the chbush/reduce-refresh-metadata branch from fe80df8 to b646513 on March 21, 2026 01:31