Skip to content

feat(plugin-iceberg): Bound MV refresh#27774

Open
tdcmeehan wants to merge 5 commits into
prestodb:masterfrom
tdcmeehan:iceseq_mvr
Open

feat(plugin-iceberg): Bound MV refresh#27774
tdcmeehan wants to merge 5 commits into
prestodb:masterfrom
tdcmeehan:iceseq_mvr

Conversation

@tdcmeehan
Copy link
Copy Markdown
Contributor

@tdcmeehan tdcmeehan commented May 12, 2026

Description

Adds a bounded incremental-refresh mode for Iceberg materialized views. A new max_snapshots_per_refresh MV property (with a matching session/config default) caps how far each base table can advance in a single REFRESH MATERIALIZED VIEW, using the _last_updated_sequence_number row-lineage column to limit the refresh scan. Remaining backlog is picked up by subsequent refreshes.

Motivation and Context

On busy tables an MV can fall far behind, and a single REFRESH may be too large to complete in one statement. This lets operators chunk catch-up into bounded steps. V2 tables, which lack row lineage, fall back to unbounded refresh.

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add ``max_snapshots_per_refresh`` materialized view property to bound how far each base table advances per ``REFRESH MATERIALIZED VIEW``. Defaults to ``0`` (unbounded). Requires Iceberg V3 row lineage; V2 tables fall back to unbounded refresh.
* Add ``iceberg.materialized-view-default-max-snapshots-per-refresh`` config property and matching session property to set the default bound.

Summary by Sourcery

Add bounded incremental refresh support for Iceberg materialized views, including configuration, planning, and metadata handling to cap the number of snapshots consumed per refresh and integrate with row lineage.

New Features:

  • Introduce a max_snapshots_per_refresh materialized view property for Iceberg to limit how far each base table can advance per refresh.
  • Add an Iceberg catalog/session configuration property to set the default maximum snapshots per refresh when the MV does not override it.

Enhancements:

  • Track per-base snapshot watermarks in Iceberg view properties and derive bounded target snapshots using row lineage where supported.
  • Extend materialized view status and planner logic to carry incremental-refresh predicates that are applied only during refresh, while keeping stitching reads unbounded.
  • Ensure statistics and partition-change detection ignore metadata-column predicates that Iceberg cannot bind, and gate lineage-based features on format version support.

Tests:

  • Add REST materialized view integration tests covering bounded refresh behavior, multi-base independence, compaction, rollback, V2 fallback, session defaults, overrides, partitioned MVs, and stitching semantics.
  • Add planner tests verifying that bounded refreshes push _last_updated_sequence_number predicates into table scans, while unbounded and V2 refreshes and stitched reads do not.
  • Add unit tests for Iceberg MV properties and resolution of max_snapshots_per_refresh, including validation rules and session/default interactions.
  • Update Iceberg config tests to cover the new materialized-view-default-max-snapshots-per-refresh setting.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 12, 2026
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 12, 2026

Reviewer's Guide

Implements bounded incremental refresh for Iceberg materialized views using a new max_snapshots_per_refresh MV property plus matching config/session default, threads the bound through metadata/status computation and the planner so that refresh scans are capped per base table using Iceberg V3 row lineage, and adds extensive tests around bounded behavior, lineage, compaction, rollback, partitioned MVs, and configuration resolution.

Sequence diagram for bounded Iceberg MV refresh planning

sequenceDiagram
    actor User
    participant Session
    participant IcebergMetadata as IcebergAbstractMetadata
    participant Status as MaterializedViewStatus
    participant Planner as IncrementalRefreshRule
    participant Rewriter as IncrementalRefreshPredicateRewriter
    participant Scan as TableScanNode

    User->>Session: REFRESH MATERIALIZED VIEW
    Session->>IcebergMetadata: getMaterializedViewStatus(session, mvName)
    IcebergMetadata->>IcebergMetadata: resolveMaxSnapshotsPerRefresh(session, viewProperties)
    IcebergMetadata->>IcebergMetadata: chooseTargetSnapshot(baseTable, watermark, maxSnapshots)
    IcebergMetadata-->>Session: MaterializedViewStatus(partitionDisjuncts, incrementalRefreshPredicate)

    Session->>Planner: apply(RefreshMaterializedViewNode)
    Planner->>Status: getPartitionsFromBaseTables()
    Planner->>Planner: buildDeltaPlanForRefresh(...)
    Planner->>Planner: applyIncrementalRefreshPredicates(plan, incrementalRefreshPredicates, ...)
    Planner->>Rewriter: SimplePlanRewriter.rewriteWith(perBasePredicates, plan)
    Rewriter->>Scan: visitTableScan(TableScanNode)
    Rewriter-->>Planner: FilterNode(TableScanNode)
    Planner-->>Session: plan with bounded base table scans
Loading

File-Level Changes

Change Details Files
Introduce max_snapshots_per_refresh materialized view property, default resolution, and persistence in Iceberg metadata and configuration/session settings.
  • Add MAX_SNAPSHOTS_PER_REFRESH MV property with validation and accessor in IcebergMaterializedViewProperties and persist it as PRESTO_MATERIALIZED_VIEW_MAX_SNAPSHOTS_PER_REFRESH when creating views.
  • Add IcebergConfig and IcebergSessionProperties plumbing for iceberg.materialized-view-default-max-snapshots-per-refresh, including default value, validation, and session accessor getMaterializedViewDefaultMaxSnapshotsPerRefresh.
  • Implement resolveMaxSnapshotsPerRefresh in IcebergAbstractMetadata to pick the persisted MV property if present or fall back to the session default, and wire this into getMaterializedViewStatus and finishRefreshMaterializedView.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMaterializedViewProperties.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConfig.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSessionProperties.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergConfig.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewProperties.java
Add bounded refresh snapshot selection and watermark management using Iceberg V3 row lineage, with rollback/ancestry safety and V2 fallback.
  • Introduce MIN_FORMAT_VERSION_FOR_ROW_LINEAGE and use it to gate lineage-based behavior, including early returns for tables without lineage in validateTableForPresto and chooseTargetSnapshot.
  • Implement chooseTargetSnapshot to compute a bounded target snapshot between the stored watermark and HEAD using ancestorIdsBetween, honoring max_snapshots_per_refresh, skipping when watermark is off HEAD’s ancestry, or when the table lacks lineage.
  • Update getMaterializedViewStatus to compute per-base targetSnapshotId, detect rollbacks where watermarks are off ancestry and mark the MV NOT_MATERIALIZED, and build MaterializedDataPredicates with an incrementalRefreshPredicate over _last_updated_sequence_number when a bound applies.
  • Update finishRefreshMaterializedView to calculate new base watermarks per table using chooseTargetSnapshot and the previous watermark, ensuring that bounds are respected and rollbacks lead to full catch-up to the new head snapshot.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
Extend MaterializedViewStatus and the planner to push incremental refresh predicates on _last_updated_sequence_number into base table scans only during refresh, without affecting stitching plans.
  • Extend MaterializedViewStatus.MaterializedDataPredicates to carry an incrementalRefreshPredicate TupleDomain in addition to partition disjuncts and key column names.
  • In IncrementalRefreshRule, extract any non-trivial incrementalRefreshPredicate per base table and introduce applyIncrementalRefreshPredicates to wrap delta or full-refresh plans, or fallback full-refresh plans, with base-table filters derived from these predicates.
  • Implement IncrementalRefreshPredicateRewriter, a SimplePlanRewriter that finds TableScanNodes by SchemaTableName, ensures all predicate columns are available (adding hidden columns to scan outputs as needed), builds a RowExpression from the TupleDomain, and wraps the scan in a FilterNode plus optional ProjectNode to restore original outputs.
  • Ensure that stitching queries (e.g., reading the MV with USE_STITCHING) do not see or apply the incrementalRefreshPredicate, so stale-read stitching still reads all needed base rows.
presto-spi/src/main/java/com/facebook/presto/spi/MaterializedViewStatus.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/IncrementalRefreshRule.java
Adjust Iceberg statistics and constraint handling so lineage predicates are honored for refresh pruning but not pushed into metadata paths that cannot handle metadata columns.
  • Update TableStatisticsMaker.getDataTableSummary to strip metadata-column filters by intersecting with non-metadata constraints via IcebergUtil.getNonMetadataColumnConstraints before converting to an Iceberg expression.
  • Rely on the new incrementalRefreshPredicate plumbing so that _last_updated_sequence_number predicates are applied at the scan/filter level instead of in manifest-level or metadata-only filters that would otherwise reject metadata columns.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/TableStatisticsMaker.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
Add comprehensive tests for bounded MV refresh behavior, lineage, config resolution, and planning of sequence predicates for Iceberg REST and optimizer.
  • Extend TestIcebergRestMaterializedViews with a RESTCatalog, schema constant, helpers to read head snapshot, current snapshot, and base watermarks, and a suite of tests covering bounded single/multi-base refresh, compaction, V2 fallback, session defaults vs table overrides, partitioned MVs, stitching semantics, rollback handling, property persistence, invalid bounds, and hidden sequence column usage.
  • Add TestIcebergMaterializedViewOptimizer tests asserting that bounded refresh pushes a _last_updated_sequence_number <= constant predicate into the layout and scan constraints, while unbounded refresh, V2 bounded refresh, and stitched queries do not carry sequence predicates on the base scan.
  • Add TestIcebergMaterializedViewProperties to validate the new MV property’s decoding, validation, and resolveMaxSnapshotsPerRefresh behavior, including invalid values, session-default behavior, and precedence of persisted properties.
  • Update TestIcebergConfig to cover the new materialized-view-default-max-snapshots-per-refresh config property in defaults and explicit mappings.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergRestMaterializedViews.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewOptimizer.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewProperties.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergConfig.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Advance each base's watermark to the N-th-oldest ancestor in
(watermark, HEAD] and push a _last_updated_sequence_number bound into
the refresh scan via existing V3 row-lineage pushdown.
@tdcmeehan tdcmeehan changed the title feat(plugin-iceberg): Bound MV refresh via max_snapshots_per_refresh feat(plugin-iceberg): Bound MV refresh May 14, 2026
@tdcmeehan tdcmeehan marked this pull request as ready for review May 14, 2026 15:43
@prestodb-ci prestodb-ci requested review from a team, aaneja and wanglinsong and removed request for a team May 14, 2026 15:43
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In IcebergAbstractMetadata.getMaterializedViewStatus, the rollback/branch guard uses isAncestorOf(baseIcebergTable, currentSnapshotId, recordedSnapshotId) but SnapshotUtil.isAncestorOf expects (ancestor, descendant); since recordedSnapshotId is older, this arg order appears reversed and will incorrectly force NOT_MATERIALIZED whenever the base advances, effectively disabling incremental refresh after the first change.
  • chooseTargetSnapshot already contains an ancestry guard (isAncestorOf(baseTable, head, watermark)), but getMaterializedViewStatus adds a separate ancestry check with different parameter ordering and semantics; consider centralizing this logic in chooseTargetSnapshot (or a shared helper) to avoid divergence and subtle bugs in rollback/branch handling.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `IcebergAbstractMetadata.getMaterializedViewStatus`, the rollback/branch guard uses `isAncestorOf(baseIcebergTable, currentSnapshotId, recordedSnapshotId)` but `SnapshotUtil.isAncestorOf` expects `(ancestor, descendant)`; since `recordedSnapshotId` is older, this arg order appears reversed and will incorrectly force `NOT_MATERIALIZED` whenever the base advances, effectively disabling incremental refresh after the first change.
- `chooseTargetSnapshot` already contains an ancestry guard (`isAncestorOf(baseTable, head, watermark)`), but `getMaterializedViewStatus` adds a separate ancestry check with different parameter ordering and semantics; consider centralizing this logic in `chooseTargetSnapshot` (or a shared helper) to avoid divergence and subtle bugs in rollback/branch handling.

## Individual Comments

### Comment 1
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergRestMaterializedViews.java" line_range="596-604" />
<code_context>
+        }
+    }
+
+    private static void assertAdvanceLeq(long oldWatermark, long newWatermark, long head, int bound, String baseTable)
+    {
+        if (oldWatermark == newWatermark) {
+            fail("expected watermark to advance from " + oldWatermark + " on base " + baseTable);
+        }
+        if (newWatermark == head) {
+            return;
+        }
+        assertNotEquals(newWatermark, oldWatermark);
+    }
 }
</code_context>
<issue_to_address>
**issue (testing):** assertAdvanceLeq helper does not actually enforce the "<= bound" invariant

The helper’s behavior doesn’t match its name or call-site intent: it only checks that the watermark advanced (and possibly hit `head`), but never that it advanced by *no more than* `bound` snapshots. A change that advances by 10 when `bound=2` would still pass. Please tighten this by explicitly asserting the step count is `<= bound` (e.g., via `ancestorIdsBetween(head, oldWatermark, ...)` and checking `newWatermark` is within `bound` steps), or by exposing a small test hook that reveals the chosen target snapshot so the boundedness can be asserted directly.
</issue_to_address>

### Comment 2
<location path="presto-docs/src/main/sphinx/connector/iceberg.rst" line_range="551" />
<code_context>
-                                                      is hashed to a particular node when determining the which worker to
-                                                      assign a split to. Splits which read data from the same file within
-                                                      the same chunk will hash to the same node. A smaller chunk size will
-                                                      result in a higher probability splits being distributed evenly across
-                                                      the cluster, but reduce locality. 
-                                                      See :ref:`develop/connectors:Node Selection Strategy`.
</code_context>
<issue_to_address>
**issue (typo):** Insert "of" after "probability" for correct phrasing.

```suggestion
                                                      result in a higher probability of splits being distributed evenly across
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +596 to +604
private static void assertAdvanceLeq(long oldWatermark, long newWatermark, long head, int bound, String baseTable)
{
if (oldWatermark == newWatermark) {
fail("expected watermark to advance from " + oldWatermark + " on base " + baseTable);
}
if (newWatermark == head) {
return;
}
assertNotEquals(newWatermark, oldWatermark);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): assertAdvanceLeq helper does not actually enforce the "<= bound" invariant

The helper’s behavior doesn’t match its name or call-site intent: it only checks that the watermark advanced (and possibly hit head), but never that it advanced by no more than bound snapshots. A change that advances by 10 when bound=2 would still pass. Please tighten this by explicitly asserting the step count is <= bound (e.g., via ancestorIdsBetween(head, oldWatermark, ...) and checking newWatermark is within bound steps), or by exposing a small test hook that reveals the chosen target snapshot so the boundedness can be asserted directly.

is hashed to a particular node when determining the which worker to
assign a split to. Splits which read data from the same file within
the same chunk will hash to the same node. A smaller chunk size will
result in a higher probability splits being distributed evenly across
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Insert "of" after "probability" for correct phrasing.

Suggested change
result in a higher probability splits being distributed evenly across
result in a higher probability of splits being distributed evenly across

steveburnett
steveburnett previously approved these changes May 14, 2026
@tdcmeehan
Copy link
Copy Markdown
Contributor Author

@steveburnett I refactored the docs a bit and added more user facing docs, let me know what you think!

@tdcmeehan tdcmeehan requested a review from agrawalreetika May 14, 2026 18:57
Comment thread presto-docs/src/main/sphinx/connector/iceberg.rst Outdated
steveburnett
steveburnett previously approved these changes May 14, 2026
Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants