POC Row event sequence stamping for submissions by brontolosone · Pull Request #1699 · getodk/central-backend

brontolosone · 2025-11-28T18:16:26Z

Related: getodk/central#1439 . With the note that here we are not necessarily going for fast revalidation, explained below.

This is the DB part of per-row-eventstamps for submissions. We can do other tables as well but since this is a POC, I'm leaving it at this for now.

I've chosen to try to do it with an application-transparent approach. The appeal is that it "just" works, as evident from the absence of any application modifications.

But to make that bliss possible, I went down the path of using a deferred constraint trigger to apply the stamps automatically at commit time, and there's some trickyness around that which led to some unexpected code. It probably needs an in-person walkthrough to answer all the "why not just..." type questions that would also jump to my mind if I hadn't just tried to "just ..." and found out why one can't "just ...".

Approach

For max(event) (+row count, to account for deletions) on a projection to work as an etag (or cursor), we need strict ordering, so we need to prevent later-committed-rows having an earlier-generated (or, well, "sorted-earlier") eventstamp. Can't use a sequence. Can't use timestamps.
The only way to do so is to have a critical section that effectively serializes commits (really, in PostgreSQL, IIUC, commits are always per-table serialized, but here we want to control the order.
So the critical section should last from acquiring the new event sequence number until transaction commit. When the transaction commits or rolls back, the critical section would unlock, and then the next transaction (which can be a concurrent transaction that was blocking on the critical section) can do the same thing. Corollary: whichever transaction enters the critical section first (A), commits before any other transaction (B), as (B) can not commit until it itself enters the critical section, but that is locked until (A) commits (or rolls back). This is what we want! 🥳
For performance under concurrency, we want that critical section window to be as small as possible. So we must be able to fuss around inserting/modifying rows, and only enter the critical section at the very last moment.
The primary locked resource is an bigint event sequence counter (just a 1-column row in a 1-row table) that is incremented by 1 for every transaction commit event. This yields 2 additional properties for the "event tokens" that are not actually required for etags or cursors, but that are, I suppose, aesthetically pleasing:
1. Not-required property that's there anyway 1: They are monotonically incremented, and we shouldn't see gaps. If some transaction fails (and rolls back), the increment is also rolled back.
2. Not-required property that's there anyway 2: Anything part of the same externally-visible-effect-group (anything committed in 1 transaction) has the same event number. This property may yet show its use. For instance, if projects and forms are also eventstamped from the same counter, then if you want to know "did the project change since the last change to this collection of submissions", then you can simply check whether the project has a higher event stamp than the highest event stamp in your collection of submissions 👏 .
Preemptively Asked Questions on alternative tokens (now that we have a critical section in which we can synthesize or derive those anyway):
1. "Hey why not use a timestamp" — timestamps are not guaranteed to be distinct, especially with the truncation to millisecond-resolution that we're doing (which I find questionable since it's been done out of entanglement with a particular frontend expectation, but that's another story)
2. "Hey why not use the xid8 transaction IDs — two reasons: main one being that the semantics of those are that they can wrap around, vacuuming can do that. Transaction IDs are only required to be unique among all currently ongoing transactions, not among all transactions that ever happened. So we'd get duplication pretty quickly I fear. Second (and minor) reason is that they're not pretty.
3. "Hey shouldn't we start the counting at MIN_BIGINT (some negative number) so that we can do twice the number of (committed, observable) events before running into trouble" — Yeah we could but we'd forever be staring down very large negative numbers, so, trumped by aesthetics/DX I suppose?
If I want to be application-transparent, and want to postpone entering the critical section until the very last moment, AFAIK the only thing that lends itself to being used for that is a deferred constraint trigger. This forces our hand:
1. A deferred constraint trigger is always a per-row trigger. So whatever function it'll call, that function will be called for every row inserted/mutated in the transaction. This forces our hand if we want the (not strictly required!) property of "if it becomes visible at the same time, then it has the same event ID" because we'll then need to increment the event counter exactly once in the transaction whereas our trigger function may be called multiple times. So we'll need some extra logic to handle that. I've implemented that but we could kill it if we don't think it's worth it. But I think it's worth it, see "Not-required property that's there anyway 2" a bit higher up. This is where the business with the advisory lock comes from — it's taken upon processing the first row (first incantation of the deferred trigger function) and checked in every incantation to deduct if the increment has already happened in this transaction. The great thing about this is that similar triggers on separate tables can coordinate this way so that indeed should we choose to have eventstamps on projects, forms, whatever — then we can easily have them while preserving the "if it becomes visible at the same time, then it has the same event ID" property across all those tables. 🥳
2. As it is a constraint trigger, not a "normal" trigger, it does nothing with any modifications you make to NEW. But we can make it have side effects (booooo!), thus writing a table anyway, which is how the event stamps are actually set. It's a bit of a perversion but I haven't found any alternatives. This, again, forces our hand:
  1. Performance wise it's not ideal to run a query (though indexed, with guaranteed 1 result) for every row added/modified. But what can I do 🤷‍♂️
  2. And we need some trick to avoid recursion in the trigger. The trick I employ is to have two normal (non-deferred) row triggers that set the event column to NULL for any insert or update, and then let the deferred trigger only fire (and set the event) for those rows where the event is NULL. 👏 So that's where the two triggers (plus one trigger function) (blank_submissions_event_*) that superficially may seem like excessive machinery come from.

Performance impact

A quick & dirty peek (with 10000 submission POSTs, with the worker system turned off):

This branch,      1 thread:  216 submissions/s
                  8 threads: 617 submissions/s
Master (9183257), 1 thread:  253 submissions/s
                  8 threads: 624 submissions/s

Some impact is to be expected, there's no free lunch, we're doing more stuff. I'm not panicking.

I've clocked the overhead at 89µs per row by just timing updating 10_000 submission rows with a new timestamp, comparing master with this branch. But anyway we don't usually do such large updates.

What has been done to verify that this works as intended?

CI

Why is this the best possible solution? Were any other approaches considered?

There are many ways to stroke a cat. When it comes to building blocks for etags, another approach was (is?) in #1654. For the etag use case this approach here has a different granularity tradeoff, namely, it chooses maximum granularity (at the row level), whereas #1654 was granular to the actee level. Consequently with the approach here validating the etag of a projection requires reprojection (rerunning the query with its filters, and calculating the max(event) of the submissions selected), whereas in #1654 the etag is validated through a small index (and doesn't even hit the heap!).

Besides granularity for etags, there is the topic of suitability as a cursor. The hope is that we can use this safely as a cursor. IIUC we can (safely, under concurrency) but I'd feel better if some others come to to that same conclusion independently.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Before submitting this PR, please make sure you have:

run make test and confirmed all checks still pass OR confirm CircleCI build passes
verified that any code from external sources are properly credited in comments or that everything is internally sourced

sadiqkhoja · 2025-12-16T03:43:23Z

+CREATE TABLE current_event (
+    event bigint NOT NULL
+)


When we will add this mechanism for other tables then what will be the value of this? I am assuming it will be just existing value + number of rows in that table

Up to us. We can choose to pretend that all rows in that other table were added at the same time (which is not what I did in the migration), in which case it'd be existing value + 1, or we can indeed do what you suggest.

Another thing we could do is to use 0 for everything that needs to be retroactively eventstamped. But then we'd impact the ability to use it as a cursor - the first chunk of a collection, for retro-applied eventstamps (existing deployments), for event 0, will be potentially huge. On the plus side we'd gain some measure of causal consistency, eg a submission won't exist "before" the project it belongs to. But that causal ordering can break anyway since only few of our tables are append-only (a project can be modified, in which case its eventstamp is going to be higher than that of any of the submissions that have been made for it, until those get modified).

brontolosone marked this pull request as ready for review November 28, 2025 18:27

brontolosone requested a review from sadiqkhoja November 28, 2025 18:55

brontolosone added this to ODK Central Nov 28, 2025

github-project-automation Bot moved this to 🕒 backlog in ODK Central Nov 28, 2025

brontolosone moved this from 🕒 backlog to ✏️ in progress in ODK Central Nov 28, 2025

brontolosone force-pushed the eventcounter-by-row branch 2 times, most recently from d9952fa to f673225 Compare November 29, 2025 12:58

sadiqkhoja reviewed Dec 3, 2025

View reviewed changes

Comment thread lib/model/migrations/20251127-01-submission-event-stamping-02.up.sql Outdated

sadiqkhoja reviewed Dec 3, 2025

View reviewed changes

Comment thread lib/model/migrations/20251127-01-submission-event-stamping-02.up.sql Outdated

sadiqkhoja reviewed Dec 3, 2025

View reviewed changes

Comment thread lib/model/migrations/20251127-01-submission-event-stamping-02.up.sql

Comment thread lib/model/migrations/20251127-01-submission-event-stamping-02.up.sql

sadiqkhoja reviewed Dec 3, 2025

View reviewed changes

Comment thread lib/model/migrations/20251127-01-submission-event-stamping-02.up.sql

Comment thread lib/model/migrations/20251127-01-submission-event-stamping-01.up.sql Outdated

sadiqkhoja approved these changes Dec 3, 2025

View reviewed changes

This was referenced Dec 10, 2025

Fast eventcounter etags for geoextracts #1657

Closed

Event counter migrations, towards getodk/central#1439 #1654

Closed

brontolosone force-pushed the eventcounter-by-row branch from f673225 to 93c842b Compare December 10, 2025 13:52

brontolosone requested a review from sadiqkhoja December 10, 2025 14:06

brontolosone mentioned this pull request Dec 14, 2025

Preview: per-row eventstamp-based etags for submissions-geojson #1718

Closed

2 tasks

sadiqkhoja reviewed Dec 16, 2025

View reviewed changes

brontolosone force-pushed the eventcounter-by-row branch from 93c842b to 1629c8e Compare December 17, 2025 14:24

Row event stamping for submissions table

8dcf59d

brontolosone force-pushed the eventcounter-by-row branch from 8e0eb2a to 4eb43e3 Compare December 18, 2025 14:57

Basic test for row event stamping

0186bf1

brontolosone force-pushed the eventcounter-by-row branch from 4eb43e3 to 0186bf1 Compare December 18, 2025 15:01

brontolosone merged commit 1b27a5f into getodk:master Dec 19, 2025
7 checks passed

github-project-automation Bot moved this from ✏️ in progress to ✅ done in ODK Central Dec 19, 2025

matthew-white removed this from ODK Central Jan 8, 2026

brontolosone mentioned this pull request Jan 15, 2026

Current per-transaction event IDs lead to over-invalidation of paginated API responses getodk/central#1609

Closed

brontolosone mentioned this pull request Jan 22, 2026

Eventstamp based synchronization protocols for OpenRosa / REST APIs getodk/central#1615

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

POC Row event sequence stamping for submissions#1699

POC Row event sequence stamping for submissions#1699
brontolosone merged 2 commits into
getodk:masterfrom
brontolosone:eventcounter-by-row

brontolosone commented Nov 28, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sadiqkhoja Dec 16, 2025

Uh oh!

brontolosone Dec 16, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

brontolosone commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approach

Performance impact

What has been done to verify that this works as intended?

Why is this the best possible solution? Were any other approaches considered?

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Before submitting this PR, please make sure you have:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sadiqkhoja Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

brontolosone Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

brontolosone commented Nov 28, 2025 •

edited

Loading

brontolosone Dec 16, 2025 •

edited

Loading