Skip to content

Event counter migrations, towards getodk/central#1439#1654

Closed
brontolosone wants to merge 2 commits into
getodk:masterfrom
brontolosone:1439_event-counters
Closed

Event counter migrations, towards getodk/central#1439#1654
brontolosone wants to merge 2 commits into
getodk:masterfrom
brontolosone:1439_event-counters

Conversation

@brontolosone
Copy link
Copy Markdown
Contributor

@brontolosone brontolosone commented Oct 25, 2025

Towards getodk/central#1439

Per-actee event counters that spring into action upon create/update of specific audits rows.

Important assumptions:

  1. Every operation that meaningfully modifies an actee or its constituents (eg a form and its body of submissions) has an associated auditlog entry.
  2. Effects of these operations are committed together with their associated auditlog entry.

I've been assured that this is the case for inserts, and I've understood this to also be the case for the updates of an existing audits row when a "worker" task completes — the worker task's work is committed together with the update to the processed timestamp.

Furthermore,

  1. The list of form-or-dataset-actee-mutating-actions will be kept up to date.

Using the event counter as an ETag

PR #1657 contains an example for using it (for forms, and datasets).
It raises a question: do we need to make explicit what types of actees are expected to work?
The way the counter is queried in #1657 always returns a result for the actee, so if someone blindly copies that exact code (coalescing null to 0) to something to do with another actee type (say, user), they might get the impression that they're doing things correctly (but the event count is always 0). Wrong expectation?

What has been done to verify that this works as intended?

Manual testing.

Why is this the best possible solution? Were any other approaches considered?

See issue getodk/central#1439 and the dev doc linked therefrom.
Just doing counts on aggregates over the audit log is not great, on my modest laptop (still much faster than a budget VPS) it's ~50 ms for some, with just 300K audit rows.

So, the solution in this PR does things a different way.

It keeps per-dataset/form actee event counters in a table. Of course, if there'd be just one holy row per actee, there'd be heavy lock contention on updating those counters, effectively serializing all mutations for a single dataset/form actee. So we don't do that. Instead, via a trigger, for each audit log addition (of the right type) there's simply a single row inserted, for the respective actee, with event count: 1. That's contention-free.

But we don't want to sum over millions of rows with event count 1.

Thus, on every trigger run, with probability 0.01, a squashing function is called. This function takes all rows in the eventcounters table for the actee, and replaces them with a single row containing the aggregate sum of the evt_cnt field of those rows. This does not get in the way of any concurrent auditlog inserts (or, indeed, in the way of the ensuing eventcounters row creation). It does lock out concurrent incantations of itself (for the same actee), to no ill effect other than serializing concurrent auditlog insertions affecting the same actee with probability 0.01 * 0.01. It doesn't lock out any readers of the counters. It's fast! I measured an added cost of the trigger of 30µs (µs, not ms!) per call (with the squashing function's cost amortized).

To read a counter, use a query like such:

select sum(evt_cnt) from eventcounters where "acteeId" = 'fb971665-07a0-450d-83b2-b99261e12dff'

On average (on busy servers) this will be a sum over about 100 rows. Due to the way the index is constructed, with inclusion of the evt_cnt values, this query will be an index-only scan that doesn't hit the heap (the table content, as it were) in most cases (in a nutshell, depending on whether PostgreSQL has marked the pages where the rows reside as visible in all transactions, which depends on how "settled" the situation is). Anyway, that one's plenty fast too, 2ms or so on my machine.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Not really.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

No update required.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@github-project-automation github-project-automation Bot moved this to 🕒 backlog in ODK Central Oct 25, 2025
@brontolosone brontolosone requested a review from alxndrsn October 25, 2025 17:11
@brontolosone brontolosone moved this from 🕒 backlog to ✏️ in progress in ODK Central Oct 25, 2025
AFTER INSERT OR UPDATE OF processed
ON "public"."audits"
FOR EACH ROW
WHEN (NEW.action = ANY(ARRAY['form.update.publish', 'submission.create', 'submission.update', 'submission.update.version', 'submission.attachment.update', 'submission.delete', 'submission.restore', 'dataset.update', 'dataset.update.publish', 'entity.create', 'entity.update.version', 'entity.update.resolve', 'entity.delete', 'entity.restore', 'entity.bulk.delete', 'entity.bulk.restore']))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important assumptions:

  1. The list of form-or-dataset-actee-mutating-actions will be kept up to date.

I'm a little concerned about this assumption. New actions aren't added very frequently, but they are added with some regularity, especially for datasets. I think it'd be easy to forget to add an action to this list, in which case there will be over-caching, which could be hard to catch.

What do you think about using a blacklist instead? I think it's a good assumption that most actions will change the associated actee and should bump the counter. Actions for which that isn't the case are the exception to the rule. Example: form.submission.export.

By the way, this comment doesn't need to block merge or anything like that. I could file it as a follow-up issue. But I do think it'd be good to discuss at some point. My apologies if the idea has already been discussed and discarded, I know I'm mostly an interested observer to this work. 😅

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR Blocklist sounds good, but a registry sounds even better, skip towards the end ;-)

More avenues:

  1. The DB-hardliner "proper" solution to the bookkeeping problem is to make the implicit information explicit. Actions should be first-class entities in the DB (thus get a table), and be FK-referred to by ID from the audits table.
    Then rows in the hypothetical audit_actions table would have a boolean column modifies_actee, and then there'd be one place to register actions and their effects.
    A side benefit is that on the audits table, the more compact FK IDs in place of the current strings should also make that table itself, and the indices using that column, much more compact, which makes things faster for operations on that table and those indices, but also for other things in the DB as other heaps/indices are now going to fit in the page cache.

  2. Another way to achieve consistency (with some tradeoffs) through centralized bookkeeping, is to formulate the definitions of actions in JS, as constants of some kind or another, and then dynamically update a view in the DB to be consistent with that definition any time migrations are run. Optionally, to maintain a form of referential integrity, because one can't have FKs to a view, we'd similarly define a dynamic PostgreSQL domain (basically a reusable CHECK constraint) for the now-pseudo-FK action column of audits. This would come with quite a bit of tooling (which exists) for the benefit of the JS side of things leading with definitions, and the DB side following.

  3. Yet another way is the "light" version of the above: a centralized registry at the JS side, which would be a good thing in itself for inventory purposes. Syncing that to the DB is a matter of manually writing a migration, but, a test can query the state in the DB, and compare it to the current state of things on the JS side, and thus alert to changes, so that developers are prompted to reflect mutations to the JS-inventory of actions in the DB. We have something similar already since db: default to declare-side indexes for foreign keys #1550.

  4. And yet another way to go about things, if we really don't fully understand/trust our events, or don't want to have to, is to run a count-incrementing trigger on any mutation on any row of various relevant tables. It'll likely lead to under-caching (over-invalidation), and for each table there'll need to be SQL code that gets an acteeId based on the affected row, which means the trigger will do DB queries with potentially quite a few joins, for every insert, which is going to absolutely tank performance. Just mentioning this bad option as present but bad, and on an extreme side in the spectrum of "how much trust do we have in the actions".

  • Separately, an alternative to using FKs to an audit_actions table registry, would be to make the audits.action column a PostgreSQL enum. This is fortunately possible as our longest "action" string so far has string length 37 AFAIK (empirical, just based on querying the staging DB — now if we'd only have a centralized registry of some sorts...) so, that'll fit in the enum constraints. In the DB, the action column would, under the hood, become of type integer, which has benefits stated earlier, but the big upside is that we probably won't need to modify as many queries as we'd have to do when the string representation of an action starts to live a table away from audits. And we could still compare the enum member state to the JS state in a test as described under option number 3.

I think in our case option number 3, combined with a PostgreSQL enum, is the most pragmatic one. We have some somewhat similar work to do if we want to pursue actees-as-uuids, see this comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts from me:

  • I'm convinced that a registry would be better than a blocklist. 👍 I like the idea of developers being required to update the registry, e.g., via a test that compares what's in the JS to what's in the DB. I suggested a blocklist partly thinking that people would forget to update the list, but your suggestion means that people wouldn't forget (or at least would be reminded). It'd be good when people add a new action for them to actively decide at that time whether the action should increment the event counter.
    • I guess someone could still forget to update both the DB and the JS. Is that right? Maybe that's one way in which your option (1) is better. In (1), there's no way of avoiding adding an action to the in-DB registry.
  • Another decision that we currently have to make whenever we add a new audit action is whether the action is "verbose", i.e., whether or not it should be shown in the frontend audit log (nonverbose = show in Frontend). Actions on the level of an individual submission or entity are excluded. That logic is implemented in actionCondition() in lib/model/query/audits.js. So I'm thinking, whenever a new action is added, the dev should decide (1) is the action verbose, and (2) should the action increment the event counter. I.e., eventually it'd be great if the registry could also include a property or column about whether each action is verbose. That's not needed for the work here — probably the focus should be on the event counter — but I wanted to point to a similar pattern/problem. Actually, one comment above actionCondition() seems quite relevant: "TODO: some sort of central table for all these things, not here."
  • Option (3) sounds nice to me.
    • Existing queries won't have to change that much.
    • A registry in JS means that we can automatically check that the trigger in the DB is up-to-date.
    • The enum sounds like a nice performance gain to me. But I don't really see it as being necessary for event counters. It sounds like a separate change that could be layered on afterwards. I think it'd be ideal for this PR to be as small as possible, so maybe (3) without enums for now, then enums come later as an additional performance gain.
  • The reasoning behind option (1) makes sense to me. I'm certainly intrigued by the option. If the DB needs to reference an ever-changing list, maybe the best approach is to put that list in the DB. Your point about performance also makes sense to me. Maybe (1) is also simpler in some ways than trying to keep a trigger and an enum up-to-date via a registry in JS.
    • The main downside is that a bunch of queries would have to change, right?
    • I guess theoretically we might someday want to access the registry in the JS code to drive some logic there. If the registry were just in the DB, then we wouldn't be able to do that. That seems like a speculative concern though.
  • My initial feeling about (2) is that the tooling required would increase the overall complexity of the system/codebase too much.
  • (4): Let's try to make our actions trustworthy!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple other thoughts!

One is that there's a list of audit log actions in the API docs (under "Server Audit Logs"). I doubt it's exhaustive, but there's a lot in there in case that's helpful.

The main downside is that a bunch of queries would have to change, right?

Now I'm wondering, maybe I'm overestimating the number of queries that reference audits.action. Maybe it's not such a big number.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum sounds like a nice performance gain to me. But I don't really see it as being necessary for event counters. It sounds like a separate change that could be layered on afterwards. I think it'd be ideal for this PR to be as small as possible, so maybe (3) without enums for now, then enums come later as an additional performance gain.

The enum is not mainly for perf, the enum would also be

  1. a guarantee that no rogue string values will be used/found in the action column
  2. much easier to check for in-syncness with the JS registry than the body of a trigger (though we'd still have to check the body of the trigger anyway, to check whether the declared-mutating actions are all present).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning behind option (1) makes sense to me. I'm certainly intrigued by the option. If the DB needs to reference an ever-changing list, maybe the best approach is to put that list in the DB. Your point about performance also makes sense to me. Maybe (1) is also simpler in some ways than trying to keep a trigger and an enum up-to-date via a registry in JS.

The downside is that unless we autogenerate/autoupdate the trigger (tooling for that is waiting in the wings, if we want that) or autotest-and-manually-update the trigger to embed the appropriate actions in its code (as it has in its current form), then the trigger would have to query that table of actions on every insert, which I think would make things significantly slower.
Avoiding that lookup forces our hand into rewriting the trigger body to embed the action values. We can either do it manually, manually-with-crutches (prompted by failing test that assert sync), or automatically (with appropriate tooling).

The main downside is that a bunch of queries would have to change, right?

Also that! But perhaps we want them to change. Currently we're basically using constant values (bare strings) which is a bit like using magic numbers.

I guess theoretically we might someday want to access the registry in the JS code to drive some logic there. If the registry were just in the DB, then we wouldn't be able to do that.

And that, exactly!

Typically you want the enum/table/registry to live on both sides. So that in JS code you can do

sql`insert into audits ([...], action) values ([...], ${actions.SUBMISSION_ATTACHMENT_UPDATE})`

with a string value of submission.attachment.update for actions.SUBMISSION_ATTACHMENT_UPDATE), this would allow a rolling update to our queries where we replace the old literal strings with references into that actions registry. Under the hood this
Another option would be to use integer values (or PG enums, so strings again for the humans both in JS and SQL, but integers under the hood for the DB) there that are foreign keys to that hypothetical actions table expanded upon above, so we'd gain compactness. But then we'd need to update all our queries at once to switch to using the registry, but perhaps that's as much trouble as I fear.
Why would one prefer actions.SUBMISSION_ATTACHMENT_UPDATE over developers just throwing bare strings around? Well, it makes it clear to the writer (or copy-paste-adapter of code) that there is a registry, and makes it easy for readers to look up meta-info on an action (does it meaningfully mutate its actee, do we log it, ...), IDEs may autocomplete, typos are averted, etc etc. Might as well get it over with and clean up those bare strings.

So that's the JS side of things.
On the on the SQL side of things, you also want ergonomics, so you'd have a PG enum, or type, table, or domain that gives you a similar "registry", that is, something that defines exactly the values that are our actions, to be used in queries there. Using a table is best for when you need metadata (such as the hypothetical bumps_eventcounter boolean). Using an enum as the PK (and thus elsewhere as FKs) on such a table would be ideal: the metadata benefits of a table, the compactness of integers (enums are integers under the hood), and the DX of being able to use string labels instead of those integers, and the domain restriction benefits that comes with both.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside is that unless we autogenerate/autoupdate the trigger (tooling for that is waiting in the wings, if we want that) or autotest-and-manually-update the trigger to embed the appropriate actions in its code (as it has in its current form), then the trigger would have to query that table of actions on every insert, which I think would make things significantly slower.

That makes sense to me that we need a trigger then. 👍

Avoiding that lookup forces our hand into rewriting the trigger body to embed the action values. We can either do it manually, manually-with-crutches (prompted by failing test that assert sync), or automatically (with appropriate tooling).

I think my preference would be to start with manually-with-crutches. If we were to do it manually without crutches, there's a risk that people would forget to update the trigger. Re: doing it automatically, I think we could always add that tooling later.

Currently we're basically using constant values (bare strings) which is a bit like using magic numbers.

Why would one prefer actions.SUBMISSION_ATTACHMENT_UPDATE over developers just throwing bare strings around? Well, it makes it clear to the writer (or copy-paste-adapter of code) that there is a registry, and makes it easy for readers to look up meta-info on an action (does it meaningfully mutate its actee, do we log it, ...), IDEs may autocomplete, typos are averted, etc etc. Might as well get it over with and clean up those bare strings.

I see what you're saying, but I think I'd prefer not to replace the bare strings as part of this initial work. It feels tangential to the main goal of improving caching. If we need to replace the bare strings in order to achieve the main goal, then let's do that. But if it's more of a refactor or improvement to DX, I'd say let's consider doing it separately. If we used an enum, we could leave the bare strings for now, right?

The enum is not mainly for perf, the enum would also be

  1. a guarantee that no rogue string values will be used/found in the action column
  2. much easier to check for in-syncness with the JS registry than the body of a trigger (though we'd still have to check the body of the trigger anyway, to check whether the declared-mutating actions are all present).

It still sounds to me that unlike the trigger, the enum isn't strictly necessary for the event counter. To be clear, I quite like the sound of an enum. Performance + extra validation sound great. 👍 What I'm more proposing is that the addition of event counters and of an audit action registry be completed in a series of PRs rather than one large one. Maybe you're also thinking that, given that you filed getodk/central#1462 as its own issue. I was thinking that the following might be a sequence that'd work well. What do you think?

  • Merge this PR with the trigger pretty much as-is, with the hard-coded list of audit actions.
  • PR 2: Add a JS registry and use it to validate the trigger (i.e., add crutches).
  • PR 3: Add the enum.
  • PR 4: Maybe replace bare strings.
  • PR 5: Maybe something that implements the action=nonverbose filter using the registry.

One approach we could take is that Alex finishes up the review of this PR, then you and I continue discussing the registry idea. I'd be happy to review follow-up PRs related to the registry if you wanted me to.


--- create: public.audits.eventcounter_bump_trigger ---
CREATE TRIGGER "eventcounter_bump_trigger"
AFTER INSERT OR UPDATE OF processed
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more a curious question than anything else. Why would a change to processed increment the event counter? If processing an event has an effect on the actee, I would think that that would already be indicated by the insertion of new row(s) into audits.

  • Example 1: Whenever a form is published, we check whether the form has an enketoId. If it does, we do nothing (job ends). There's no need to increment the counter in that case. If the form doesn't have an enketoId, then we try to get one. If that's successful, there will be a new form.update event logged. The logging of that event alone should increment the counter.
  • Example 2: Whenever a submission is created, it is processed for entities. But I don't think such entity processing will affect the actee of the submission.create event (the form): it will only affect a dataset actee. That change will be indicated by the insertion of new entity event(s).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counterexample for submission.create: generation of secondary information which influences how submissions.csv looks — see 8cc3ab4 for what goes on there.
This information is added later (potentially much later; there's a race condition that I've shown when I demoed the pure-SQL insertion path, we'll clear that bug when we adopt the pure-SQL submission insertion path) in a separate transaction after insertion. And it manifests itself in the audits not with a new event, but only with setting the processed value of the existing row.

TLDR: Different outcomes for submissions.csv, but no sign in the audits other than that the processed column receives a value.

So I want to err on the cautionary side and interpret "something is processed" as an event. There may be cases where this results in over-invalidation, probably specific to the event action. I know I need it for submission.create, as I've shown above, but yeah there may be others where it's overzealous. We could take stock and follow up to make things more specific (but then, also think about how we're going to maintain bookkeeping, we don't want to miss events and potentially over-cache when the code changes...).

Copy link
Copy Markdown
Member

@matthew-white matthew-white Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, good counterexample. I thought about that case, but for some reason, I thought that any unprocessed submissions would be processed on the fly at request time for additional "select multiple" values. But looking at lib/data/briefcase.js, that doesn't seem to be the case. The CSV headers seem to be finalized before any data rows are processed or written (which makes sense). I must have been thinking about the aggregated client audit CSV instead. Submission attachments are processed for client audit rows after a submission.attachment.update event, but they're also processed on the fly if there are any client audit attachments that the worker hasn't processed. So the processed flag on the submission.attachment.update event doesn't affect the response. But it seems like that's not the case for submission.create. 👍

One counterexample is all we need, so I'm convinced that it makes sense to increment the event counter when processed is updated. I agree that we don't want extra bookkeeping in this area, so that logic shouldn't depend on whether it's a submission.create event vs. something else.

;

--- sign: eventcounter_squash(actee_id varchar(36)) ---
COMMENT ON FUNCTION "public"."eventcounter_squash"(actee_id varchar(36)) IS '{"dbsamizdat": {"version": 1, "definition_hash": "6697e80c4e5a2e69a33eb4aa6ccbea3d"}}';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the significance of dbsamizdat?

Copy link
Copy Markdown
Contributor Author

@brontolosone brontolosone Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use it to work with these things, letting the computer do the work of deciding whether they need updating (based on definitions I maintain out-of-tree, on my side, for now, but I'm happy to share them) and if so, making sure the DB objects get torn down and reinstated in the right dependency order, and writing out a migration file that does so which I then check into VCS, which is what you're looking at here. It's just a bit of dev tooling. But it needs those signatures (stored in comments, and as such tied to the lifecycle of the managed DB objects). Are they in your way?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you'll find those signatures also in master already, example. And we have it on good authority that that's fine.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation 🙏

And we have it on good authority that that's fine.

The link refers to the legality of mentioning a GPL-licensed library from code.

Note that you'll find those signatures also in master already, example.

It's great if this tool helps to develop new database migrations. For people not using this tool, these comments could be considered noise when committed to master. As a user of the tool, do they have ongoing value for you after you've finished working on a PR and it's ready to merge?

Copy link
Copy Markdown
Contributor Author

@brontolosone brontolosone Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those tags are the thing that make the state diffing tick. Without state diffing one might as well write migrations by hand.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrations ideally don't change after they've been merged to master, and especially not once released, so with that in mind, do the comments have ongoing value for you after you've finished working on a PR and it's ready to merge?

Copy link
Copy Markdown
Contributor Author

@brontolosone brontolosone Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the signatures in the DB have ongoing value, that's what makes the diffing tick. For instance this migration (trivial example, but an example nevertheless) was autogenerated, based on the diff, based on the signatures, as read from the DB, as put there by the previous migration.
A month or two ago I've shown how this works to @ktuite and @sadiqkhoja (was in the context of the geoprocessing functions). Let me know if you'd like to know more about how this works; this PR is probably not the best venue.

@alxndrsn
Copy link
Copy Markdown
Contributor

I measured an added cost of the trigger of 30µs (µs, not ms!) per call (with the squashing function's cost amortized).

Any idea what the worst case is likely to be?

@brontolosone
Copy link
Copy Markdown
Contributor Author

Any idea what the worst case is likely to be?

Probably the 1 in 10_000 probability (more under concurrency) that two audits mutations get serialized because both hit the lucky number to run the squashing trigger, as I've explained in the intro. In that case it won't be so much the squashing itself that'll cause noticable delay, but the serialization may well cause delay. For instance, my submission insertion could be stuck behind yours (for the same form) if we both get janitor duty, so if you delay your insertion (somehow), you'll also be delaying mine. Yet other concurrent insertions, even for the same form, will happily run without such delay (unless they also hit the 0.01 probability of getting janitor duty).

AFTER INSERT OR UPDATE OF processed
ON "public"."audits"
FOR EACH ROW
WHEN (NEW.action = ANY(ARRAY['form.update.publish', 'submission.create', 'submission.update', 'submission.update.version', 'submission.attachment.update', 'submission.delete', 'submission.restore', 'dataset.update', 'dataset.update.publish', 'entity.create', 'entity.update.version', 'entity.update.resolve', 'entity.delete', 'entity.restore', 'entity.bulk.delete', 'entity.bulk.restore']))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should form.attachment.update be in the list?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe the question is more general - why is form.update.publish in the list, but other form.* actions are not?

Copy link
Copy Markdown
Contributor Author

@brontolosone brontolosone Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why is form.attachment.update not there — Take the case of submissions.csv. You can't update an attachment of a form that is published, which is why I left it out; I don't want to invalidate a giant costly submissions.csv just because someone is futzing around with some drafts of that same form. Consequently, probably don't use event counters as an etag on .csv exports of a draft form.

  • Why is form.update.publish there — Again take the case of submissions.csv. Publishing a form can alter the CSV-"view" on its existing submission-corpus (eg, extra column), so updates to the form should result in cache invalidation.

I agree that ideally we'd have semantics where any audit action accurately invalidates its actee for any purpose without false positives, but that's just not the reality of Central as it is currently, evident from the two examples above.

We can decide to not capture the finesses and instead just capture everything, which will make the counters yolo-safe to use, that is, whenever you generate data just etag it with whatever counter you find for the actee, without further consideration or thought. That'd result in over-invalidation/under-caching. It's one side of the spectrum as laid out in the design doc, but I'm not sure if it's attainable, I think there'll always be some thinking required.

Another ideal, and over on the other side of the spectrum (as laid out in the design doc) would be to be super precise, super granular, with the counters. For instance, if you're not listing columns with attachment counts in your .csv, then adding an attachment to a submission listed in your .csv doesn't need to invalidate its cache. So we'd really need segmented counters, and then the user of the etag will have to find the right combination of counters depending on their use case ("ok I'm doing something with attachments, but not something with drafts, so I need to combine counter X and Y for this form, ...."). Increased cognitive load (and thus risk of getting it wrong) for the benefit of decreased under-caching.

Whether you like my exclusion of form.attachment.update depends
a) on where you stand on that spectrum ideologically,
b) and separately, on how transparent this all is for a developer so that they can make an well-informed decision on whether they can use a counter as an etag.

a) is a matter of concensus and if you don't agree with the middle-ground position taken here in this code, it'd be good to bring it up, perhaps in the team Meet later today, so that we as a team can reach concensus on this.

To address b), @matthew-white and I think setting up a registry of actions will offer inroads to making this system more legible.

@alxndrsn
Copy link
Copy Markdown
Contributor

alxndrsn commented Oct 29, 2025

It raises a question: do we need to make explicit what types of actees are expected to work?

I'd say yes please, and enforced in the code 👍

@brontolosone
Copy link
Copy Markdown
Contributor Author

It raises a question: do we need to make explicit what types of actees are expected to work?

I'd say yes please, and enforced in the code 👍

Part of getodk/central#1462 ! I can't promise it'll be enforced in the code. Supposedly there could be a DB function get_event_count(actee uuid) of sorts that would cross-check whether the actee is of the right kind, but that would involve an extra query (or join) to the actees table. And even then one would still need to reason per-endpoint about whether the event selection is appropriate for the purpose of invalidation. But that can at least be made legible, which is the topic of getodk/central#1462.

@brontolosone brontolosone force-pushed the 1439_event-counters branch 2 times, most recently from cb5c5da to 3aa0871 Compare November 4, 2025 14:34
@brontolosone brontolosone marked this pull request as ready for review November 4, 2025 15:28
@lognaturel lognaturel marked this pull request as draft November 24, 2025 19:44
@lognaturel
Copy link
Copy Markdown
Member

Converted back to draft while we explore an alternative concept.

@brontolosone
Copy link
Copy Markdown
Contributor Author

Closing, we're going to pursue the #1699 approach first and foremost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🕒 backlog

Development

Successfully merging this pull request may close these issues.

4 participants