@@ -0,0 +1,9 @@

--- drop: eventcounter_bump_triggerfunction ---
DROP FUNCTION IF EXISTS "public"."eventcounter_bump_triggerfunction"() CASCADE;

--- drop: eventcounter_squash(actee_id character varying) ---
DROP FUNCTION IF EXISTS "public"."eventcounter_squash"(actee_id character varying) CASCADE;

--- drop: public.audits.eventcounter_bump_trigger ---
DROP TRIGGER IF EXISTS eventcounter_bump_trigger ON "public"."audits" CASCADE;
@@ -0,0 +1,8 @@
CREATE TABLE eventcounters (
"acteeId" varchar(36) REFERENCES actees (id) ON DELETE CASCADE NOT NULL,
evt_count integer NOT NULL DEFAULT 1
);

-- Embedding the evt_count into the index makes the aggregate sum(evt_count) for a given actee
-- possible with an index-only scan.
CREATE INDEX idx_eventcounters ON eventcounters USING btree ("acteeId") INCLUDE (evt_count);
@@ -0,0 +1 @@
DROP TABLE eventcounters;
59 changes: 59 additions & 0 deletions lib/model/migrations/20251025-01-auditlog-eventcounters-02.up.sql
@@ -0,0 +1,59 @@

--- create: eventcounter_squash(actee_id varchar(36)) ---
CREATE FUNCTION "public"."eventcounter_squash"(actee_id varchar(36))
RETURNS integer
AS
$BODY$
WITH deleted AS (
DELETE FROM eventcounters WHERE "acteeId" = actee_id RETURNING evt_count
)
INSERT INTO eventcounters ("acteeId", evt_count) VALUES (actee_id, (SELECT COALESCE(SUM(evt_count), 0) FROM deleted)) RETURNING evt_count
$BODY$
LANGUAGE sql
VOLATILE
STRICT
PARALLEL UNSAFE
;

--- sign: eventcounter_squash(actee_id varchar(36)) ---
COMMENT ON FUNCTION "public"."eventcounter_squash"(actee_id varchar(36)) IS '{"dbsamizdat": {"version": 1, "definition_hash": "b264d1502e124c331ad7d11b20b8fca2"}}';

--- create: eventcounter_bump_triggerfunction ---
CREATE FUNCTION "public"."eventcounter_bump_triggerfunction"()
RETURNS trigger
AS
$BODY$
BEGIN
INSERT INTO eventcounters ("acteeId") VALUES (NEW."acteeId");
IF
(random() < 0.01)
THEN
PERFORM eventcounter_squash(NEW."acteeId");
END IF;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql
VOLATILE
STRICT
PARALLEL UNSAFE
;

--- sign: eventcounter_bump_triggerfunction ---
COMMENT ON FUNCTION "public"."eventcounter_bump_triggerfunction"() IS '{"dbsamizdat": {"version": 1, "definition_hash": "2ddd8bde4e781812dd87338f21b024e8"}}';

--- create: public.audits.eventcounter_bump_trigger ---
CREATE TRIGGER "eventcounter_bump_trigger"
AFTER INSERT OR UPDATE OF processed
Member

This is more of a curiosity than anything else. Why would a change to processed increment the event counter? If processing an event has an effect on the actee, I would think that would already be indicated by the insertion of new row(s) into audits.

  • Example 1: Whenever a form is published, we check whether the form has an enketoId. If it does, we do nothing (job ends). There's no need to increment the counter in that case. If the form doesn't have an enketoId, then we try to get one. If that's successful, there will be a new form.update event logged. The logging of that event alone should increment the counter.
  • Example 2: Whenever a submission is created, it is processed for entities. But I don't think such entity processing will affect the actee of the submission.create event (the form): it will only affect a dataset actee. That change will be indicated by the insertion of new entity event(s).

Contributor Author

Counterexample for submission.create: generation of secondary information which influences how submissions.csv looks — see 8cc3ab4 for what goes on there.
This information is added later, in a separate transaction after insertion. It can be much later: there's a race condition that I showed when I demoed the pure-SQL insertion path, and we'll clear that bug when we adopt the pure-SQL submission insertion path. It manifests itself in the audits not as a new event, but only as the processed value being set on the existing row.

TLDR: Different outcomes for submissions.csv, but no sign in the audits other than that the processed column receives a value.

So I want to err on the side of caution and interpret "something is processed" as an event. There may be cases where this results in over-invalidation, probably specific to the event action. I know I need it for submission.create, as I've shown above, but there may be other actions where it's overzealous. We could take stock and follow up to make things more specific. But then we'd also need to think about how to maintain that bookkeeping: we don't want to miss events and potentially over-cache when the code changes.

Member

@matthew-white matthew-white Oct 29, 2025

OK, good counterexample. I thought about that case, but for some reason, I thought that any unprocessed submissions would be processed on the fly at request time for additional "select multiple" values. But looking at lib/data/briefcase.js, that doesn't seem to be the case. The CSV headers seem to be finalized before any data rows are processed or written (which makes sense). I must have been thinking about the aggregated client audit CSV instead. Submission attachments are processed for client audit rows after a submission.attachment.update event, but they're also processed on the fly if there are any client audit attachments that the worker hasn't processed. So the processed flag on the submission.attachment.update event doesn't affect the response. But it seems like that's not the case for submission.create. 👍

One counterexample is all we need, so I'm convinced that it makes sense to increment the event counter when processed is updated. I agree that we don't want extra bookkeeping in this area, so that logic shouldn't depend on whether it's a submission.create event vs. something else.
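
For readers following the thread, here is a minimal sketch of the firing rule being discussed, written as a plain JavaScript model rather than SQL so it runs without a database. The action list is copied from the trigger's WHEN clause; `firesBump`, `apply`, `setColumns`, and `someOtherColumn` are hypothetical names used only for illustration. The model assumes the usual PostgreSQL behaviour that an `UPDATE OF processed` trigger fires when `processed` appears in the UPDATE's SET list.

```javascript
// Actions copied from the WHEN clause of eventcounter_bump_trigger.
const COUNTED_ACTIONS = new Set([
  'form.update.publish', 'submission.create', 'submission.update',
  'submission.update.version', 'submission.attachment.update',
  'submission.delete', 'submission.restore', 'dataset.update',
  'dataset.update.publish', 'entity.create', 'entity.update.version',
  'entity.update.resolve', 'entity.delete', 'entity.restore',
  'entity.bulk.delete', 'entity.bulk.restore',
]);

// Hypothetical model of when the trigger fires for a change to an audits row.
// `op` is 'insert' or 'update'; `setColumns` lists the columns named in an
// UPDATE's SET clause (ignored for inserts).
const firesBump = ({ op, setColumns = [], row }) => {
  const eventMatches = op === 'insert'
    || (op === 'update' && setColumns.includes('processed'));
  return eventMatches && row.acteeId != null && COUNTED_ACTIONS.has(row.action);
};

// Walk through the submission.create case from the thread: one audit row is
// inserted, and later only its `processed` column is set.
let counter = 0;
const apply = (change) => {
  if (firesBump(change)) counter += 1;
  return counter;
};

const row = { acteeId: 'form-a', action: 'submission.create' };
const afterInsert = apply({ op: 'insert', row });
const afterProcessed = apply({ op: 'update', setColumns: ['processed'], row });
const afterOther = apply({ op: 'update', setColumns: ['someOtherColumn'], row });
console.log({ afterInsert, afterProcessed, afterOther });
```

In this model the counter moves on the insert and again when `processed` is set, but not on the unrelated update. A consumer comparing counter values would therefore notice the late processing step even though no new audit row was written.
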

ON "public"."audits"
FOR EACH ROW
WHEN (
(NEW."acteeId" IS NOT NULL)
AND
(NEW.action = ANY(ARRAY['form.update.publish', 'submission.create', 'submission.update', 'submission.update.version', 'submission.attachment.update', 'submission.delete', 'submission.restore', 'dataset.update', 'dataset.update.publish', 'entity.create', 'entity.update.version', 'entity.update.resolve', 'entity.delete', 'entity.restore', 'entity.bulk.delete', 'entity.bulk.restore']))
)
EXECUTE PROCEDURE eventcounter_bump_triggerfunction()
;

--- sign: public.audits.eventcounter_bump_trigger ---
COMMENT ON TRIGGER "eventcounter_bump_trigger" ON "public"."audits" IS '{"dbsamizdat": {"version": 1, "definition_hash": "45fddf42c072bc6d04104bbaed5ac120"}}';
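
The interplay between the bump trigger function and eventcounter_squash can be sketched with an in-memory model. This is a plain JavaScript approximation under stated assumptions, not the migration's code: the table is an array of row objects, and `makeTable`, `total`, `squash`, `bump`, and the injectable `rng` parameter are hypothetical names. It ignores transactions and concurrency entirely.

```javascript
// In-memory stand-in for the eventcounters table: one object per row.
const makeTable = () => [];

// Sum of evt_count over every row for one actee.
const total = (rows, acteeId) => rows
  .filter((r) => r.acteeId === acteeId)
  .reduce((sum, r) => sum + r.evtCount, 0);

// Mirrors eventcounter_squash: delete all rows for the actee, then insert a
// single row carrying their sum (0 when there were none). Returns that sum.
const squash = (rows, acteeId) => {
  const sum = total(rows, acteeId);
  for (let i = rows.length - 1; i >= 0; i -= 1) {
    if (rows[i].acteeId === acteeId) rows.splice(i, 1);
  }
  rows.push({ acteeId, evtCount: sum });
  return sum;
};

// Mirrors eventcounter_bump_triggerfunction: append a row with the default
// evt_count of 1, then squash with probability 0.01.
// `rng` is injectable so the behaviour can be made deterministic.
const bump = (rows, acteeId, rng = Math.random) => {
  rows.push({ acteeId, evtCount: 1 });
  if (rng() < 0.01) squash(rows, acteeId);
};

const rows = makeTable();
for (let i = 0; i < 5; i += 1) bump(rows, 'form-a', () => 1); // never squash
bump(rows, 'form-b', () => 1);

const before = total(rows, 'form-a');
const rowsBefore = rows.filter((r) => r.acteeId === 'form-a').length;
const squashed = squash(rows, 'form-a');
const rowsAfter = rows.filter((r) => r.acteeId === 'form-a').length;
console.log({ before, rowsBefore, squashed, rowsAfter, after: total(rows, 'form-a') });
```

The point of the model is that squashing changes how many rows an actee has, not the sum a reader sees. Bumps stay cheap appends, and the occasional squash keeps the number of rows per actee from growing without bound.
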
10 changes: 10 additions & 0 deletions lib/model/migrations/20251025-01-auditlog-eventcounters.js
@@ -0,0 +1,10 @@
// Copyright 2025 ODK Central Developers
// See the NOTICE file at the top-level directory of this distribution and at
// https://github.com/getodk/central-backend/blob/master/NOTICE.
// This file is part of ODK Central. It is subject to the license terms in
// the LICENSE file found in the top-level directory of this distribution and at
// https://www.apache.org/licenses/LICENSE-2.0. No part of ODK Central,
// including this file, may be copied, modified, propagated, or distributed
// except according to the terms contained in the LICENSE file.

module.exports = require('../pure-sql-migration')(__filename);