Fix/add connection pools to fix db hangup #731

coodos · 2026-01-27T20:52:51Z

Description of change

Issue Number

closes #638

Type of change

Breaking (any change that would cause existing functionality to not work as expected)
New (a change which implements a new feature)
Update (a change which updates existing functionality)
Fix (a change which fixes an issue)
Docs (changes to the documentation)
Chore (refactoring, build scripts or anything else that isn't user-facing)

How the change has been tested

Change checklist

I have ensured that the CI Checks pass locally
I have removed any unnecessary logic
My code is well documented
I have signed my commits
My code follows the pattern of the application
I have self reviewed my code

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved webhook participant loading to avoid hangs and timeouts, with safer per-item handling and better failure isolation.
- Reworked change handling to reliably reload and send updated entities after commits, reducing missed or inconsistent notifications.
Performance
- Tuned database connection pooling and timeouts to improve resource utilization and responsiveness.

The subscriber's afterUpdate was using event.entity which only contains changed fields (partial entity), not the full entity with charter. When groups were updated via webhooks, the charter field was often absent in the partial entity, causing Cerberus to lose track of group charters over time. After restart, charters would be loaded fresh from DB.

Root cause: TypeORM's afterUpdate event provides partial entities when using repository.save(entity). The code was not reloading the full entity with all fields (including charter) from the database after the transaction committed.

Changes:
- Refactored afterUpdate to pass metadata (entityId, relations) instead of using the partial entity from event.entity
- Created handleChangeWithReload that schedules entity reload for after transaction commit (inside setTimeout with 50ms delay)
- Created executeReloadAndSend that does the actual findOne with all relations AFTER transaction commits, ensuring charter and all fields are loaded
- Groups and messages sync with 50ms delay (fast, ensures commit)

This is the same transaction timing issue fixed in file-manager-api and dreamsync-api. Now Cerberus will maintain full group data (including charters) indefinitely without requiring restarts.
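The reload flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Repo` interface, the `send` callback (standing in for the adapter call), and the exact function signatures are assumptions. Only the names handleChangeWithReload and executeReloadAndSend, the findOne-with-relations reload, and the 50ms delay come from the description.

```typescript
// Hypothetical minimal repository shape (an assumption for this sketch).
// TypeORM's Repository.findOne takes a similar { where, relations } object.
interface FindOneOptions {
    where: { id: string };
    relations: string[];
}
interface Repo<T> {
    findOne(options: FindOneOptions): Promise<T | null>;
}

const RELOAD_DELAY_MS = 50; // delay named in the PR description

// Schedules a full reload instead of forwarding the partial event.entity.
function handleChangeWithReload<T>(
    repo: Repo<T>,
    entityId: string,
    relations: string[],
    send: (entity: T) => Promise<void>, // stand-in for the adapter call
    delayMs: number = RELOAD_DELAY_MS,
): void {
    setTimeout(() => {
        executeReloadAndSend(repo, entityId, relations, send).catch((err) =>
            console.error(`Reload failed for ${entityId}:`, err),
        );
    }, delayMs);
}

// Reloads the entity by id with the requested relations, then sends it.
async function executeReloadAndSend<T>(
    repo: Repo<T>,
    entityId: string,
    relations: string[],
    send: (entity: T) => Promise<void>,
): Promise<void> {
    const full = await repo.findOne({ where: { id: entityId }, relations });
    if (!full) {
        console.warn(`Entity ${entityId} not found on reload; skipping`);
        return;
    }
    await send(full);
}
```

A fixed delay is a heuristic: it assumes the transaction commits within that window, so a slow commit could still be read before it lands.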

The group webhook processing could hang indefinitely when loading participant users, causing Cerberus to appear stuck after running for some time. The last log was "Extracted userId" with no progress.

Root causes:
1. Promise.all blocks if any getUserById call hangs (DB lock, timeout, etc.)
2. No timeout protection - hangs wait forever
3. No error handling - failures block entire webhook
4. Loading unnecessary relations (followers/following) added complexity

Changes:
- Use Promise.allSettled instead of Promise.all to handle failures gracefully
- Add 5-second timeout per user lookup using Promise.race
- Wrap each participant load in try-catch with detailed error logging
- Load users without heavy relations in webhook context (don't need followers/following)
- Add indexed logging to identify which participant causes issues
- Log success/failure counts for transparency

Benefits:
- Webhook completes even if some participants fail to load
- 5s timeout prevents indefinite hangs
- Better diagnostics via indexed logging
- Reduced DB load by skipping unnecessary relations

The webhook will now respond within ~5 seconds even if all participant lookups fail, preventing Cerberus from getting stuck.
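The combination of Promise.race and Promise.allSettled described above might look something like this. It is a sketch under assumptions rather than the controller code itself: the `User` shape, the `withTimeout` and `loadParticipants` helper names, and the injected `getUserById` parameter are all hypothetical. The 5-second default, the indexed logging, and the success/failure count follow the description.

```typescript
// Hypothetical shape; the real User entity is not shown in the PR text.
interface User {
    id: string;
}

// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    });
    // Clear the timer either way so it does not keep the event loop alive.
    return Promise.race([promise, timeout]).finally(() => {
        if (timer !== undefined) clearTimeout(timer);
    });
}

// Loads each participant independently; failures and timeouts are logged
// with their index and skipped, so one bad lookup cannot block the rest.
async function loadParticipants(
    userIds: string[],
    getUserById: (id: string) => Promise<User | null>, // stand-in for the real lookup
    timeoutMs: number = 5000,
): Promise<User[]> {
    const results = await Promise.allSettled(
        userIds.map((id, index) =>
            withTimeout(getUserById(id), timeoutMs, `participant[${index}] ${id}`),
        ),
    );
    const loaded: User[] = [];
    results.forEach((result, index) => {
        if (result.status === "fulfilled" && result.value) {
            loaded.push(result.value);
        } else if (result.status === "rejected") {
            console.error(`Failed to load participant[${index}]:`, result.reason);
        }
    });
    console.log(`Loaded ${loaded.length}/${userIds.length} participants`);
    return loaded;
}
```

Because the lookups run concurrently, the overall wait is bounded by roughly one timeout period rather than one per participant. Note that Promise.race only stops waiting; it does not cancel the underlying query.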

coderabbitai · 2026-01-27T20:53:22Z

📝 Walkthrough

Enhances webhook participant loading with per-participant timeouts to avoid hangs, adds connection-pool tuning to multiple Postgres DataSource configs, and refactors the subscriber afterUpdate flow to reload, enrich, and debounce post‑commit entity notifications.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Webhook participant loading `platforms/cerberus/src/controllers/WebhookController.ts` | Replaced prior per-participant async loading with a per-item async function that safely extracts userId, wraps each repository call in a 5s `Promise.race` timeout, and uses `Promise.allSettled` to collect valid participants; adds per-index error logging. |
| Subscriber reload-and-notify workflow `platforms/cerberus/src/web3adapter/watchers/subscriber.ts` | Reworked `afterUpdate` handling to derive `entityId` from multiple sources and schedule a debounced post‑commit reload. Added private `handleChangeWithReload` and `executeReloadAndSend` methods that reload/enrich the entity, convert to plain data, skip junctions/non-system messages, and call `adapter.handleChange`. Includes additional guards and logging. |
| Database connection pool configs `platforms/cerberus/src/database/data-source.ts`, `infrastructure/evault-core/src/config/database.ts`, `platforms/dreamsync-api/src/database/data-source.ts`, `platforms/eCurrency-api/src/database/data-source.ts`, `platforms/eReputation-api/src/database/data-source.ts`, `platforms/emover-api/src/database/data-source.ts`, `platforms/esigner-api/src/database/data-source.ts`, `platforms/evoting-api/src/database/data-source.ts`, `platforms/file-manager-api/src/database/data-source.ts`, `platforms/group-charter-manager-api/src/database/data-source.ts`, `platforms/pictique-api/src/database/data-source.ts`, `platforms/registry/src/config/database.ts` | Added an `extra` block to DataSource/DataSource options across multiple platforms with connection-pool and timeout settings (`max: 10`, `min: 2`, `idleTimeoutMillis: 30000`, `connectionTimeoutMillis: 5000`, `statement_timeout: 10000`). Review for consistent naming/typing and environment compatibility. |
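For orientation, the `extra` block summarized in the last row might look like the following. This is a sketch only: the values are the ones listed in the walkthrough (the merged values may differ after the later pool-size commit), the connection URL is a placeholder, and the surrounding options object is simplified rather than copied from any of the listed files. With the Postgres driver, TypeORM passes `extra` through to node-postgres, where these keys are pool and client options.

```typescript
// Pool and timeout settings as listed in the walkthrough.
const poolSettings = {
    max: 10, // upper bound on clients in the pool
    min: 2, // clients kept open even when idle
    idleTimeoutMillis: 30000, // close a client after 30s of idleness
    connectionTimeoutMillis: 5000, // give up acquiring a connection after 5s
    statement_timeout: 10000, // server aborts statements running longer than 10s
};

// Simplified DataSource-style options; real configs also list entities,
// migrations, subscribers, and so on.
const dataSourceOptions = {
    type: "postgres" as const,
    url: "postgres://user:password@localhost:5432/app", // placeholder, not from the PR
    extra: poolSettings,
};
```

An object like this would be passed to `new DataSource(...)` in each service's data-source file.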

Sequence Diagram(s)

sequenceDiagram
    participant Event as Change Event
    participant Sub as Subscriber
    participant DB as Database
    participant Adapt as Adapter

    Event->>Sub: afterUpdate(event)
    Sub->>Sub: derive entityId (event.entity.id / databaseEntity?.id / common id fields)
    alt id missing
        Sub-->>Event: log warning & exit
    else id present
        Sub->>Sub: schedule handleChangeWithReload (debounced per table)
        Note right of Sub: debounced timer per table (skip junctions)
        Sub->>DB: reload entity by id after commit
        Note over Sub,DB: reload uses repository.findOne and enrichment
        DB-->>Sub: enriched entity/plain data
        Sub->>Sub: validate (skip locked/non-system where applicable)
        Sub->>Adapt: adapter.handleChange(envelope)
        Adapt-->>Sub: acknowledgement
        Sub->>Sub: log envelope
    end
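
The "debounced per table" step in the diagram can be illustrated with a small keyed scheduler. This is a hypothetical sketch, not the subscriber's implementation: the `DebouncedScheduler` class and its `schedule` method are invented names, and the key is assumed to be the table name.

```typescript
// Keeps at most one pending timer per key; a newer call for the same key
// replaces the older one, so only the latest task runs.
class DebouncedScheduler {
    private timers = new Map<string, ReturnType<typeof setTimeout>>();
    private delayMs: number;

    constructor(delayMs: number) {
        this.delayMs = delayMs;
    }

    schedule(key: string, task: () => void): void {
        const existing = this.timers.get(key);
        if (existing !== undefined) clearTimeout(existing);
        const timer = setTimeout(() => {
            this.timers.delete(key);
            task();
        }, this.delayMs);
        this.timers.set(key, timer);
    }
}
```

One consequence of keying by table is that two quick updates to different rows of the same table collapse into a single run, so the key choice matters if every row must be reloaded.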

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Chore/file manager and esigner bug fixes #729: Implements a similar reload-and-notify (handleChangeWithReload/executeReloadAndSend) debounce/post‑commit flow in subscriber code.
fix: issue with cerberus only triggering after an edit #433: Changes WebhookController async handling toward non-blocking charter/webhook processing; relates to the per‑participant timeout changes here.
Feat/evoting #273: Modifies DataSource configuration for Postgres pooling—related to the added extra pool settings.

Suggested reviewers

Bekiboo
sosweetham

Poem

🐰
I hopped through logs and timeouts bright,
Per‑participant naps cut short at night,
I fetch, I reload, I debounce with care,
Group chats chirp now — no hangs in the air! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ❓ Inconclusive | The PR addresses issue `#638` (Cerberus group chat failures) through connection pool additions and webhook/subscriber improvements, but the connection between specific code changes and the expected behavior is not explicitly documented. | Clarify in the PR description how the connection pool and webhook timeout changes specifically address the notification and charter violation processing failures described in issue `#638`. |
| Out of Scope Changes check | ❓ Inconclusive | The PR contains mostly in-scope changes (database pool configurations and webhook/subscriber enhancements) related to fixing database hangups, but includes additional changes to subscriber reload logic that may extend beyond the immediate scope of connection pooling. | Review whether the subscriber reload-and-notify workflow (155 lines added) is necessary for the core hangup fix or if it should be separated into a distinct PR for better clarity. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Fix/add connection pools to fix db hangup' directly relates to the main changes in the PR, which add connection pool configurations across multiple database data sources to address database hangup issues. |
| Description check | ✅ Passed | The PR description follows the provided template structure with Issue Number (`#638`) and lists all change type options, but the 'Type of change' section is not properly checked and 'How the change has been tested' is empty. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coodos added 2 commits January 28, 2026 02:02

chore: bump connection pool size

c05d7d0

coodos force-pushed the fix/add-connection-pools-to-fix-db-hangup branch from 83dce76 to c05d7d0 Compare January 27, 2026 20:59

ananyayaya129 approved these changes Jan 27, 2026

coodos merged commit 46b062e into main Jan 27, 2026
7 checks passed

coodos deleted the fix/add-connection-pools-to-fix-db-hangup branch January 27, 2026 21:13
