
feat(infra): Lock *Async API — pure CF chain, no task.get()#1304

Merged: zbnerd merged 18 commits into develop from feature/lock-async-api on Jun 19, 2026

zbnerd (Owner) commented Jun 18, 2026

Summary

Adds async-returning methods to the Lock port. Eliminates all 5 task.get() blocking sites in PostgresAdvisoryLockStrategy + OrderedLockExecutor (and the hidden task.get() in AbstractLockStrategy reached by PostgresLockStrategy). Migrates all 5 module-infra direct Lock callers to the new *Async API.

Spec / Audit

  • Audit: docs/05_Reports/2026-06-18-blocking-audit.md
  • Plan: docs/superpowers/plans/2026-06-18-lock-async.md

Decisions (grill-me Q1-Q6)

  • Q1=B: poll + timeout → LockAcquisitionException
  • Q2=A: poll every 100 ms (virtual-thread-friendly Thread.sleep inside CompletableFuture.supplyAsync)
  • Q3=A: PG NOTIFY preserved for leader/follower broadcast (implemented as poll-probe matching existing xact-scoped path)
  • Q4=A: full interface coverage (3 impls migrated: PostgresAdvisoryLockStrategy, PostgresLockStrategy, GuavaLockStrategy; plus OrderedLockExecutor)
  • Q5=A: pg_try_advisory_lock (session-scoped) + explicit pg_advisory_unlock in whenComplete
  • Q6=A: full module-infra migration; module-app legacy = follow-up PR
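A minimal sketch of the Q1/Q2 acquisition loop, assuming a caller-supplied non-blocking probe (tryOnce, hypothetical), such as a single pg_try_advisory_lock call: the probe is retried every 100 ms inside an async task on the given executor, and the future fails with LockAcquisitionException once waitTime elapses. The exception class is a local stand-in, not the project's real type, and pollForLockAsync is an illustrative name.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executor

// Hypothetical stand-in for the project's LockAcquisitionException.
class LockAcquisitionException(message: String) : RuntimeException(message)

/**
 * Retries [tryOnce] (an assumed non-blocking probe) every [pollIntervalMs] until it
 * succeeds or [waitTimeMs] elapses. The loop, including Thread.sleep, runs on [executor],
 * so the calling thread never blocks.
 */
fun pollForLockAsync(
    tryOnce: () -> Boolean,
    waitTimeMs: Long,
    executor: Executor,
    pollIntervalMs: Long = 100,
): CompletableFuture<Void> = CompletableFuture.runAsync({
    val deadline = System.nanoTime() + waitTimeMs * 1_000_000
    while (!tryOnce()) {
        // Overflow-safe deadline comparison for nanoTime values.
        if (System.nanoTime() - deadline >= 0) {
            // Callers see this as CompletionException(cause = LockAcquisitionException) on join().
            throw LockAcquisitionException("lock not acquired within $waitTimeMs ms")
        }
        Thread.sleep(pollIntervalMs)
    }
}, executor)
```

With a virtual-thread executor (for example Executors.newVirtualThreadPerTaskExecutor() on Java 21+), each waiting caller parks only its own virtual thread during the sleep.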

Changes

module-infra/lock/

  • LockStrategy.kt: added 5 *Async methods (executeWithLockAsync × 2, tryLockImmediatelyAsync, unlockAsync, executeWithOrderedLocksAsync). Sync methods marked @Deprecated. Default executeWithOrderedLocks preserved (composite-key fallback for legacy callers).
  • LeaderElectionStrategy.kt: added executeWithLeaderElectionAsync. Sync marked @Deprecated.
  • AbstractLockStrategy.kt: implements the *Async methods in template-method style via 2 abstract helper methods.
  • PostgresAdvisoryLockStrategy.kt: implements *Async (session-scoped PG lock + explicit unlock). Uses @Qualifier("defaultAsyncExecutor") for executor injection.
  • PostgresLockStrategy.kt: implements *Async (mirror pattern). Also fixed pre-existing unlockInternal registry leak.
  • GuavaLockStrategy.kt: implements *Async (Striped.tryLock + lock.unlock). Test-only CachedThreadPool (no Spring DI).
  • OrderedLockExecutor.kt: added executeWithOrderedLocksAsync + 5 internal async methods. Migrated 2 task.get() sites. Sync methods marked @Deprecated.
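The idea behind an async executeWithOrderedLocksAsync can be sketched as a recursive thenCompose chain with no task.get(): acquire keys in one global (sorted) order so callers locking overlapping key sets cannot deadlock, run the body, then release whatever was acquired in reverse order. The acquire/release parameters and the function name withOrderedLocksAsync are hypothetical stand-ins, not OrderedLockExecutor's actual internals.

```kotlin
import java.util.concurrent.CompletableFuture

/**
 * Sketch only: [acquire] and [release] are hypothetical per-key async primitives,
 * not OrderedLockExecutor's real internals.
 */
fun <T> withOrderedLocksAsync(
    keys: Collection<String>,
    acquire: (String) -> CompletableFuture<Unit>,
    release: (String) -> Unit,
    body: () -> CompletableFuture<T>,
): CompletableFuture<T> {
    // One global order (sorted, de-duplicated) prevents lock-order deadlocks between callers.
    val ordered = keys.distinct().sorted()
    // Keys acquired so far; each stage runs only after the previous one completes.
    val held = mutableListOf<String>()

    // Acquire ordered[i], then recurse; after the last key, run the protected body.
    fun step(i: Int): CompletableFuture<T> =
        if (i == ordered.size) {
            try { body() } catch (t: Throwable) { CompletableFuture.failedFuture<T>(t) }
        } else {
            acquire(ordered[i]).thenCompose {
                held += ordered[i]
                step(i + 1)
            }
        }

    val chain = try { step(0) } catch (t: Throwable) { CompletableFuture.failedFuture<T>(t) }
    // Release only what was actually acquired, in reverse order, on success or failure.
    return chain.whenComplete { _, _ -> held.asReversed().forEach(release) }
}
```

If an acquisition fails partway, only the keys already held are released, which is the rollback a blocking loop would get from a finally block.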

module-infra/aop/

  • LockAspect.kt: the AOP wrapper now uses executeWithLockAsync and calls .get() only at the AOP boundary (a documented exception per async-patterns.md). Also fixed the unwrapping of ExecutionException.cause so DistributedLockException is detected.
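At a synchronous boundary such as LockAspect, .get() wraps the original failure in ExecutionException, so a type check for DistributedLockException on the caught exception never matches unless the cause is unwrapped. A minimal sketch of that unwrap, assuming a local DistributedLockException stand-in and an illustrative helper name (awaitAtBoundary):

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutionException

// Hypothetical stand-in for the project's DistributedLockException.
class DistributedLockException(message: String) : RuntimeException(message)

/**
 * Blocks on [future] only where a synchronous value is required (e.g. an AOP
 * around-advice) and rethrows the original failure, not the ExecutionException wrapper.
 */
fun <T> awaitAtBoundary(future: CompletableFuture<T>): T =
    try {
        future.get()
    } catch (e: ExecutionException) {
        throw e.cause ?: e
    } catch (e: InterruptedException) {
        Thread.currentThread().interrupt() // preserve the interrupt status for the caller
        throw e
    }
```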

module-infra/ (callers — 5 of 6 plan files migrated)

  • BatchJobRecoveryScheduler.kt, MonitoringReportJob.kt, MonitoringAlertService.kt, PopularCharacterWarmupScheduler.kt, BulkLoaderService.kt: migrated to *Async API.
  • RuleBasedAnalyzer.kt: excluded (no actual Lock API usage, only a string literal).

module-app/ (test mock follow-ups)

  • MonitoringAlertServiceUnitTest.java: mocks now stub tryLockImmediatelyAsync(...) returning CompletableFuture<Boolean>.
  • PopularCharacterWarmupSchedulerTest.java: mocks now stub executeWithLockAsync(...) returning CompletableFuture<T> via Function0 supplier.

CI gate

The new test LockBlockingPrimitiveGateTest greps module-infra/lock/ for task.get(), runBlocking, and Thread.sleep. It allowlists the legacy @Deprecated sync methods (using multi-line span detection) and virtual-thread-friendly Thread.sleep calls inside CompletableFuture.supplyAsync blocks. Any violation turns the build red.
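A rough sketch of how such a gate could scan one source file, assuming the allowlist rules are approximated by brace counting: lines inside a @Deprecated function body (tracked across lines by brace depth) are allowed, and Thread.sleep is allowed inside a supplyAsync block. findBlockingViolations is a hypothetical name; the actual LockBlockingPrimitiveGateTest may match differently, and this sketch ignores braces inside strings or comments and expression-bodied @Deprecated functions.

```kotlin
// Blocking primitives the gate looks for (plain substring match).
private val FORBIDDEN_CALLS = listOf("task.get()", "runBlocking", "Thread.sleep")

/** Returns "lineNumber: text" for each disallowed blocking call in one Kotlin source file. */
fun findBlockingViolations(source: String): List<String> {
    val violations = mutableListOf<String>()
    var depth = 0                 // current brace depth
    var deprecatedPending = false // saw @Deprecated, body '{' not opened yet
    var deprecatedDepth = -1      // depth just outside the @Deprecated body; -1 = not inside
    var asyncDepth = -1           // depth just outside a supplyAsync block; -1 = not inside

    source.lines().forEachIndexed { index, line ->
        if ("@Deprecated" in line) deprecatedPending = true
        if ("supplyAsync" in line && asyncDepth < 0) asyncDepth = depth
        var inDeprecated = deprecatedDepth >= 0
        for (ch in line) {
            if (ch == '{') {
                if (deprecatedPending) {
                    deprecatedDepth = depth
                    deprecatedPending = false
                    inDeprecated = true
                }
                depth++
            } else if (ch == '}') {
                depth--
            }
        }
        val inAsync = asyncDepth >= 0
        for (call in FORBIDDEN_CALLS) {
            if (call !in line) continue
            val allowed = inDeprecated || (call == "Thread.sleep" && inAsync)
            if (!allowed) violations += "${index + 1}: ${line.trim()}"
        }
        // A span ends once its braces have been closed again.
        if (depth <= deprecatedDepth) deprecatedDepth = -1
        if (depth <= asyncDepth) asyncDepth = -1
    }
    return violations
}
```

A test could run this over every file under module-infra/lock/ and fail when any returned list is non-empty.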

Verification

  • ./gradlew compileKotlin compileJava --continue → clean
  • ./gradlew test → clean
  • CI grep gate: 0 violations

Out of scope (follow-up PR)

  • module-app legacy migration (1 file: ExpectationBatchWriteScheduler.java)
  • MonitoringAlertService lock leak (4-sec lease, pre-existing behavior preserved)

zbnerd and others added 18 commits June 18, 2026 13:23
- 28 real blocking sites across module-infra + module-external-api
- original plan (docs/superpowers + ADR) was based on wrong code assumptions
- 5 recommended sub-PRs for future work
- no code changes
Migrated MonitoringAlertService and PopularCharacterWarmupScheduler to
Lock *Async methods in feature/lock-async-api. The unit test mocks in
module-app still stubbed the legacy sync API (executeWithLock,
tryLockImmediately), so the production code's *Async calls returned
null from the unstubbed mocks, causing an NPE in whenComplete.

Update both test classes to mock the new async signatures and return
CompletableFuture values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab1c151a06

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

waitTime: Long,
leaseTime: Long,
supplier: () -> CompletableFuture<T>,
): CompletableFuture<T> = tryAcquireSessionLockWithPollAsync(key, waitTime, leaseTime)


P1: Keep session advisory locks on one connection

This async path switches callers from the old transaction-scoped lock to pg_try_advisory_lock, but the acquisition is executed through JdbcTemplate as a standalone query; Spring returns that connection to Hikari immediately after the query, while PostgreSQL session advisory locks remain bound to that physical connection. The later pg_advisory_unlock can run on a different pooled connection, so it may not release the lock, and a later caller that happens to reuse the original connection can re-enter the same lock, breaking mutual exclusion for the schedulers/bulk loader now using executeWithLockAsync. Hold a single connection for acquire/work/release, or keep using transaction-scoped advisory locks for this API.
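One way to follow this suggestion, sketched with plain JDBC and assuming a javax.sql.DataSource (Hikari or otherwise): borrow one Connection, take pg_try_advisory_lock on it, keep it checked out until the returned future completes, then run pg_advisory_unlock and close it (returning it to the pool) on that same connection. withSessionAdvisoryLockAsync and the IllegalStateException used for "lock busy" are illustrative; the sketch does not poll, is not the PR's code, and its JDBC calls still block the calling thread (a real version would run them on the lock executor).

```kotlin
import java.sql.Connection
import java.util.concurrent.CompletableFuture
import javax.sql.DataSource

// Runs a single-row boolean query such as "SELECT pg_try_advisory_lock(?)" on this connection.
private fun Connection.queryBoolean(sql: String, lockId: Long): Boolean =
    prepareStatement(sql).use { ps ->
        ps.setLong(1, lockId)
        ps.executeQuery().use { rs -> rs.next() && rs.getBoolean(1) }
    }

/** Acquire, run, and release a session-scoped advisory lock on ONE physical connection. */
fun <T> withSessionAdvisoryLockAsync(
    dataSource: DataSource,
    lockId: Long,
    supplier: () -> CompletableFuture<T>,
): CompletableFuture<T> {
    val conn = dataSource.connection // stays checked out until the work completes
    val acquired = try {
        conn.queryBoolean("SELECT pg_try_advisory_lock(?)", lockId)
    } catch (e: Exception) {
        conn.close()
        throw e
    }
    if (!acquired) {
        conn.close()
        return CompletableFuture.failedFuture<T>(IllegalStateException("advisory lock $lockId is held elsewhere"))
    }
    // A supplier that throws before returning a future still reaches the unlock below.
    val work: CompletableFuture<T> = try { supplier() } catch (t: Throwable) { CompletableFuture.failedFuture<T>(t) }
    return work.whenComplete { _, _ ->
        // Unlock on the same connection that took the lock, then return it to the pool.
        conn.use { it.queryBoolean("SELECT pg_advisory_unlock(?)", lockId) }
    }
}
```

The trade-off is that each in-flight locked operation pins a pool connection for its whole duration; the alternative the reviewer mentions, a transaction-scoped advisory lock, has the same property because its transaction must stay open.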


Comment on lines +84 to +85
supplier().whenComplete { _, _ ->
releaseSessionLock(key, lockId)


P1: Release the lock when the supplier throws synchronously

If the protected supplier throws before it returns a CompletableFuture, the whenComplete callback is never attached and releaseSessionLock is skipped. Several migrated callers wrap synchronous work as CompletableFuture.completedFuture(doWork()), so exceptions from report generation, recovery, CSV reads, or warmup happen exactly in this window and leave the distributed lock held/leaked instead of releasing it like the old transaction-scoped path did. Wrap supplier() so release is registered in a finally-equivalent path even for synchronous throws.
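The fix requested here amounts to turning a synchronous throw from supplier() into a failed future before attaching the release callback. A minimal sketch with an illustrative helper name (runThenRelease) and a caller-supplied release action standing in for releaseSessionLock:

```kotlin
import java.util.concurrent.CompletableFuture

/**
 * Invokes [supplier] and guarantees [release] runs once, whether the supplier returns a
 * successful future, a failed one, or throws before returning anything.
 */
fun <T> runThenRelease(
    supplier: () -> CompletableFuture<T>,
    release: () -> Unit,
): CompletableFuture<T> {
    val work: CompletableFuture<T> = try {
        supplier()
    } catch (t: Throwable) {
        CompletableFuture.failedFuture<T>(t) // a synchronous throw becomes a failed future
    }
    return work.whenComplete { _, _ -> release() }
}
```

This covers callers that wrap eager work as CompletableFuture.completedFuture(doWork()): if doWork() throws, supplier() throws, and the catch converts it so the release still runs.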


zbnerd merged commit 59ae20f into develop on Jun 19, 2026
zbnerd (Owner, Author) commented Jun 19, 2026

@codex review

chatgpt-codex-connector (bot):

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

zbnerd deleted the feature/lock-async-api branch on June 19, 2026 05:58