Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
97461e1
docs: CF chain blocking audit (2026-06-18)
zbnerd Jun 18, 2026
d51b50d
feat(infra): LockStrategy interface — add *Async, @Deprecated sync
zbnerd Jun 18, 2026
3b45569
fix(infra): LockStrategy trailing newline + default body rationale KDoc
zbnerd Jun 18, 2026
a2ec8ff
feat(infra): LeaderElectionStrategy interface — add async, @Deprecate…
zbnerd Jun 18, 2026
1f8b1cd
docs(infra): LeaderElectionStrategy — spell out CF, add @param to leg…
zbnerd Jun 18, 2026
5cb1d11
feat(infra): PostgresAdvisoryLockStrategy *Async API + session-scoped…
zbnerd Jun 18, 2026
776db9a
fix(infra): PostgresAdvisoryLockStrategy — inject defaultAsyncExecuto…
zbnerd Jun 18, 2026
f9e5166
feat(infra): PostgresLockStrategy + GuavaLockStrategy + AbstractLockS…
zbnerd Jun 18, 2026
65ca658
fix(infra): Lock *Async — extract lambda, clear session registry, hoi…
zbnerd Jun 18, 2026
5b59002
feat(infra): OrderedLockExecutor — executeWithOrderedLocksAsync, no t…
zbnerd Jun 18, 2026
de49b3d
test(infra): replace kotlin.test imports with JUnit Assertions (kotli…
zbnerd Jun 18, 2026
be3b3a6
feat(infra): LockAspect uses executeWithLockAsync (caller .get() acce…
zbnerd Jun 18, 2026
185c1dc
fix(infra): OrderedLockExecutor — KDoc notes for cold-path probe, sen…
zbnerd Jun 18, 2026
08c972b
fix(infra): LockAspect — unwrap ExecutionException.cause for Distribu…
zbnerd Jun 18, 2026
f831c63
feat(infra): migrate 6 direct Lock callers to *Async API
zbnerd Jun 18, 2026
abce520
test(infra): fix Mockito InvalidUseOfMatchersException in PostgresAdv…
zbnerd Jun 18, 2026
71f187c
test(infra): CI grep gate for blocking primitives in lock/ package
zbnerd Jun 18, 2026
ab1c151
test(app): fix unit test mocks for *Async Lock API migration
zbnerd Jun 18, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
147 changes: 147 additions & 0 deletions docs/05_Reports/2026-06-18-blocking-audit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# CF Chain Blocking Audit — 2026-06-18

- Date: 2026-06-18
- Status: **Audit only. No code changes. PR not created.**
- Author: claude-code + explore subagents
- Branch at audit: `feature/issue-CF-CHAIN-blocking-fix` (no commits)

## Background

Original goal: drop sync return contract of `LogicExecutor`, `Lock`, `SingleFlight`, `TieredCache` and migrate all callers to pure `CompletableFuture<T>` end-to-end.

This audit (not the original plan) was created **after** the original plan proved wrong on key structural assumptions. This file documents actual code state, real blocking sites, and the path forward for any future PR.

## Original plan files (retained for reference)

- `docs/superpowers/specs/2026-06-18-ext-api-blocking-fix-design.md` — design (aspirational, based on wrong assumptions)
- `docs/01_ADR/ADR-blocking-async-contract-cf-chain.md` — ADR (proposed, not accepted)
- `docs/superpowers/plans/2026-06-18-ext-api-blocking-fix.md` — implementation plan (also wrong; ~30% of tasks applicable)

**These 3 files are NOT a binding contract.** They document the brainstorm + plan exercise. Future PRs should write a fresh plan against actual code.

## Original plan errors (so future work avoids them)

| Plan assumed | Actual |
|---|---|
| `LogicExecutor` flat (7 sync + 7 async) | ISP composed: `LogicExecutor : BasicExecutor + SafeExecutor + ResilientExecutor` |
| `DefaultCheckedLogicExecutor` = `LogicExecutor` impl | Separate IO-boundary executor (`CheckedSupplier` input). Real `LogicExecutor` impl = `DefaultLogicExecutor` |
| `TaskContext.simple(...)` factory | Doesn't exist. Real factory: `TaskContext.of(component, operation)` or `TaskContext.of(component, operation, dynamicValue)` |
| `SingleFlight` needs `executeAsync` added | **Already exists** on `SingleFlightStrategy:50` interface + `PostgresSingleFlightStrategy:85` impl. Only `.join()` inside sync facade `:73-76` |
| PGMQ `MessageHandler.handle(): CF<AckResult>` | Doesn't exist. `process(): Boolean` (true=ACK, false=NACK). No `AckResult` class |
| `EquipmentFetchProvider` `.join()` removable | **Explicit ADR in class Javadoc (lines 16-44)**: `.join()` is intentional, kept for Spring `@Cacheable` sync return contract |
| `GlobalAdmissionControl` busy loop | V4 legacy only (V4 controllers). Not used by V5/ext-api. Plan T8 targets wrong component |
| InternalApiController return `CF<ResponseEntity<...>>` | All endpoints return sync `ResponseEntity<...>` |
| 5 active module caller sites | **8 sites** — missed 3 `runBlocking` in `EquipmentRankingRedisWriter.kt:26, 31, 35` + 1 in `OcidLookupRunConsumer.kt:38` |

## Real blocking sites (audited 2026-06-18)

### module-infra

| File:Line | Site | Severity | Notes |
|---|---|---|---|
| `lock/PostgresAdvisoryLockStrategy.kt:72, 117, 143` | `task.get()` inside `lockTransactionTemplate.execute` | HIGH | 3 sites. No async API on `LockStrategy` interface |
| `lock/PostgresLockStrategy.kt:97, 113, 120, 140, 145, 318, 338, 342, 351, 352, 356, 359` | `acquiredLocks.get()` / `lockOrder.get()` | LOW | All `ThreadLocal` reads. Non-blocking. False positive |
| `lock/OrderedLockExecutor.kt:143, 181` | `task.get()` after lock acquire | HIGH | 2 sites |
| `concurrency/PostgresSingleFlightStrategy.kt:75-76` | `.orTimeout().join()` inside sync `execute` facade | MEDIUM | `executeAsync` already exists; only the sync facade blocks |
| `concurrency/SingleFlightExecutor.kt` | none | n/a | Already CF-only |
| `cache/TieredCache.kt:126` | `buffer.submit(key).orTimeout(5, SECONDS).join()` | LOW | Single site, bounded by timeout, internal L2 batcher |
| `worker/ExternalApiWorker.kt:111` | `pipelineAsync(payload).join()` | HIGH | |
| `worker/ExternalApiWorker.kt:306-324` | `runBlocking(Dispatchers.Default)` | MEDIUM | CPU-bound calc |
| `worker/CalculationWorker.kt:83-91` | `.handle().join()` | HIGH | |
| `worker/OcidResolveWorker.kt:64-73` | `.join()` | HIGH | Topic subscriber not `PgmqWorker` |
| `worker/ResultReadyProjectionWorker.kt:81-90` | `.join()` × 2 | HIGH | Scheduled poller, not `PgmqWorker` |
| `worker/ResultReadyProjectionWorker.kt:123` | `runBlocking(Dispatchers.Default)` | MEDIUM | |
| `pgmq/PgmqWorker.kt:380-394` | `runBlocking(Dispatchers.Default)` + `async/awaitAll` | MEDIUM | Sequential batch coroutine path |
| `provider/EquipmentFetchProvider.kt:72` | `nexonApiClient.getItemDataByOcid(ocid).orTimeout(10, SECONDS).join()` | DOC-EXEMPT | **Class Javadoc documents `.join()` is intentional** (lines 16-44, Issue #118) |
| `security/filter/JwtAuthenticationFilter.kt:80-99` | `executor.executeWithFallback { ... }` — sync execution | LOW | Uses `LogicExecutor` for sync. Not blocking but synchronous |
| `admission/GlobalAdmissionControl.kt:238` | `while (running.get() && !Thread.interrupted)` | FALSE | V4 legacy. Not used by V5/ext-api |
| `java/.../service/starforce/StarforceLookupAdapter.java` | none on async path | n/a | Uses `LogicExecutor` for sync |
| `java/.../service/cube/component/CubeComputeBuffer.java` | none | n/a | `ConcurrentHashMap.get` not blocking |

### module-external-api

| File:Line | Site | Severity | Notes |
|---|---|---|---|
| `runstatus/InternalApiController.kt:83, 123` | `executor.submit { scheduler.*Async().join() }` | MEDIUM | Request thread returns 202 immediately. But internal `internalApiExecutor` worker blocks for whole phase |
| `scheduler/ExternalApiScheduler.kt:188` | `runBlocking { ocidLookupPhase.execute(executor, runKey, runId) }` | HIGH | Bridges `suspend` into CF chain. VT carrier is fine per async-patterns.md |
| `scheduler/phase/OcidLookupPhase.kt:147` | `writerJob.join()` | FALSE | `kotlinx.coroutines.Job.join()` IS a suspend function. Suspends in coroutine context, not blocks thread |
| `scheduler/phase/OcidLookupPhase.kt:148` | `putJob.await()` | FALSE | Coroutine `Deferred.await()` — proper suspend |
| `snapshot/ChunkFileManager.kt:132` | `all.get(600_000L, TimeUnit.MILLISECONDS)` | HIGH | 10-min hard blocking on writer thread |
| `snapshot/SnapshotFailedRecordWriter.kt` | none on async path | n/a | Sync I/O on single writer thread (ChunkFileManager thread-affinity L24-25) |
| `auth/AuthCharacterFetchConsumer.kt:51` | `characterListOpt.get()` | NULL-SAFETY | Violates `kotlin-null-safety.md` rule `.isPresent()+.get()`. Not blocking |
| `urgent/UrgentCharacterRequestConsumer.kt:53` | `semaphore.tryAcquire()` | OK | Non-blocking variant. Good |
| `urgent/UrgentCharacterRequestConsumer.kt:66` | `semaphore.release()` inside `whenComplete` only | MEDIUM | Permit leaks on CF cancel external. Need `tryAcquireAsync` + `finally` release or use `BackpressureLimiter` |
| `build.gradle:55` | `bootJar { enabled = true }` | TENSION | Per `workflow-rules.md` ext-api IS a boot service. Per `build-conventions.md` only `module-app` should enable. Both rules exist; depends on which you read |

### Active module callers (5 files, 8 sites)

| File:Line | Site | Severity | Notes |
|---|---|---|---|
| `module-synchronizer/.../consumer/ChunkConsumerTemplate.kt:99` | `request.executor.execute { logicExecutor.executeWithFinally(...) }` | LOW | Fire-and-forget dispatch |
| `module-synchronizer/.../consumer/OcidLookupRunConsumer.kt:35` | `executor.execute { runCatching { runBlocking(...) } }` | MEDIUM | |
| `module-synchronizer/.../consumer/OcidLookupRunConsumer.kt:38` | `runBlocking(Dispatchers.Default) { objectMapper.readValue(...) }` | MEDIUM | Sync bridge to coroutine |
| `module-synchronizer/.../ranking/EquipmentRankingRedisWriter.kt:26` | `runBlocking(Dispatchers.Default) { documents.filter { ... } }` | MEDIUM | |
| `module-synchronizer/.../ranking/EquipmentRankingRedisWriter.kt:31` | `runBlocking(Dispatchers.Default) { rankable.groupBy { ... } }` | MEDIUM | |
| `module-synchronizer/.../ranking/EquipmentRankingRedisWriter.kt:35` | `executor.executeOrDefault(...)` returning sync `Int` | MEDIUM | |
| `module-calculator/.../processor/CalculationCache.kt:75` | `return cache.get(key) { ... }` (Caffeine sync) | LOW | Sync return type. `cache.getAsync` doesn't exist on this Caffeine wrapper |
| `module-external-api/.../auth/AuthCharacterFetchConsumer.kt:42` | `executor.execute { runCatching { ... } }` | LOW | Fire-and-forget |

### Out of scope (audit-confirmed)

- `module-core`: 0 blocking primitive sites
- `module-common`: 0 blocking primitive sites
- `module-app` legacy (20+ Java files): 0 sites audited in this scope, but uses sync `LogicExecutor` extensively — follow-up PR territory

## Recommended path forward

### Sub-PR 1: Lock `*Async` API (HIGH impact, clean)

Add `executeWithLockAsync(key, supplier): CF<T>`, `executeWithLeaderElectionAsync(key, ...): CF<T>`, `executeWithOrderedLocksAsync(...): CF<T>` to `LockStrategy` interface. Implement in `PostgresAdvisoryLockStrategy` + `OrderedLockExecutor`. `@Deprecated` sync methods. module-app legacy continues with warnings.

Eliminates: 5 HIGH sites.

### Sub-PR 2: PGMQ Worker CF path (HIGH impact, larger)

Refactor `process(msg: PgmqMessage<T>): Boolean` to return a richer `ProcessOutcome` sealed class (Ack / Nack(retryable) / Nack(deadletter) / Nack(visibilityReset)). The `Boolean` return is the actual blocking constraint (workers call `.join()` internally to coerce CF chain to Boolean).

Alternatively: introduce `MessageHandler<T>` interface with `CF<ProcessOutcome>` return. Add to `PgmqWorker` framework. Migrate workers one by one.

Eliminates: 8 HIGH + MEDIUM sites.

### Sub-PR 3: ext-api Controller + Scheduler (MEDIUM impact)

Convert `InternalApiController` endpoints to fire-and-forget pattern (already partially there). Remove `runBlocking` in `ExternalApiScheduler.runOcidPhase` by converting to pure CF via `coroutineScope { async { ... }.await() }`.

Eliminates: 2 MEDIUM + 1 HIGH sites.

### Sub-PR 4: ChunkFileManager close path (HIGH impact, contained)

Convert `awaitAllUploads` to `closeAsync(): CF<Void>`. Use `thenRun` for manifest write. `ChunkedSnapshotSink.close()` chains via `thenCompose`.

Eliminates: 1 HIGH site.

### Sub-PR 5: BackpressureLimiter migration + runBlocking cleanup (MEDIUM impact)

- `UrgentCharacterRequestConsumer`: `Semaphore.tryAcquire` + `release` in `whenComplete` → `BackpressureLimiter.tryAcquireAsync` + `releaseAsync` in `whenComplete` (with `.whenComplete` cancel-safe handling).
- `OcidLookupRunConsumer.kt:38` + `EquipmentRankingRedisWriter.kt:26, 31`: convert `runBlocking` to inline async or move to dedicated async endpoint.

Eliminates: 1 MEDIUM + 4 MEDIUM sites.

### Out of scope (DOC-EXEMPT)

- `EquipmentFetchProvider.kt:72` `.join()`: KEEP. Class Javadoc documents intentional.
- `OcidLookupPhase.kt:147-148` `Job.join()` + `await()`: KEEP. Both are suspend functions, not blocking primitives. Add code comment to make this explicit (was task T11 in original plan).

## Audit artifacts

- 6 explore subagents dispatched (1 stopped early, 5 completed)
- 4 production files read directly (DefaultLogicExecutor, TaskContext, CheckedLogicExecutor, JwtAuthenticationFilter, GlobalAdmissionControl, EquipmentFetchProvider, StarforceLookupAdapter, CubeComputeBuffer)
- 0 production code changes
- 0 commits made
- Feature branch `feature/issue-CF-CHAIN-blocking-fix` exists with 0 commits (clean state)

## Recommendation

Defer all 5 sub-PRs to individual issues. Each = small, reviewable, with green build per commit (using `@Deprecated` shim where applicable, mirroring grill-me Q1=A, Q6=B, Q9=A from original brainstorm).

Do NOT use the original 19-task plan as a basis — it assumes wrong code structure.
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
package maple.expectation.monitoring;

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyLong;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.BDDMockito.given;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;

import java.util.concurrent.CompletableFuture;
import maple.expectation.core.port.out.BufferStatusQuery;
import maple.expectation.infrastructure.alert.StatelessAlertService;
import maple.expectation.infrastructure.config.MonitoringThresholdProperties;
Expand Down Expand Up @@ -104,7 +106,8 @@ void setUp() {
@DisplayName("리더 권한을 획득하고 전역 임계치를 초과하면 알림을 발송한다")
void leaderSuccess_OverThreshold_SendAlert() {
// given
given(lockStrategy.tryLockImmediately(eq("global-monitoring-lock"), eq(4L))).willReturn(true);
given(lockStrategy.tryLockImmediatelyAsync(eq("global-monitoring-lock"), eq(4L)))
.willReturn(CompletableFuture.completedFuture(true));
given(bufferStatus.getTotalPendingCount()).willReturn(6000L);
given(thresholdProperties.getBufferSaturationCount()).willReturn(5000L);

Expand All @@ -119,7 +122,8 @@ void leaderSuccess_OverThreshold_SendAlert() {
@DisplayName("전역 임계치 이하일 때는 리더 권한이 있어도 알림을 보내지 않는다")
void leaderSuccess_UnderThreshold_NoAlert() {
// given
given(lockStrategy.tryLockImmediately(eq("global-monitoring-lock"), eq(4L))).willReturn(true);
given(lockStrategy.tryLockImmediatelyAsync(eq("global-monitoring-lock"), eq(4L)))
.willReturn(CompletableFuture.completedFuture(true));
given(bufferStatus.getTotalPendingCount()).willReturn(3000L);
given(thresholdProperties.getBufferSaturationCount()).willReturn(5000L);

Expand All @@ -134,7 +138,8 @@ void leaderSuccess_UnderThreshold_NoAlert() {
@DisplayName("리더 선출 실패 시 모니터링을 스킵한다")
void follower_SkipMonitoring() {
// given
given(lockStrategy.tryLockImmediately(eq("global-monitoring-lock"), eq(4L))).willReturn(false);
given(lockStrategy.tryLockImmediatelyAsync(eq("global-monitoring-lock"), eq(4L)))
.willReturn(CompletableFuture.completedFuture(false));

// when
monitoringAlertService.checkBufferSaturation();
Expand All @@ -149,7 +154,8 @@ void follower_SkipMonitoring() {
@DisplayName("정확히 임계값(5000)을 초과하면 알림을 발송한다")
void exactlyAtThreshold_SendAlert() {
// given
given(lockStrategy.tryLockImmediately(eq("global-monitoring-lock"), eq(4L))).willReturn(true);
given(lockStrategy.tryLockImmediatelyAsync(eq("global-monitoring-lock"), eq(4L)))
.willReturn(CompletableFuture.completedFuture(true));
given(bufferStatus.getTotalPendingCount()).willReturn(5001L);
given(thresholdProperties.getBufferSaturationCount()).willReturn(5000L);

Expand All @@ -164,7 +170,8 @@ void exactlyAtThreshold_SendAlert() {
@DisplayName("버퍼가 비어있을 때는 알림을 보내지 않는다")
void bufferZero_NoAlert() {
// given
given(lockStrategy.tryLockImmediately(eq("global-monitoring-lock"), eq(4L))).willReturn(true);
given(lockStrategy.tryLockImmediatelyAsync(eq("global-monitoring-lock"), eq(4L)))
.willReturn(CompletableFuture.completedFuture(true));
given(bufferStatus.getTotalPendingCount()).willReturn(0L);
given(thresholdProperties.getBufferSaturationCount()).willReturn(5000L);

Expand All @@ -179,7 +186,8 @@ void bufferZero_NoAlert() {
@DisplayName("복구 가능한 분산 락 획득 실패 시 재시도하지 않고 스킵한다")
void lockAcquisitionFailure_SkipMonitoring() {
// given
given(lockStrategy.tryLockImmediately(eq("global-monitoring-lock"), eq(4L))).willReturn(false);
given(lockStrategy.tryLockImmediatelyAsync(eq("global-monitoring-lock"), eq(4L)))
.willReturn(CompletableFuture.completedFuture(false));

// when
monitoringAlertService.checkBufferSaturation();
Expand Down
Loading