Skip to content

recordSpend() silently skipped when hasSession returns false (cost dropped, no NeedAuth sent) #600

@Gekko-dot-ETH

Description

@Gekko-dot-ETH

While reading packages/node/src/seller-request-handler.ts (was looking at the NeedAuth flow for unrelated reasons), I noticed a guard that silently swallows cost-recording for served requests, and wanted to flag it.

Lines 371–399 in seller-request-handler.ts:

// Record spend and send NeedAuth with cost data after every request.
// The buyer validates the cost independently and responds with SpendingAuth.
if (spm?.hasSession(buyerPeerId)) {
  const usage = responseUsage;
  const costUsdc = computeCostUsdc(...);
  const session = spm.getChannelByPeer(buyerPeerId);
  if (session) {
    spm.recordSpend(session.sessionId, costUsdc);
    ...
    this._sendNeedAuthBestEffort(...);
  }
}

hasSession(buyerPeerId) is this._activeBuyers.has(buyerPeerId) (seller-payment-manager.ts:1253). _activeBuyers is mutated in several places — added on hydrate/activate, deleted on evict/disconnect/timeout.

If at the moment a response completes the buyer is no longer in _activeBuyers (or getChannelByPeer returns null), the seller has already served the request, but:

  • recordSpend is never called → _spent doesn't grow
  • No NeedAuth is sent → buyer is never asked for higher cumulative
  • On the next request, validateAndAcceptAuth sees accepted >= spent trivially true (both still at the previous value)
  • This continues until channel deadline, at which point close() settles only the prior amount; the rest of the deposit refunds to the buyer
  • The seller has done the inference for free

There's no log, alert, or counter when the guard fails. It looks silent by design rather than instrumented.

Two related questions:

  1. Is the guard intentional? If hasSession returns false but a response was still served, what state did the seller think it was in?
  2. If it can legitimately return false (eviction race, restart hydrate window, etc.), shouldn't the seller at minimum log or refuse to serve the next request on that channel?

A small debugWarn inside the else branch (or a counter you can scrape) would be enough to surface this in production.

Related: #529 covered the racing-update-loss case in the same area; this is a different failure mode (no recordSpend call at all), but shares the under-counted-spend outcome.

Happy to PR an instrumentation change or a hardening fix if maintainers can confirm the intended invariant.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions