refactor(sdk): rewrite NonceCache with LRU eviction, drift detection, and structured errors by lklimek · Pull Request #3111 · dashpay/platform

lklimek · 2026-02-19T12:21:43Z

Issue being fixed or feature implemented

The SDK's nonce cache had several design issues that could cause retry loops to fail after a broadcast error:

Nonce regression: refresh_identity_nonce() deleted cache entries entirely, causing re-fetch of stale nonces from Platform and recreation of identical (failing) state transitions.
Raw upper bits in cache: Platform returns nonces with "missing revisions" bits set in the upper portion. These were stored raw, making comparisons and increments unreliable.
Unbounded drift: Repeated bump_first calls without successful broadcasts could push the cached nonce arbitrarily ahead of Platform with no re-fetch trigger.
Lock held across I/O: The nonce cache mutex was held during async Platform fetches, blocking all other callers during network round-trips.
Duplicated logic: get_identity_nonce and get_identity_contract_nonce contained identical cache logic.

What was done?

Rewrite: Dedicated `NonceCache` struct

Extracted all nonce caching logic from Sdk into a self-contained NonceCache struct backed by an LRU cache (bounded to 1 000 entries). The struct owns its mutex and exposes a clean async API.

Three-phase locking

get_or_fetch_nonce no longer holds the mutex across the async fetch_from_platform call. The function is structured as: lock → check → unlock → fetch (no lock) → lock → update → unlock. This prevents blocking during network round-trips.

Drift detection

Each cache entry tracks last_platform_nonce. When the cached nonce drifts ≥ MAX_MISSING_IDENTITY_REVISIONS ahead, a re-fetch is forced and Platform's response becomes the source of truth — while still preserving the higher of cached vs Platform nonce to avoid regression.

Upper-bit masking at fetch

The IDENTITY_NONCE_VALUE_FILTER mask is applied once, at fetch time. The cache only holds plain nonce values, eliminating redundant masking on every return path.

Stale marking instead of removal

refresh_identity_nonce() sets cache entry timestamps to 0 (stale) instead of removing entries. This forces a re-fetch on next use while preserving the cached nonce value.

Overflow guard

Added NonceOverflow error variant — returned when a nonce reaches u64::MAX instead of silently wrapping to zero.

IdentityNonceNotFound error

Platform returning no nonce for a queried identity now returns IdentityNonceNotFound instead of silently defaulting to nonce 0. This is a retriable error — the DAPI node may not have indexed the identity yet.

wasm-sdk error handling

Added NonceOverflow and IdentityNonceNotFound variants to WasmSdkErrorKind and the From<SdkError> conversion in the wasm-sdk, fixing a non-exhaustive match compilation failure.

Cleanup

Removed unused Sdk::parse_proof and Sdk::parse_proof_with_metadata (dead code).
Unified duplicated nonce fetch logic into a single get_or_fetch_nonce helper.

How Has This Been Tested?

Extensive unit tests covering: cache hits/misses, stale refresh, drift detection and re-fetch, overflow guard, upper-bit masking, LRU eviction, concurrent access under contention
cargo check -p dash-sdk -p wasm-sdk passes cleanly
cargo clippy -p dash-sdk passes cleanly
cargo test -p dash-sdk passes cleanly
All CI workflows pass (Tests, rs-sdk-ffi build, gRPC coverage, PR linter)

Breaking Changes

None — the NonceCache is an internal implementation detail. Public Sdk API is unchanged.

Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added or updated relevant unit/integration/functional/e2e tests
I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
I have made corresponding changes to the documentation if needed

When broadcast() fails, refresh_identity_nonce() previously removed nonce cache entries entirely. If the TX had partially reached some Tenderdash nodes (entered their mempool) before the error, the next retry would re-fetch the old nonce from Platform (TX not yet processed), derive the same nonce, create identical TX bytes, and get rejected with "tx already exists in cache". Fix by marking cache entries as stale (timestamp=0) instead of removing them. This forces a re-fetch from Platform on the next use while preserving the cached nonce value. The existing MAX(platform, cached) comparison then ensures the nonce always advances, preventing the SDK from recreating an identical state transition. Closes dashpay/dash-evo-tool#588 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-19T12:21:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Refactors nonce caching to store triples (insert_nonce, timestamp, platform_nonce); centralizes nonce retrieval into a shared get_or_fetch_nonce flow with drift-aware re-fetch and masked platform nonces; removes two pub(crate) proof-parsing helpers; refresh now marks entries stale by zeroing timestamps.

Changes

Cohort / File(s)	Summary
SDK core changes `packages/rs-sdk/src/sdk.rs`	Removed `pub(crate)` helpers `parse_proof` and `parse_proof_with_metadata`. Introduced/use shared `get_or_fetch_nonce` flow; adapted identity and identity-contract nonce logic to the 3-tuple cache state and masking via `IDENTITY_NONCE_VALUE_FILTER`; refresh semantics updated to zero timestamps for staleness.
Internal cache shape `packages/rs-sdk/src/internal_cache/mod.rs`	Added `pub(crate) type NonceCacheEntry = (IdentityNonce, LastQueryTimestamp, IdentityNonce)`; switched `identity_nonce_counter` and `identity_contract_nonce_counter` to store `NonceCacheEntry`; updated `Default` init and related mutex/BTreeMap types.
Tests & validation `packages/rs-sdk/src/...`	Added/updated tests covering nonce cache behavior: drift, bumps, error propagation, concurrent updates, refresh semantics, masking and isolation.
Docs/comments & manifests `packages/rs-sdk/src/...`, `Cargo.toml`	Updated comments/documentation to describe new 3-tuple cache, masking, and drift logic; manifest lines changed (additions/removals) reflected in Cargo.toml diffs.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SDK
    participant InternalCache
    participant Platform

    Client->>SDK: request nonce for identity / identity+contract
    SDK->>InternalCache: read (insert_nonce, ts, platform_nonce)
    alt cache considered fresh (drift/revisions ok)
        InternalCache-->>SDK: return cached insert_nonce (and metadata)
        SDK-->>Client: return cached nonce
    else cache stale or missing
        SDK->>Platform: fetch platform_nonce (async, without cache lock)
        Platform-->>SDK: return platform_nonce
        SDK->>SDK: platform_masked = platform_nonce & IDENTITY_NONCE_VALUE_FILTER
        SDK->>SDK: combine/compute insert_nonce (apply bump_first if required)
        SDK->>InternalCache: write (insert_nonce, now_timestamp, platform_masked)
        SDK-->>Client: return updated nonce
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I found three little numbers, tucked warm in a row,

A timestamp to nap, a platform bit's glow.
I fetch when the drift makes old crumbs untrue,
I mask what I borrow and stitch something new.
Hop—cache refreshed, and the network says whoo!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title describes LRU eviction and structured errors, but the actual changes implement stale marking, drift detection, and nonce filtering—not LRU eviction or structured errors.	Update the title to accurately reflect the main changes: 'fix(sdk): prevent nonce cache regression with stale marking, drift detection, and nonce filtering' or similar.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR directly addresses issue `#588` by implementing all core objectives: preventing nonce reuse after failed broadcasts, reconciling cached nonce with Platform values, stripping malformed bits from Platform nonces, detecting drift, and improving concurrency during Platform fetches.
Out of Scope Changes check	✅ Passed	All changes are in-scope: nonce cache refactoring (get_or_fetch_nonce helper, cache state changes), removal of unused parse_proof methods, and new tests directly support the fix. No unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/sdk-nonce-cache-regression

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…turn path The nonce cache previously stored raw values with the upper "missing revisions" bits intact, requiring IDENTITY_NONCE_VALUE_FILTER masking on every return path. This made comparisons between platform_nonce and cached values unreliable since the upper bits could differ. Strip the upper bits immediately after fetching from Platform so the cache only ever holds plain nonce values. This simplifies all downstream comparisons, prevents +1 increments from overflowing into the upper bits, and removes redundant masking from 7 return paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-19T12:44:32Z

✅ gRPC Query Coverage Report

================================================================================
gRPC Query Coverage Report - NEW QUERIES ONLY
================================================================================

Total queries in proto: 53
Previously known queries: 47
New queries found: 6

================================================================================

New Query Implementation Status:
--------------------------------------------------------------------------------
✓ getAddressInfo                                /home/runner/work/platform/platform/packages/rs-sdk/src/platform/query.rs
✓ getAddressesBranchState                       /home/runner/work/platform/platform/packages/rs-sdk/src/platform/address_sync/mod.rs
✓ getAddressesInfos                             /home/runner/work/platform/platform/packages/rs-sdk/src/platform/fetch_many.rs
✓ getAddressesTrunkState                        /home/runner/work/platform/platform/packages/rs-sdk/src/platform/query.rs
✓ getRecentAddressBalanceChanges                /home/runner/work/platform/platform/packages/rs-sdk/src/platform/query.rs
✓ getRecentCompactedAddressBalanceChanges       /home/runner/work/platform/platform/packages/rs-sdk/src/platform/query.rs

================================================================================
Summary:
--------------------------------------------------------------------------------
New queries implemented: 6 (100.0%)
New queries missing: 0 (0.0%)

Total known queries: 53
  - Implemented: 50
  - Not implemented: 2
  - Excluded: 1

Not implemented queries:
  - getConsensusParams
  - getTokenPreProgrammedDistributions

coderabbitai

🧹 Nitpick comments (1)

packages/rs-sdk/src/sdk.rs (1)
504-550: Consider extracting the shared nonce fetch-and-update logic.

The masking, drift detection, and effective_current computation in get_identity_contract_nonce (lines 505–550) are identical to get_identity_nonce (lines 391–438). A shared helper (e.g., resolve_nonce(platform_nonce, cached_entry, bump_first, current_time_s)) would eliminate duplication and ensure both paths stay in sync when the logic evolves. The existing TODO: Move to a separate struct at line 469 aligns with this.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 504 - 550, Extract the duplicated
masking, drift-detection, and cache-update logic into a single helper (e.g.,
resolve_nonce or a method on a new NonceResolver struct) and call it from both
get_identity_contract_nonce and get_identity_nonce; the helper should accept the
platform_nonce (already masked with IDENTITY_NONCE_VALUE_FILTER as done via
IdentityContractNonceFetcher), a mutable reference to the cache Entry
(Vacant/Occupied), bump_first, and current_time_s, compute effective_current
using MAX_MISSING_IDENTITY_REVISIONS, decide insert_nonce (handling
platform_nonce > effective_current and bump_first cases), update the cache entry
with (insert_nonce, current_time_s), and return the insert_nonce or Result as
needed so both call sites reuse identical behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 504-550: Extract the duplicated masking, drift-detection, and
cache-update logic into a single helper (e.g., resolve_nonce or a method on a
new NonceResolver struct) and call it from both get_identity_contract_nonce and
get_identity_nonce; the helper should accept the platform_nonce (already masked
with IDENTITY_NONCE_VALUE_FILTER as done via IdentityContractNonceFetcher), a
mutable reference to the cache Entry (Vacant/Occupied), bump_first, and
current_time_s, compute effective_current using MAX_MISSING_IDENTITY_REVISIONS,
decide insert_nonce (handling platform_nonce > effective_current and bump_first
cases), update the cache entry with (insert_nonce, current_time_s), and return
the insert_nonce or Result as needed so both call sites reuse identical
behavior.

…tform The nonce cache could drift arbitrarily far ahead of Platform through repeated bump_first calls without successful broadcasts. Each bump refreshed the cache timestamp, preventing time-based staleness from triggering a re-fetch. Add last_platform_nonce to the cache entry and check drift inline in should_query_platform. When the cached nonce exceeds the last known platform value by more than MAX_MISSING_IDENTITY_REVISIONS, force a re-fetch and use platform's response as the source of truth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ce helper get_identity_nonce and get_identity_contract_nonce had identical cache logic differing only in key type and fetcher. Extract the shared staleness check, drift detection, fetch, bump, and cache update into a generic async helper that both methods delegate to. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/rs-sdk/src/internal_cache/mod.rs (1)

7-15: Consider a named struct to prevent positional confusion between the two IdentityNonce fields.

Both .0 (current_nonce) and .2 (last_platform_nonce) share the same IdentityNonce type. A swap at any destructuring site (e.g., let (last_platform_nonce, _, current_nonce) = e.get()) compiles silently but produces wrong semantics.

♻️ Replace tuple alias with a named struct

-/// Cached nonce state: `(current_nonce, last_fetch_timestamp, last_platform_nonce)`.
-///
-/// - `current_nonce` — the nonce value the SDK will use next (may be bumped
-///   ahead of what Platform reported).
-/// - `last_fetch_timestamp` — when we last fetched from Platform.
-/// - `last_platform_nonce` — the value Platform returned on the last fetch
-///   (after stripping upper bits). Used to detect when local bumps have
-///   drifted too far ahead, triggering a re-fetch.
-pub(crate) type NonceCacheEntry = (IdentityNonce, LastQueryTimestamp, IdentityNonce);
+/// Cached nonce state for identity/contract nonce tracking.
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct NonceCacheEntry {
+    /// The nonce value the SDK will use next (may be bumped ahead of Platform).
+    pub current_nonce: IdentityNonce,
+    /// Unix timestamp (seconds) of the last Platform fetch; 0 means stale.
+    pub last_fetch_timestamp: LastQueryTimestamp,
+    /// Plain nonce Platform returned on the last fetch (upper bits stripped).
+    pub last_platform_nonce: IdentityNonce,
+}

All destructuring sites in sdk.rs would then use named fields instead of positional indexing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/internal_cache/mod.rs` around lines 7 - 15, The tuple
alias NonceCacheEntry uses two IdentityNonce fields which is error-prone;
replace the alias with a small named struct (e.g., struct NonceCacheEntry {
current_nonce: IdentityNonce, last_fetch_timestamp: LastQueryTimestamp,
last_platform_nonce: IdentityNonce }) and update all uses (including
destructuring sites in sdk.rs that reference NonceCacheEntry::get()/e.get()) to
use the named fields instead of positional .0/.1/.2 to prevent silent swaps
between current_nonce and last_platform_nonce. Ensure the struct derives or
implements the same traits previously relied on (Clone/Copy/Debug/etc.) so
callers keep compiling.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 407-420: In get_or_fetch_nonce, inside the should_query_platform =
true branch where you compute insert_nonce from platform_nonce, update the logic
to account for any existing cached nonce: when handling Entry::Occupied (and
generally before inserting), read the existing cached nonce (the first element
of the stored tuple) and set base = max(existing_cached_nonce, platform_nonce),
then compute insert_nonce = base + 1 and store (insert_nonce, current_time_s,
platform_nonce); do this instead of unconditionally using platform_nonce so
retries use max(existing, platform) as the base.

---

Nitpick comments:
In `@packages/rs-sdk/src/internal_cache/mod.rs`:
- Around line 7-15: The tuple alias NonceCacheEntry uses two IdentityNonce
fields which is error-prone; replace the alias with a small named struct (e.g.,
struct NonceCacheEntry { current_nonce: IdentityNonce, last_fetch_timestamp:
LastQueryTimestamp, last_platform_nonce: IdentityNonce }) and update all uses
(including destructuring sites in sdk.rs that reference
NonceCacheEntry::get()/e.get()) to use the named fields instead of positional
.0/.1/.2 to prevent silent swaps between current_nonce and last_platform_nonce.
Ensure the struct derives or implements the same traits previously relied on
(Clone/Copy/Debug/etc.) so callers keep compiling.

packages/rs-sdk/src/sdk.rs

coderabbitai

🧹 Nitpick comments (2)

packages/rs-sdk/src/sdk.rs (2)
496-518: refresh_identity_nonce — stale-marking logic is correct; consider a range scan for contract nonces.

Setting timestamp = 0 reliably triggers the staleness check on next access (since 0 < current_time_s − stale_time_s holds for any real UNIX timestamp). The logic is sound.

Minor: the contract-nonce path (lines 510–516) does a full linear scan over all entries to find those matching identity_id. Since the BTreeMap key is (identity_id, contract_id), matching entries are contiguous and a range query (range((identity_id, Identifier::default())..)) would avoid iterating unrelated entries. Not a correctness issue, just a minor efficiency opportunity if the cache grows large.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 496 - 518, The current
refresh_identity_nonce implementation correctly marks timestamps as 0 but scans
the entire internal_cache.identity_contract_nonce_counter BTreeMap to find
entries for an identity; optimize by performing a contiguous range scan instead
of iterating all entries — use a range_mut over keys starting at
(identity_id.clone(), <min contract id>) and ending at (identity_id.clone(),
<max contract id>) (or appropriate Bound bounds) to visit only entries whose
first tuple element equals identity_id and set their timestamp to 0; keep the
existing identity_nonce_counter logic unchanged.
371-383: Tokio mutex held across async platform fetch serialises all nonce requests.

Both get_identity_nonce and get_identity_contract_nonce lock the shared nonce counter, then await a potential network call inside get_or_fetch_nonce. Because the Entry borrows the map, the lock cannot be dropped until the fetch completes. Under concurrent callers this means every identity's nonce request queues behind any in-flight platform fetch, regardless of identity.

This is an inherent limitation of the current Entry-based API and is pre-existing — but the refactor makes it structurally clearer. A future optimisation would be a per-identity (or per-key) lock rather than a single map-level mutex.

Also applies to: 407-427
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 371 - 383, The code currently holds
internal_cache.identity_nonce_counter.lock() across an await inside
Self::get_or_fetch_nonce, which serialises all concurrent nonce requests; change
the flow so the map lock is not held during the async fetch by: 1) access the
map under internal_cache.identity_nonce_counter only to check/clone the existing
Entry value or to insert a lightweight “pending” marker, then drop the mutex
before performing the async platform call invoked via
IdentityNonceFetcher::fetch_with_settings; 2) after the fetch completes,
re-acquire the identity_nonce_counter mutex and update/replace the entry with
the fetched nonce (or remove the pending marker and insert the real value);
alternatively replace the single map mutex with a per-identity lock (e.g., a
separate Mutex per identity or a keyed lock structure) so get_identity_nonce and
get_identity_contract_nonce only serialize for the same identity rather than
globally. Ensure this change touches the codepaths using get_or_fetch_nonce,
internal_cache.identity_nonce_counter, and the async fetch closure so no mutex
guard is held across await.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 461-473: The stale-by-time re-fetch path in refresh_identity_nonce
currently overwrites the cache via entry.and_modify(|e| *e = (insert_nonce,
...)) using platform_nonce (plus bump_first) and ignores the existing cached
nonce, causing stale fetches to return the same nonce; change the logic so when
handling the stale-by-time branch you read the existing cached nonce from the
entry (if present) and compute insert_nonce = max(existing_cached_nonce,
platform_nonce + (if bump_first {1} else {0})), while preserving the original
"drift" branch behavior that intentionally resets to platform_nonce; update the
code around fetch_from_platform(), insert_nonce, and the
entry.and_modify/or_insert calls to use that max for the stale path.

---

Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 496-518: The current refresh_identity_nonce implementation
correctly marks timestamps as 0 but scans the entire
internal_cache.identity_contract_nonce_counter BTreeMap to find entries for an
identity; optimize by performing a contiguous range scan instead of iterating
all entries — use a range_mut over keys starting at (identity_id.clone(), <min
contract id>) and ending at (identity_id.clone(), <max contract id>) (or
appropriate Bound bounds) to visit only entries whose first tuple element equals
identity_id and set their timestamp to 0; keep the existing
identity_nonce_counter logic unchanged.
- Around line 371-383: The code currently holds
internal_cache.identity_nonce_counter.lock() across an await inside
Self::get_or_fetch_nonce, which serialises all concurrent nonce requests; change
the flow so the map lock is not held during the async fetch by: 1) access the
map under internal_cache.identity_nonce_counter only to check/clone the existing
Entry value or to insert a lightweight “pending” marker, then drop the mutex
before performing the async platform call invoked via
IdentityNonceFetcher::fetch_with_settings; 2) after the fetch completes,
re-acquire the identity_nonce_counter mutex and update/replace the entry with
the fetched nonce (or remove the pending marker and insert the real value);
alternatively replace the single map mutex with a per-identity lock (e.g., a
separate Mutex per identity or a keyed lock structure) so get_identity_nonce and
get_identity_contract_nonce only serialize for the same identity rather than
globally. Ensure this change touches the codepaths using get_or_fetch_nonce,
internal_cache.identity_nonce_counter, and the async fetch closure so no mutex
guard is held across await.

… higher cached nonce - Do not hold the Mutex across the async fetch_from_platform call; split get_or_fetch_nonce into lock-check-unlock / fetch / lock-update so other callers are not blocked during the network round-trip. - After re-fetching, keep the higher of the cached nonce vs the Platform nonce to prevent regression when Platform has not yet indexed a recent successful broadcast. - Replace panic!("vacant entry not possible") with a structure that cannot reach the impossible state (early-return from cache hit). - Tighten drift threshold from > to >= MAX_MISSING_IDENTITY_REVISIONS as a precaution to stay within Platform's own limit. - Simplify cache write from and_modify().or_insert() to a single insert. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

packages/rs-sdk/src/sdk.rs (1)

412-428: Optional: add tracing to get_identity_contract_nonce for consistency with get_identity_nonce.

get_identity_nonce (lines 390–395) emits a trace! after the fetch. get_identity_contract_nonce returns directly without logging, making it harder to observe nonce progression in traces.

♻️ Proposed addition

     ) -> Result<IdentityNonce, Error> {
         let settings = settings.unwrap_or_default();
-        Self::get_or_fetch_nonce(
+        let nonce = Self::get_or_fetch_nonce(
             &self.internal_cache.identity_contract_nonce_counter,
             (identity_id, contract_id),
             bump_first,
             &settings,
             || async {
                 Ok(IdentityContractNonceFetcher::fetch_with_settings(
                     self,
                     (identity_id, contract_id),
                     settings.request_settings,
                 )
                 .await?
                 .unwrap_or(IdentityContractNonceFetcher(0))
                 .0)
             },
         )
-        .await
+        .await?;
+
+        tracing::trace!(
+            identity_id = %identity_id,
+            contract_id = %contract_id,
+            bump_first,
+            nonce,
+            "Fetched identity contract nonce"
+        );
+
+        Ok(nonce)
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 412 - 428, Add a trace call to
get_identity_contract_nonce to mirror get_identity_nonce: after awaiting the
result of Self::get_or_fetch_nonce in get_identity_contract_nonce, emit a trace!
that logs the resolved nonce along with identity_id and contract_id for
observability; update the function surrounding the
Self::get_or_fetch_nonce(...).await call to capture the returned nonce in a
local variable and then call trace! before returning it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 412-428: Add a trace call to get_identity_contract_nonce to mirror
get_identity_nonce: after awaiting the result of Self::get_or_fetch_nonce in
get_identity_contract_nonce, emit a trace! that logs the resolved nonce along
with identity_id and contract_id for observability; update the function
surrounding the Self::get_or_fetch_nonce(...).await call to capture the returned
nonce in a local variable and then call trace! before returning it.

Cover get_or_fetch_nonce with 25 parameterized test cases: - empty cache fetch (basic, zero, filter-max, upper-bits stripping) - fresh cache hit (no-bump, bump, drift below threshold) - stale/drifted cache re-fetch (no-regress, platform-higher, drift-at-max) - error propagation, refresh staleness, key isolation, concurrency races Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

packages/rs-sdk/src/sdk.rs (2)

1367-1389: Test coverage is comprehensive; minor nit on test case name.

The "filter max with bump overflows" label (line 1372) implies a u64 integer overflow. Since IDENTITY_NONCE_VALUE_FILTER ≈ (1 << 40) - 1, the bumped result IDENTITY_NONCE_VALUE_FILTER + 1 = 1 << 40 is well within u64's range. A clearer name would be "nonce exceeds filter range after bump" to signal that the stored value has bits beyond what Platform would return (not an arithmetic overflow).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 1367 - 1389, Test case label is
misleading: in the test function empty_cache_fetch update the test_case string
"filter max with bump overflows" to a clearer label such as "nonce exceeds
filter range after bump" so it no longer implies an arithmetic u64 overflow;
modify the test_case for IDENTITIY_NONCE_VALUE_FILTER + 1 accordingly in the
list of #[test_case(... ; "...")] entries.

400-429: get_identity_contract_nonce is missing the observability trace log present in get_identity_nonce.

get_identity_nonce (line 390) logs identity_id, bump_first, and the resolved nonce. get_identity_contract_nonce has no equivalent trace, making it harder to debug nonce issues for identity–contract pairs.

🪵 Suggested trace log

     ) -> Result<IdentityNonce, Error> {
         let settings = settings.unwrap_or_default();
-        Self::get_or_fetch_nonce(
+        let nonce = Self::get_or_fetch_nonce(
             &self.internal_cache.identity_contract_nonce_counter,
             (identity_id, contract_id),
             bump_first,
             &settings,
             || async {
                 Ok(IdentityContractNonceFetcher::fetch_with_settings(
                     self,
                     (identity_id, contract_id),
                     settings.request_settings,
                 )
                 .await?
                 .unwrap_or(IdentityContractNonceFetcher(0))
                 .0)
             },
         )
-        .await
+        .await?;
+
+        tracing::trace!(
+            identity_id = %identity_id,
+            contract_id = %contract_id,
+            bump_first,
+            nonce,
+            "Fetched identity contract nonce"
+        );
+
+        Ok(nonce)
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 400 - 429, Add the same
observability trace in get_identity_contract_nonce that get_identity_nonce uses:
after resolving the nonce (i.e., once the Value returned by
Self::get_or_fetch_nonce is available), emit the same trace-level log including
identity_id, contract_id, bump_first and the resolved nonce using the same trace
logging call used in get_identity_nonce; locate this in
get_identity_contract_nonce (and/or inside the fetch closure if needed) and
format the fields identically to the existing get_identity_nonce trace so
debugging for identity–contract nonces matches the single-identity case.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 437-503: The code uses IDENTITY_NONCE_VALUE_FILTER in a bitwise
AND with a u64 and compares MAX_MISSING_IDENTITY_REVISIONS with a u64 result, so
ensure both DPP constants are u64 (or explicitly cast them); locate the
constants IDENTITY_NONCE_VALUE_FILTER and MAX_MISSING_IDENTITY_REVISIONS in DPP
and make them pub const ...: u64 = ... (or, if you cannot change their
definitions, cast where used in get_or_fetch_nonce: e.g.
fetch_from_platform().await? & (IDENTITY_NONCE_VALUE_FILTER as u64) and compare
with (MAX_MISSING_IDENTITY_REVISIONS as u64) in the drifted calculation to
guarantee types align).

---

Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 1367-1389: Test case label is misleading: in the test function
empty_cache_fetch update the test_case string "filter max with bump overflows"
to a clearer label such as "nonce exceeds filter range after bump" so it no
longer implies an arithmetic u64 overflow; modify the test_case for
IDENTITIY_NONCE_VALUE_FILTER + 1 accordingly in the list of #[test_case(... ;
"...")] entries.
- Around line 400-429: Add the same observability trace in
get_identity_contract_nonce that get_identity_nonce uses: after resolving the
nonce (i.e., once the Value returned by Self::get_or_fetch_nonce is available),
emit the same trace-level log including identity_id, contract_id, bump_first and
the resolved nonce using the same trace logging call used in get_identity_nonce;
locate this in get_identity_contract_nonce (and/or inside the fetch closure if
needed) and format the fields identically to the existing get_identity_nonce
trace so debugging for identity–contract nonces matches the single-identity
case.

shumkov · 2026-02-20T11:30:26Z

packages/rs-sdk/src/internal_cache/mod.rs

+/// - `last_platform_nonce` — the value Platform returned on the last fetch
+///   (after stripping upper bits). Used to detect when local bumps have
+///   drifted too far ahead, triggering a re-fetch.
+pub(crate) type NonceCacheEntry = (IdentityNonce, LastQueryTimestamp, IdentityNonce);


Let's do struct

shumkov · 2026-02-20T11:31:51Z

packages/rs-sdk/src/internal_cache/mod.rs

-        BTreeMap<(Identifier, Identifier), (prelude::IdentityNonce, LastQueryTimestamp)>,
-    >,
+    pub(crate) identity_contract_nonce_counter:
+        tokio::sync::Mutex<BTreeMap<(Identifier, Identifier), NonceCacheEntry>>,


(Identifier, Identifier) terrible too

shumkov · 2026-02-20T11:33:09Z

packages/rs-sdk/src/sdk.rs

+        let nonce = Self::get_or_fetch_nonce(
+            &self.internal_cache.identity_nonce_counter,
+            identity_id,
+            bump_first,
+            &settings,
+            || async {
+                Ok(IdentityNonceFetcher::fetch_with_settings(
+                    self,
+                    identity_id,
+                    settings.request_settings,
+                )
+                .await?
+                .unwrap_or(IdentityNonceFetcher(0))
+                .0)
+            },
+        )
+        .await?;


Let's extract this into a method

shumkov · 2026-02-20T11:36:12Z

packages/rs-sdk/src/internal_cache/mod.rs

    /// returns an error or is updated.
-    pub(crate) identity_nonce_counter:
-        tokio::sync::Mutex<BTreeMap<Identifier, (prelude::IdentityNonce, LastQueryTimestamp)>>,
+    pub(crate) identity_nonce_counter: tokio::sync::Mutex<BTreeMap<Identifier, NonceCacheEntry>>,


The better design would be to have a separate struct that inclapsulates all related logic (methods) which instance we have in SDK. In this case we don't need to spread logic around.

shumkov

It would be nice to have a functional test that repoduce previous problem but fixed with new code

Address PR #3111 review comments from shumkov: - Replace NonceCacheEntry tuple with named struct (current_nonce, last_fetch_timestamp, last_fetched_platform_nonce) - Replace (Identifier, Identifier) key with IdentityContractPair struct - Create NonceCache struct encapsulating all nonce logic with per-map Mutex locking and configurable stale time - Remove InternalSdkCache (was nonce-only, pointless wrapper) - Make Sdk nonce methods thin wrappers delegating to NonceCache - Move get_or_fetch_nonce, get_current_time_seconds, DEFAULT_IDENTITY_NONCE_STALE_TIME_S, and tests to internal_cache/mod.rs - Remove LastQueryTimestamp and StalenessInSeconds type aliases from sdk.rs Public API unchanged — all callers continue using sdk.get_identity_nonce(), sdk.get_identity_contract_nonce(), and sdk.refresh_identity_nonce(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…race and bound memory - Switch from BTreeMap to lru::LruCache (bounded to 100 entries) for both identity and contract nonce caches, fixing unbounded growth - Hold cache lock end-to-end across async Platform fetch, eliminating the TOCTOU race condition (SEC-001) that could cause nonce regression - Apply IDENTITY_NONCE_VALUE_FILTER mask on every nonce bump, not just after Platform fetch (SEC-002) - Change refresh() to pop() entries entirely instead of zeroing timestamps, matching pre-refactor behavior (CODE-001) - Return Result from get_current_time_seconds() instead of panicking on clock error (RUST-001) - Do not update last_fetch_timestamp on cache-only bumps (CODE-002) - Make lru a non-optional dependency (remove feature-gating behind mocks) - Update tests: add LRU eviction, nonce filter masking, and timestamp preservation tests; remove concurrent_update tests (race window gone) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… and tracing Address five security findings (SEC-001 through SEC-005) in the NonceCache: - SEC-001: Return `Error::NonceOverflow` instead of silently wrapping the nonce to zero when it reaches `IDENTITY_NONCE_VALUE_FILTER`. Wrapping would permanently lock out the identity. - SEC-002: Bump default LRU cache capacity from 100 to 1000 to reduce silent eviction of actively-bumped nonces. - SEC-003: Restore three-phase lock pattern — check cache under lock, release lock during platform fetch, re-acquire and merge with `max(cached, platform)`. Eliminates serialization of all nonce fetches across the async boundary. - SEC-004: Add `tracing::trace!` for cache misses, stale/drift refetches, and anti-regression events. No logging on the hot cache-hit path. - SEC-005: Capture a fresh timestamp after the platform fetch so `last_fetch_timestamp` reflects actual receipt time. Also removes all runtime `expect()` calls — the remaining `.expect()` on `NonZeroUsize::new()` is in a `const` context (compile-time evaluation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace `.unwrap_or(0)` fallback with an explicit error when Platform returns no nonce for an identity or identity-contract pair. A stale DAPI node that has not yet indexed the identity would previously cause the SDK to silently use nonce 0, leading to rejected state transitions. Now the closure surfaces `Error::IdentityNonceNotFound` and logs a `warn!` so the caller can retry against a different node. Also documents the LRU eviction, drift threshold, and DAPI staleness edge cases on `NonceCache`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove dead `From<(Identifier, Identifier)>` impl (RUST-005) - Replace double hash lookup (peek + get_mut) with single get_mut (RUST-002) - Add `Debug` bound to generic key, include key in trace logs (RUST-003) - Add manual `Debug` impl for `NonceCache` (RUST-004) - Remove redundant test imports after `use super::*` (RUST-006) - Add test for bump_first=false after bump_first=true sequence (CODE-004) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Preserve max(old, new) last_fetched_platform_nonce when cached nonce wins the Phase 3 merge, preventing drift inflation from a temporarily lower platform value (CODE-001) - Simplify bump_nonce to explicit boundary check instead of indirect masking arithmetic (RUST-001) - Add concurrent bump test (20 tasks via Barrier on fresh cache) - Add concurrent stale-fetch test (50 tasks through Phase 2 merge) Both concurrency tests assert all returned nonces are unique, validating the three-phase locking under contention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ariants The rs-sdk added two new error variants (NonceOverflow, IdentityNonceNotFound) but the wasm-sdk error conversion was not updated, causing a non-exhaustive match compilation failure in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… retry The LruCache refactoring (d1225cc) inadvertently changed refresh() from "mark stale" to "pop entry", reintroducing the exact bug from dashpay/dash-evo-tool#588: after a broadcast failure, the bumped nonce was lost, causing the retry to create a byte-identical state transition ("tx already exists in cache"). Changed refresh() to set last_fetch_timestamp = 0 instead of popping, preserving the cached nonce so Phase 3's max(cached, platform) merge advances past the already-used nonce. Added tests covering both scenarios: Platform returning a stale nonce (cached value preserved) and Platform returning a higher nonce (Platform value used). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Extract STALE_TIMESTAMP constant to replace magic value 0 - Document intentional get_mut() LRU promotion in refresh() — refreshed entries are actively in use and should be preserved under eviction pressure - Fix stale section comment that still said "removes entries" - Fix primary test to use PutSettings::default() so the stale timestamp actually triggers a Platform re-fetch; add AtomicBool assertion verifying the fetch closure is invoked Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…instead Move Platform fetch logic from Sdk call sites into NonceCache's pub(crate) methods (get_identity_nonce, get_identity_contract_nonce). The private get_or_fetch_nonce helper retains its generic closure signature unchanged. This eliminates closure boilerplate from sdk.rs and co-locates the fetch + cache logic in one place. Refresh tests now use Sdk::new_mock() with expect_fetch expectations instead of hand-rolled closures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

lklimek requested a review from QuantumExplorer as a code owner February 19, 2026 12:21

lklimek changed the base branch from v2.1-dev to v3.1-dev February 19, 2026 12:22

lklimek requested a review from shumkov as a code owner February 19, 2026 12:22

github-actions bot added this to the v3.1.0 milestone Feb 19, 2026

lklimek and others added 2 commits February 19, 2026 13:37

clippy: remove unused Sdk::parse_proof, Sdk::parse_proof_with_metadata

18dff6c

coderabbitai bot reviewed Feb 19, 2026

View reviewed changes

lklimek and others added 2 commits February 19, 2026 13:59

lklimek force-pushed the fix/sdk-nonce-cache-regression branch from c508de1 to 8aae8c2 Compare February 19, 2026 13:08

coderabbitai bot reviewed Feb 19, 2026

View reviewed changes

packages/rs-sdk/src/sdk.rs Outdated Show resolved Hide resolved

coderabbitai bot reviewed Feb 19, 2026

View reviewed changes

lklimek and others added 2 commits February 19, 2026 15:08

Merge branch 'v3.1-dev' into fix/sdk-nonce-cache-regression

28653d3

coderabbitai bot reviewed Feb 20, 2026

View reviewed changes

shumkov reviewed Feb 20, 2026

View reviewed changes

shumkov requested changes Feb 20, 2026

View reviewed changes

lklimek and others added 8 commits February 23, 2026 08:47

Merge branch 'v3.1-dev' into fix/sdk-nonce-cache-regression

e76f8f8

lklimek changed the title ~~fix(sdk): prevent nonce cache regression on broadcast failure~~ refactor(sdk): rewrite NonceCache with LRU eviction, drift detection, and structured errors Feb 23, 2026

lklimek requested a review from shumkov February 23, 2026 12:12

lklimek and others added 3 commits February 23, 2026 13:28

Comments

Conversation

lklimek commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue being fixed or feature implemented

What was done?

Rewrite: Dedicated NonceCache struct

Three-phase locking

Drift detection

Upper-bit masking at fetch

Stale marking instead of removal

Overflow guard

IdentityNonceNotFound error

wasm-sdk error handling

Cleanup

How Has This Been Tested?

Breaking Changes

Checklist:

Uh oh!

coderabbitai bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ gRPC Query Coverage Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

shumkov Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

shumkov Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

shumkov Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

shumkov Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

shumkov left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lklimek commented Feb 19, 2026 •

edited

Loading

Rewrite: Dedicated `NonceCache` struct

coderabbitai bot commented Feb 19, 2026 •

edited

Loading

github-actions bot commented Feb 19, 2026 •

edited

Loading