refactor(sdk): rewrite NonceCache with LRU eviction, drift detection, and structured errors#3111
refactor(sdk): rewrite NonceCache with LRU eviction, drift detection, and structured errors#3111
Conversation
When broadcast() fails, refresh_identity_nonce() previously removed nonce cache entries entirely. If the TX had partially reached some Tenderdash nodes (entered their mempool) before the error, the next retry would re-fetch the old nonce from Platform (TX not yet processed), derive the same nonce, create identical TX bytes, and get rejected with "tx already exists in cache". Fix by marking cache entries as stale (timestamp=0) instead of removing them. This forces a re-fetch from Platform on the next use while preserving the cached nonce value. The existing MAX(platform, cached) comparison then ensures the nonce always advances, preventing the SDK from recreating an identical state transition. Closes dashpay/dash-evo-tool#588 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors nonce caching to store triples (insert_nonce, timestamp, platform_nonce); centralizes nonce retrieval into a shared get_or_fetch_nonce flow with drift-aware re-fetch and masked platform nonces; removes two Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant SDK
participant InternalCache
participant Platform
Client->>SDK: request nonce for identity / identity+contract
SDK->>InternalCache: read (insert_nonce, ts, platform_nonce)
alt cache considered fresh (drift/revisions ok)
InternalCache-->>SDK: return cached insert_nonce (and metadata)
SDK-->>Client: return cached nonce
else cache stale or missing
SDK->>Platform: fetch platform_nonce (async, without cache lock)
Platform-->>SDK: return platform_nonce
SDK->>SDK: platform_masked = platform_nonce & IDENTITY_NONCE_VALUE_FILTER
SDK->>SDK: combine/compute insert_nonce (apply bump_first if required)
SDK->>InternalCache: write (insert_nonce, now_timestamp, platform_masked)
SDK-->>Client: return updated nonce
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…turn path The nonce cache previously stored raw values with the upper "missing revisions" bits intact, requiring IDENTITY_NONCE_VALUE_FILTER masking on every return path. This made comparisons between platform_nonce and cached values unreliable since the upper bits could differ. Strip the upper bits immediately after fetching from Platform so the cache only ever holds plain nonce values. This simplifies all downstream comparisons, prevents +1 increments from overflowing into the upper bits, and removes redundant masking from 7 return paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ gRPC Query Coverage Report |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/rs-sdk/src/sdk.rs (1)
504-550: Consider extracting the shared nonce fetch-and-update logic.The masking, drift detection, and
effective_currentcomputation inget_identity_contract_nonce(lines 505–550) are identical toget_identity_nonce(lines 391–438). A shared helper (e.g.,resolve_nonce(platform_nonce, cached_entry, bump_first, current_time_s)) would eliminate duplication and ensure both paths stay in sync when the logic evolves. The existingTODO: Move to a separate structat line 469 aligns with this.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/sdk.rs` around lines 504 - 550, Extract the duplicated masking, drift-detection, and cache-update logic into a single helper (e.g., resolve_nonce or a method on a new NonceResolver struct) and call it from both get_identity_contract_nonce and get_identity_nonce; the helper should accept the platform_nonce (already masked with IDENTITY_NONCE_VALUE_FILTER as done via IdentityContractNonceFetcher), a mutable reference to the cache Entry (Vacant/Occupied), bump_first, and current_time_s, compute effective_current using MAX_MISSING_IDENTITY_REVISIONS, decide insert_nonce (handling platform_nonce > effective_current and bump_first cases), update the cache entry with (insert_nonce, current_time_s), and return the insert_nonce or Result as needed so both call sites reuse identical behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 504-550: Extract the duplicated masking, drift-detection, and
cache-update logic into a single helper (e.g., resolve_nonce or a method on a
new NonceResolver struct) and call it from both get_identity_contract_nonce and
get_identity_nonce; the helper should accept the platform_nonce (already masked
with IDENTITY_NONCE_VALUE_FILTER as done via IdentityContractNonceFetcher), a
mutable reference to the cache Entry (Vacant/Occupied), bump_first, and
current_time_s, compute effective_current using MAX_MISSING_IDENTITY_REVISIONS,
decide insert_nonce (handling platform_nonce > effective_current and bump_first
cases), update the cache entry with (insert_nonce, current_time_s), and return
the insert_nonce or Result as needed so both call sites reuse identical
behavior.
…tform The nonce cache could drift arbitrarily far ahead of Platform through repeated bump_first calls without successful broadcasts. Each bump refreshed the cache timestamp, preventing time-based staleness from triggering a re-fetch. Add last_platform_nonce to the cache entry and check drift inline in should_query_platform. When the cached nonce exceeds the last known platform value by more than MAX_MISSING_IDENTITY_REVISIONS, force a re-fetch and use platform's response as the source of truth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ce helper get_identity_nonce and get_identity_contract_nonce had identical cache logic differing only in key type and fetcher. Extract the shared staleness check, drift detection, fetch, bump, and cache update into a generic async helper that both methods delegate to. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
c508de1 to
8aae8c2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/rs-sdk/src/internal_cache/mod.rs (1)
7-15: Consider a named struct to prevent positional confusion between the twoIdentityNoncefields.Both
.0(current_nonce) and.2(last_platform_nonce) share the sameIdentityNoncetype. A swap at any destructuring site (e.g.,let (last_platform_nonce, _, current_nonce) = e.get()) compiles silently but produces wrong semantics.♻️ Replace tuple alias with a named struct
-/// Cached nonce state: `(current_nonce, last_fetch_timestamp, last_platform_nonce)`. -/// -/// - `current_nonce` — the nonce value the SDK will use next (may be bumped -/// ahead of what Platform reported). -/// - `last_fetch_timestamp` — when we last fetched from Platform. -/// - `last_platform_nonce` — the value Platform returned on the last fetch -/// (after stripping upper bits). Used to detect when local bumps have -/// drifted too far ahead, triggering a re-fetch. -pub(crate) type NonceCacheEntry = (IdentityNonce, LastQueryTimestamp, IdentityNonce); +/// Cached nonce state for identity/contract nonce tracking. +#[derive(Debug, Clone, Copy)] +pub(crate) struct NonceCacheEntry { + /// The nonce value the SDK will use next (may be bumped ahead of Platform). + pub current_nonce: IdentityNonce, + /// Unix timestamp (seconds) of the last Platform fetch; 0 means stale. + pub last_fetch_timestamp: LastQueryTimestamp, + /// Plain nonce Platform returned on the last fetch (upper bits stripped). + pub last_platform_nonce: IdentityNonce, +}All destructuring sites in
sdk.rswould then use named fields instead of positional indexing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/internal_cache/mod.rs` around lines 7 - 15, The tuple alias NonceCacheEntry uses two IdentityNonce fields which is error-prone; replace the alias with a small named struct (e.g., struct NonceCacheEntry { current_nonce: IdentityNonce, last_fetch_timestamp: LastQueryTimestamp, last_platform_nonce: IdentityNonce }) and update all uses (including destructuring sites in sdk.rs that reference NonceCacheEntry::get()/e.get()) to use the named fields instead of positional .0/.1/.2 to prevent silent swaps between current_nonce and last_platform_nonce. Ensure the struct derives or implements the same traits previously relied on (Clone/Copy/Debug/etc.) so callers keep compiling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 407-420: In get_or_fetch_nonce, inside the should_query_platform =
true branch where you compute insert_nonce from platform_nonce, update the logic
to account for any existing cached nonce: when handling Entry::Occupied (and
generally before inserting), read the existing cached nonce (the first element
of the stored tuple) and set base = max(existing_cached_nonce, platform_nonce),
then compute insert_nonce = base + 1 and store (insert_nonce, current_time_s,
platform_nonce); do this instead of unconditionally using platform_nonce so
retries use max(existing, platform) as the base.
---
Nitpick comments:
In `@packages/rs-sdk/src/internal_cache/mod.rs`:
- Around line 7-15: The tuple alias NonceCacheEntry uses two IdentityNonce
fields which is error-prone; replace the alias with a small named struct (e.g.,
struct NonceCacheEntry { current_nonce: IdentityNonce, last_fetch_timestamp:
LastQueryTimestamp, last_platform_nonce: IdentityNonce }) and update all uses
(including destructuring sites in sdk.rs that reference
NonceCacheEntry::get()/e.get()) to use the named fields instead of positional
.0/.1/.2 to prevent silent swaps between current_nonce and last_platform_nonce.
Ensure the struct derives or implements the same traits previously relied on
(Clone/Copy/Debug/etc.) so callers keep compiling.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
packages/rs-sdk/src/sdk.rs (2)
496-518:refresh_identity_nonce— stale-marking logic is correct; consider a range scan for contract nonces.Setting
timestamp = 0reliably triggers the staleness check on next access (since0 < current_time_s − stale_time_sholds for any real UNIX timestamp). The logic is sound.Minor: the contract-nonce path (lines 510–516) does a full linear scan over all entries to find those matching
identity_id. Since the BTreeMap key is(identity_id, contract_id), matching entries are contiguous and a range query (range((identity_id, Identifier::default())..)) would avoid iterating unrelated entries. Not a correctness issue, just a minor efficiency opportunity if the cache grows large.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/sdk.rs` around lines 496 - 518, The current refresh_identity_nonce implementation correctly marks timestamps as 0 but scans the entire internal_cache.identity_contract_nonce_counter BTreeMap to find entries for an identity; optimize by performing a contiguous range scan instead of iterating all entries — use a range_mut over keys starting at (identity_id.clone(), <min contract id>) and ending at (identity_id.clone(), <max contract id>) (or appropriate Bound bounds) to visit only entries whose first tuple element equals identity_id and set their timestamp to 0; keep the existing identity_nonce_counter logic unchanged.
371-383: Tokio mutex held across async platform fetch serialises all nonce requests.Both
get_identity_nonceandget_identity_contract_noncelock the shared nonce counter, then await a potential network call insideget_or_fetch_nonce. Because theEntryborrows the map, the lock cannot be dropped until the fetch completes. Under concurrent callers this means every identity's nonce request queues behind any in-flight platform fetch, regardless of identity.This is an inherent limitation of the current
Entry-based API and is pre-existing — but the refactor makes it structurally clearer. A future optimisation would be a per-identity (or per-key) lock rather than a single map-level mutex.Also applies to: 407-427
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/sdk.rs` around lines 371 - 383, The code currently holds internal_cache.identity_nonce_counter.lock() across an await inside Self::get_or_fetch_nonce, which serialises all concurrent nonce requests; change the flow so the map lock is not held during the async fetch by: 1) access the map under internal_cache.identity_nonce_counter only to check/clone the existing Entry value or to insert a lightweight “pending” marker, then drop the mutex before performing the async platform call invoked via IdentityNonceFetcher::fetch_with_settings; 2) after the fetch completes, re-acquire the identity_nonce_counter mutex and update/replace the entry with the fetched nonce (or remove the pending marker and insert the real value); alternatively replace the single map mutex with a per-identity lock (e.g., a separate Mutex per identity or a keyed lock structure) so get_identity_nonce and get_identity_contract_nonce only serialize for the same identity rather than globally. Ensure this change touches the codepaths using get_or_fetch_nonce, internal_cache.identity_nonce_counter, and the async fetch closure so no mutex guard is held across await.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 461-473: The stale-by-time re-fetch path in refresh_identity_nonce
currently overwrites the cache via entry.and_modify(|e| *e = (insert_nonce,
...)) using platform_nonce (plus bump_first) and ignores the existing cached
nonce, causing stale fetches to return the same nonce; change the logic so when
handling the stale-by-time branch you read the existing cached nonce from the
entry (if present) and compute insert_nonce = max(existing_cached_nonce,
platform_nonce + (if bump_first {1} else {0})), while preserving the original
"drift" branch behavior that intentionally resets to platform_nonce; update the
code around fetch_from_platform(), insert_nonce, and the
entry.and_modify/or_insert calls to use that max for the stale path.
---
Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 496-518: The current refresh_identity_nonce implementation
correctly marks timestamps as 0 but scans the entire
internal_cache.identity_contract_nonce_counter BTreeMap to find entries for an
identity; optimize by performing a contiguous range scan instead of iterating
all entries — use a range_mut over keys starting at (identity_id.clone(), <min
contract id>) and ending at (identity_id.clone(), <max contract id>) (or
appropriate Bound bounds) to visit only entries whose first tuple element equals
identity_id and set their timestamp to 0; keep the existing
identity_nonce_counter logic unchanged.
- Around line 371-383: The code currently holds
internal_cache.identity_nonce_counter.lock() across an await inside
Self::get_or_fetch_nonce, which serialises all concurrent nonce requests; change
the flow so the map lock is not held during the async fetch by: 1) access the
map under internal_cache.identity_nonce_counter only to check/clone the existing
Entry value or to insert a lightweight “pending” marker, then drop the mutex
before performing the async platform call invoked via
IdentityNonceFetcher::fetch_with_settings; 2) after the fetch completes,
re-acquire the identity_nonce_counter mutex and update/replace the entry with
the fetched nonce (or remove the pending marker and insert the real value);
alternatively replace the single map mutex with a per-identity lock (e.g., a
separate Mutex per identity or a keyed lock structure) so get_identity_nonce and
get_identity_contract_nonce only serialize for the same identity rather than
globally. Ensure this change touches the codepaths using get_or_fetch_nonce,
internal_cache.identity_nonce_counter, and the async fetch closure so no mutex
guard is held across await.
… higher cached nonce
- Do not hold the Mutex across the async fetch_from_platform call;
split get_or_fetch_nonce into lock-check-unlock / fetch / lock-update
so other callers are not blocked during the network round-trip.
- After re-fetching, keep the higher of the cached nonce vs the Platform
nonce to prevent regression when Platform has not yet indexed a recent
successful broadcast.
- Replace panic!("vacant entry not possible") with a structure that
cannot reach the impossible state (early-return from cache hit).
- Tighten drift threshold from > to >= MAX_MISSING_IDENTITY_REVISIONS
as a precaution to stay within Platform's own limit.
- Simplify cache write from and_modify().or_insert() to a single insert.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/rs-sdk/src/sdk.rs (1)
412-428: Optional: add tracing toget_identity_contract_noncefor consistency withget_identity_nonce.
get_identity_nonce(lines 390–395) emits atrace!after the fetch.get_identity_contract_noncereturns directly without logging, making it harder to observe nonce progression in traces.♻️ Proposed addition
) -> Result<IdentityNonce, Error> { let settings = settings.unwrap_or_default(); - Self::get_or_fetch_nonce( + let nonce = Self::get_or_fetch_nonce( &self.internal_cache.identity_contract_nonce_counter, (identity_id, contract_id), bump_first, &settings, || async { Ok(IdentityContractNonceFetcher::fetch_with_settings( self, (identity_id, contract_id), settings.request_settings, ) .await? .unwrap_or(IdentityContractNonceFetcher(0)) .0) }, ) - .await + .await?; + + tracing::trace!( + identity_id = %identity_id, + contract_id = %contract_id, + bump_first, + nonce, + "Fetched identity contract nonce" + ); + + Ok(nonce) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/sdk.rs` around lines 412 - 428, Add a trace call to get_identity_contract_nonce to mirror get_identity_nonce: after awaiting the result of Self::get_or_fetch_nonce in get_identity_contract_nonce, emit a trace! that logs the resolved nonce along with identity_id and contract_id for observability; update the function surrounding the Self::get_or_fetch_nonce(...).await call to capture the returned nonce in a local variable and then call trace! before returning it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 412-428: Add a trace call to get_identity_contract_nonce to mirror
get_identity_nonce: after awaiting the result of Self::get_or_fetch_nonce in
get_identity_contract_nonce, emit a trace! that logs the resolved nonce along
with identity_id and contract_id for observability; update the function
surrounding the Self::get_or_fetch_nonce(...).await call to capture the returned
nonce in a local variable and then call trace! before returning it.
Cover get_or_fetch_nonce with 25 parameterized test cases: - empty cache fetch (basic, zero, filter-max, upper-bits stripping) - fresh cache hit (no-bump, bump, drift below threshold) - stale/drifted cache re-fetch (no-regress, platform-higher, drift-at-max) - error propagation, refresh staleness, key isolation, concurrency races Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
packages/rs-sdk/src/sdk.rs (2)
1367-1389: Test coverage is comprehensive; minor nit on test case name.The
"filter max with bump overflows"label (line 1372) implies au64integer overflow. SinceIDENTITY_NONCE_VALUE_FILTER ≈ (1 << 40) - 1, the bumped resultIDENTITY_NONCE_VALUE_FILTER + 1 = 1 << 40is well withinu64's range. A clearer name would be"nonce exceeds filter range after bump"to signal that the stored value has bits beyond what Platform would return (not an arithmetic overflow).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/sdk.rs` around lines 1367 - 1389, Test case label is misleading: in the test function empty_cache_fetch update the test_case string "filter max with bump overflows" to a clearer label such as "nonce exceeds filter range after bump" so it no longer implies an arithmetic u64 overflow; modify the test_case for IDENTITIY_NONCE_VALUE_FILTER + 1 accordingly in the list of #[test_case(... ; "...")] entries.
400-429:get_identity_contract_nonceis missing the observability trace log present inget_identity_nonce.
get_identity_nonce(line 390) logsidentity_id,bump_first, and the resolvednonce.get_identity_contract_noncehas no equivalent trace, making it harder to debug nonce issues for identity–contract pairs.🪵 Suggested trace log
) -> Result<IdentityNonce, Error> { let settings = settings.unwrap_or_default(); - Self::get_or_fetch_nonce( + let nonce = Self::get_or_fetch_nonce( &self.internal_cache.identity_contract_nonce_counter, (identity_id, contract_id), bump_first, &settings, || async { Ok(IdentityContractNonceFetcher::fetch_with_settings( self, (identity_id, contract_id), settings.request_settings, ) .await? .unwrap_or(IdentityContractNonceFetcher(0)) .0) }, ) - .await + .await?; + + tracing::trace!( + identity_id = %identity_id, + contract_id = %contract_id, + bump_first, + nonce, + "Fetched identity contract nonce" + ); + + Ok(nonce) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk/src/sdk.rs` around lines 400 - 429, Add the same observability trace in get_identity_contract_nonce that get_identity_nonce uses: after resolving the nonce (i.e., once the Value returned by Self::get_or_fetch_nonce is available), emit the same trace-level log including identity_id, contract_id, bump_first and the resolved nonce using the same trace logging call used in get_identity_nonce; locate this in get_identity_contract_nonce (and/or inside the fetch closure if needed) and format the fields identically to the existing get_identity_nonce trace so debugging for identity–contract nonces matches the single-identity case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 437-503: The code uses IDENTITY_NONCE_VALUE_FILTER in a bitwise
AND with a u64 and compares MAX_MISSING_IDENTITY_REVISIONS with a u64 result, so
ensure both DPP constants are u64 (or explicitly cast them); locate the
constants IDENTITY_NONCE_VALUE_FILTER and MAX_MISSING_IDENTITY_REVISIONS in DPP
and make them pub const ...: u64 = ... (or, if you cannot change their
definitions, cast where used in get_or_fetch_nonce: e.g.
fetch_from_platform().await? & (IDENTITY_NONCE_VALUE_FILTER as u64) and compare
with (MAX_MISSING_IDENTITY_REVISIONS as u64) in the drifted calculation to
guarantee types align).
---
Nitpick comments:
In `@packages/rs-sdk/src/sdk.rs`:
- Around line 1367-1389: Test case label is misleading: in the test function
empty_cache_fetch update the test_case string "filter max with bump overflows"
to a clearer label such as "nonce exceeds filter range after bump" so it no
longer implies an arithmetic u64 overflow; modify the test_case for
IDENTITIY_NONCE_VALUE_FILTER + 1 accordingly in the list of #[test_case(... ;
"...")] entries.
- Around line 400-429: Add the same observability trace in
get_identity_contract_nonce that get_identity_nonce uses: after resolving the
nonce (i.e., once the Value returned by Self::get_or_fetch_nonce is available),
emit the same trace-level log including identity_id, contract_id, bump_first and
the resolved nonce using the same trace logging call used in get_identity_nonce;
locate this in get_identity_contract_nonce (and/or inside the fetch closure if
needed) and format the fields identically to the existing get_identity_nonce
trace so debugging for identity–contract nonces matches the single-identity
case.
| /// - `last_platform_nonce` — the value Platform returned on the last fetch | ||
| /// (after stripping upper bits). Used to detect when local bumps have | ||
| /// drifted too far ahead, triggering a re-fetch. | ||
| pub(crate) type NonceCacheEntry = (IdentityNonce, LastQueryTimestamp, IdentityNonce); |
| BTreeMap<(Identifier, Identifier), (prelude::IdentityNonce, LastQueryTimestamp)>, | ||
| >, | ||
| pub(crate) identity_contract_nonce_counter: | ||
| tokio::sync::Mutex<BTreeMap<(Identifier, Identifier), NonceCacheEntry>>, |
There was a problem hiding this comment.
(Identifier, Identifier) terrible too
packages/rs-sdk/src/sdk.rs
Outdated
| let nonce = Self::get_or_fetch_nonce( | ||
| &self.internal_cache.identity_nonce_counter, | ||
| identity_id, | ||
| bump_first, | ||
| &settings, | ||
| || async { | ||
| Ok(IdentityNonceFetcher::fetch_with_settings( | ||
| self, | ||
| identity_id, | ||
| settings.request_settings, | ||
| ) | ||
| .await? | ||
| .unwrap_or(IdentityNonceFetcher(0)) | ||
| .0) | ||
| }, | ||
| ) | ||
| .await?; |
There was a problem hiding this comment.
Let's extract this into a method
| /// returns an error or is updated. | ||
| pub(crate) identity_nonce_counter: | ||
| tokio::sync::Mutex<BTreeMap<Identifier, (prelude::IdentityNonce, LastQueryTimestamp)>>, | ||
| pub(crate) identity_nonce_counter: tokio::sync::Mutex<BTreeMap<Identifier, NonceCacheEntry>>, |
There was a problem hiding this comment.
The better design would be to have a separate struct that inclapsulates all related logic (methods) which instance we have in SDK. In this case we don't need to spread logic around.
shumkov
left a comment
There was a problem hiding this comment.
It would be nice to have a functional test that repoduce previous problem but fixed with new code
Address PR #3111 review comments from shumkov: - Replace NonceCacheEntry tuple with named struct (current_nonce, last_fetch_timestamp, last_fetched_platform_nonce) - Replace (Identifier, Identifier) key with IdentityContractPair struct - Create NonceCache struct encapsulating all nonce logic with per-map Mutex locking and configurable stale time - Remove InternalSdkCache (was nonce-only, pointless wrapper) - Make Sdk nonce methods thin wrappers delegating to NonceCache - Move get_or_fetch_nonce, get_current_time_seconds, DEFAULT_IDENTITY_NONCE_STALE_TIME_S, and tests to internal_cache/mod.rs - Remove LastQueryTimestamp and StalenessInSeconds type aliases from sdk.rs Public API unchanged — all callers continue using sdk.get_identity_nonce(), sdk.get_identity_contract_nonce(), and sdk.refresh_identity_nonce(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…race and bound memory - Switch from BTreeMap to lru::LruCache (bounded to 100 entries) for both identity and contract nonce caches, fixing unbounded growth - Hold cache lock end-to-end across async Platform fetch, eliminating the TOCTOU race condition (SEC-001) that could cause nonce regression - Apply IDENTITY_NONCE_VALUE_FILTER mask on every nonce bump, not just after Platform fetch (SEC-002) - Change refresh() to pop() entries entirely instead of zeroing timestamps, matching pre-refactor behavior (CODE-001) - Return Result from get_current_time_seconds() instead of panicking on clock error (RUST-001) - Do not update last_fetch_timestamp on cache-only bumps (CODE-002) - Make lru a non-optional dependency (remove feature-gating behind mocks) - Update tests: add LRU eviction, nonce filter masking, and timestamp preservation tests; remove concurrent_update tests (race window gone) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… and tracing Address five security findings (SEC-001 through SEC-005) in the NonceCache: - SEC-001: Return `Error::NonceOverflow` instead of silently wrapping the nonce to zero when it reaches `IDENTITY_NONCE_VALUE_FILTER`. Wrapping would permanently lock out the identity. - SEC-002: Bump default LRU cache capacity from 100 to 1000 to reduce silent eviction of actively-bumped nonces. - SEC-003: Restore three-phase lock pattern — check cache under lock, release lock during platform fetch, re-acquire and merge with `max(cached, platform)`. Eliminates serialization of all nonce fetches across the async boundary. - SEC-004: Add `tracing::trace!` for cache misses, stale/drift refetches, and anti-regression events. No logging on the hot cache-hit path. - SEC-005: Capture a fresh timestamp after the platform fetch so `last_fetch_timestamp` reflects actual receipt time. Also removes all runtime `expect()` calls — the remaining `.expect()` on `NonZeroUsize::new()` is in a `const` context (compile-time evaluation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace `.unwrap_or(0)` fallback with an explicit error when Platform returns no nonce for an identity or identity-contract pair. A stale DAPI node that has not yet indexed the identity would previously cause the SDK to silently use nonce 0, leading to rejected state transitions. Now the closure surfaces `Error::IdentityNonceNotFound` and logs a `warn!` so the caller can retry against a different node. Also documents the LRU eviction, drift threshold, and DAPI staleness edge cases on `NonceCache`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove dead `From<(Identifier, Identifier)>` impl (RUST-005) - Replace double hash lookup (peek + get_mut) with single get_mut (RUST-002) - Add `Debug` bound to generic key, include key in trace logs (RUST-003) - Add manual `Debug` impl for `NonceCache` (RUST-004) - Remove redundant test imports after `use super::*` (RUST-006) - Add test for bump_first=false after bump_first=true sequence (CODE-004) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Preserve max(old, new) last_fetched_platform_nonce when cached nonce wins the Phase 3 merge, preventing drift inflation from a temporarily lower platform value (CODE-001) - Simplify bump_nonce to explicit boundary check instead of indirect masking arithmetic (RUST-001) - Add concurrent bump test (20 tasks via Barrier on fresh cache) - Add concurrent stale-fetch test (50 tasks through Phase 2 merge) Both concurrency tests assert all returned nonces are unique, validating the three-phase locking under contention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ariants The rs-sdk added two new error variants (NonceOverflow, IdentityNonceNotFound) but the wasm-sdk error conversion was not updated, causing a non-exhaustive match compilation failure in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… retry The LruCache refactoring (d1225cc) inadvertently changed refresh() from "mark stale" to "pop entry", reintroducing the exact bug from dashpay/dash-evo-tool#588: after a broadcast failure, the bumped nonce was lost, causing the retry to create a byte-identical state transition ("tx already exists in cache"). Changed refresh() to set last_fetch_timestamp = 0 instead of popping, preserving the cached nonce so Phase 3's max(cached, platform) merge advances past the already-used nonce. Added tests covering both scenarios: Platform returning a stale nonce (cached value preserved) and Platform returning a higher nonce (Platform value used). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract STALE_TIMESTAMP constant to replace magic value 0 - Document intentional get_mut() LRU promotion in refresh() — refreshed entries are actively in use and should be preserved under eviction pressure - Fix stale section comment that still said "removes entries" - Fix primary test to use PutSettings::default() so the stale timestamp actually triggers a Platform re-fetch; add AtomicBool assertion verifying the fetch closure is invoked Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…instead Move Platform fetch logic from Sdk call sites into NonceCache's pub(crate) methods (get_identity_nonce, get_identity_contract_nonce). The private get_or_fetch_nonce helper retains its generic closure signature unchanged. This eliminates closure boilerplate from sdk.rs and co-locates the fetch + cache logic in one place. Refresh tests now use Sdk::new_mock() with expect_fetch expectations instead of hand-rolled closures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issue being fixed or feature implemented
Addresses dashpay/dash-evo-tool#588
The SDK's nonce cache had several design issues that could cause retry loops to fail after a broadcast error:
refresh_identity_nonce()deleted cache entries entirely, causing re-fetch of stale nonces from Platform and recreation of identical (failing) state transitions.bump_firstcalls without successful broadcasts could push the cached nonce arbitrarily ahead of Platform with no re-fetch trigger.get_identity_nonceandget_identity_contract_noncecontained identical cache logic.What was done?
Rewrite: Dedicated
NonceCachestructExtracted all nonce caching logic from
Sdkinto a self-containedNonceCachestruct backed by an LRU cache (bounded to 1 000 entries). The struct owns its mutex and exposes a clean async API.Three-phase locking
get_or_fetch_nonceno longer holds the mutex across the asyncfetch_from_platformcall. The function is structured as: lock → check → unlock → fetch (no lock) → lock → update → unlock. This prevents blocking during network round-trips.Drift detection
Each cache entry tracks
last_platform_nonce. When the cached nonce drifts≥ MAX_MISSING_IDENTITY_REVISIONSahead, a re-fetch is forced and Platform's response becomes the source of truth — while still preserving the higher of cached vs Platform nonce to avoid regression.Upper-bit masking at fetch
The
IDENTITY_NONCE_VALUE_FILTERmask is applied once, at fetch time. The cache only holds plain nonce values, eliminating redundant masking on every return path.Stale marking instead of removal
refresh_identity_nonce()sets cache entry timestamps to 0 (stale) instead of removing entries. This forces a re-fetch on next use while preserving the cached nonce value.Overflow guard
Added
NonceOverflowerror variant — returned when a nonce reachesu64::MAXinstead of silently wrapping to zero.IdentityNonceNotFound error
Platform returning no nonce for a queried identity now returns
IdentityNonceNotFoundinstead of silently defaulting to nonce 0. This is a retriable error — the DAPI node may not have indexed the identity yet.wasm-sdk error handling
Added
NonceOverflowandIdentityNonceNotFoundvariants toWasmSdkErrorKindand theFrom<SdkError>conversion in the wasm-sdk, fixing a non-exhaustive match compilation failure.Cleanup
Sdk::parse_proofandSdk::parse_proof_with_metadata(dead code).get_or_fetch_noncehelper.How Has This Been Tested?
cargo check -p dash-sdk -p wasm-sdkpasses cleanlycargo clippy -p dash-sdkpasses cleanlycargo test -p dash-sdkpasses cleanlyBreaking Changes
None — the
NonceCacheis an internal implementation detail. PublicSdkAPI is unchanged.Checklist: