
feat: add grovedb-dense-fixed-sized-merkle-tree crate #409

Merged
QuantumExplorer merged 10 commits into develop from feat/dense-fixed-sized-merkle-tree
Feb 22, 2026

Conversation


QuantumExplorer (Member) commented Feb 22, 2026

Summary

  • Adds a new grovedb-dense-fixed-sized-merkle-tree crate: a dense, fixed-size (power-of-2 capacity) Merkle tree using Blake3 domain-separated hashing
  • Height-parameterized trees (1..=16) with capacity = 2^height - 1, append-only inserts with automatic hash propagation
  • Includes inclusion proof generation/verification with position-based queries and an optional StorageContext adapter for GroveDB integration
  • 77 tests covering correctness, edge cases, security regression, and DoS protection
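The capacity and child-index arithmetic described above can be sketched in a few lines of Rust (illustrative names, not the crate's actual API): a height-h dense tree stored in heap order has capacity 2^h - 1, and the children of position p sit at 2p+1 and 2p+2.

```rust
// Dense, heap-ordered layout: capacity = 2^height - 1 for height in 1..=16,
// children of position p at 2p+1 and 2p+2. Function names are illustrative.
fn capacity(height: u32) -> u16 {
    assert!((1..=16).contains(&height), "height must be in 1..=16");
    // 2^16 - 1 = 65535 still fits in u16, so the cast is safe for all
    // supported heights.
    ((1u32 << height) - 1) as u16
}

fn children(pos: u32) -> (u32, u32) {
    (2 * pos + 1, 2 * pos + 2)
}

fn main() {
    assert_eq!(capacity(1), 1);
    assert_eq!(capacity(4), 15);
    assert_eq!(capacity(16), 65535);
    assert_eq!(children(0), (1, 2));
    assert_eq!(children(3), (7, 8));
}
```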

Test plan

  • cargo test -p grovedb-dense-fixed-sized-merkle-tree — all 77 tests pass
  • cargo check — workspace compiles cleanly
  • CI pipeline

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Dense fixed-size Merkle tree with public API, proof generation, serialization, and verification (Blake3)
    • Optional storage integration with a storage adapter for read/write and read-after-write visibility
  • Tests

    • Extensive unit and integration tests covering construction, insertion, proofs, verification, failure modes, and query-based proof generation
  • Chores

    • Added two new workspace members to the project workspace

Dense (fixed-size, power-of-2) Merkle tree for authenticated storage
of fixed-capacity collections. Uses Blake3 domain-separated hashing
with O(log n) proof generation and verification.

Key features:
- Height-parameterized trees (1..=16) with capacity = 2^height - 1
- Append-only inserts with automatic hash propagation
- Inclusion proofs with position-based queries
- Optional StorageContext adapter for GroveDB integration
- 77 tests including security/DoS regression tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
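As a rough illustration of the O(log n) proof size mentioned above, the following sketch (hypothetical helper names, not the crate's API) lists the sibling positions an inclusion proof touches when walking from a leaf to the root in a heap-ordered dense tree:

```rust
// In heap order, the sibling of an odd position p is p+1, and of an even
// position p (p > 0) is p-1; the parent of p is (p - 1) / 2.
fn sibling(pos: u32) -> u32 {
    if pos % 2 == 1 { pos + 1 } else { pos - 1 }
}

// Collect one sibling per level from `pos` up to (but excluding) the root:
// these are exactly the hashes a verifier needs, so the proof is O(height).
fn proof_positions(mut pos: u32) -> Vec<u32> {
    let mut siblings = Vec::new();
    while pos > 0 {
        siblings.push(sibling(pos));
        pos = (pos - 1) / 2; // move to the parent
    }
    siblings
}

fn main() {
    // Position 9 in a height-4 tree: siblings 10, 3, 2 on the way up.
    assert_eq!(proof_positions(9), vec![10, 3, 2]);
    // The root itself needs no sibling.
    assert_eq!(proof_positions(0), Vec::<u32>::new());
}
```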

coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new workspace crate grovedb-dense-fixed-sized-merkle-tree providing a fixed-capacity Blake3-based Merkle tree, storage adapter, cost-aware store trait, proof generation/serialization, verification routines, and extensive tests; also adds grovedb-query to the workspace.

Changes

Cohort / File(s) and summary:

  • Workspace Configuration (Cargo.toml): Added workspace members grovedb-dense-fixed-sized-merkle-tree and grovedb-query.
  • Crate Manifest & Exports (grovedb-dense-fixed-sized-merkle-tree/Cargo.toml, grovedb-dense-fixed-sized-merkle-tree/src/lib.rs): New crate manifest and library re-exports (errors, proof, store, tree, storage-gated items). Defines the storage feature and external deps (blake3, bincode, thiserror).
  • Errors (grovedb-dense-fixed-sized-merkle-tree/src/error.rs): Adds the DenseMerkleError enum with variants InvalidData, TreeFull {capacity, count}, StoreError, InvalidProof.
  • Hash Utilities (grovedb-dense-fixed-sized-merkle-tree/src/hash.rs): Adds validate_height (1..=16) and node_hash (Blake3 over the value, with domain-separated prefixes).
  • Tree Core (grovedb-dense-fixed-sized-merkle-tree/src/tree.rs): Implements DenseFixedSizedMerkleTree (constructors, capacity/count, insert/try_insert with rollback, get, root_hash, hashing logic) with cost accounting.
  • Storage Abstraction (grovedb-dense-fixed-sized-merkle-tree/src/store.rs, .../storage_adapter.rs): Adds the DenseTreeStore trait (cost-aware get/put). Implements the DenseTreeStorageContext adapter for GroveDB StorageContext with a write-through cache, position_key, and cost propagation.
  • Proof Generation (grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs): Adds the DenseTreeProof struct, generate/generate_for_query, encode/decode (bincode), ancestor expansion, and cost-aware proof construction.
  • Proof Verification (grovedb-dense-fixed-sized-merkle-tree/src/verify.rs): Adds pure verification routines to recompute/validate the root, enforce DoS limits, run duplicate/disjointness checks, and perform query-based completeness/soundness checks.
  • Tests (grovedb-dense-fixed-sized-merkle-tree/src/tests.rs, .../proof/tests.rs): Large, comprehensive test suites (in-memory MemStore, failing-store variants) covering construction, inserts, root determinism, proof generation/verification, query handling, edge cases, and failure injection.
  • Minor refactor (costs/src/storage_cost/removal.rs): Simplified nested traversal for removed-bytes computation (behavior preserved).

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Client as Client
participant Tree as DenseFixedSizedMerkleTree
participant Store as DenseTreeStore
participant Adapter as DenseTreeStorageContext
participant Backend as StorageBackend
Client->>Tree: insert(value)
Tree->>Store: put_value(position, value) (CostResult)
alt Store is DenseTreeStorageContext
Store->>Adapter: position_key(pos)
Adapter->>Backend: storage_put(key, value) (cost accounted)
Backend-->>Adapter: Ok / Err
Adapter-->>Tree: CostResult OK/Err (updates write-through cache)
else Other Store impl
Store->>Backend: put (user-defined)
Backend-->>Store: Ok / Err
Store-->>Tree: CostResult OK/Err
end
Tree->>Tree: compute_root_hash(store) (reads via get_value)
Tree-->>Client: Result(root_hash, position) or Error
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped in code and found a tree,
Blake3 leaves whisper back to me,
Proofs stitched tight, positions neat,
Cached dreams keep every beat,
I thump my foot — the crate’s complete!

🚥 Pre-merge checks: ✅ 3 passed

  • Title check: ✅ Passed. The PR title clearly and specifically summarizes the main change: adding a new dense fixed-sized Merkle tree crate to the GroveDB workspace, which aligns with the changeset's primary focus.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (8)
grovedb-dense-fixed-sized-merkle-tree/src/hash.rs (1)

10-16: blake3_merge lacks domain separation for internal nodes.

INTERNAL_DOMAIN_TAG is defined in this module but blake3_merge does not prepend it. The public compute_dense_merkle_root / compute_dense_merkle_root_from_values functions therefore produce Merkle roots without domain separation at any level, making them theoretically susceptible to second-preimage attacks (an attacker can construct a shorter tree whose root collides with a taller one).

The tree's own hashing (in tree.rs) correctly uses 0x00 / 0x01 prefixes, so these public utility functions are the only gap. If they're intended for external consumption, consider applying domain separation here as well.

♻️ Suggested fix
 pub(crate) fn blake3_merge(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
-    let mut data = [0u8; 64];
-    data[..32].copy_from_slice(left);
-    data[32..].copy_from_slice(right);
-    *blake3::hash(&data).as_bytes()
+    let mut hasher = blake3::Hasher::new();
+    hasher.update(&[INTERNAL_DOMAIN_TAG]);
+    hasher.update(left);
+    hasher.update(right);
+    *hasher.finalize().as_bytes()
 }

And in compute_dense_merkle_root_from_values, apply the leaf domain tag:

-    let leaf_hashes: Vec<[u8; 32]> = values.iter().map(|v| *blake3::hash(v).as_bytes()).collect();
+    let leaf_hashes: Vec<[u8; 32]> = values.iter().map(|v| {
+        let mut hasher = blake3::Hasher::new();
+        hasher.update(&[LEAF_DOMAIN_TAG]);
+        hasher.update(v);
+        *hasher.finalize().as_bytes()
+    }).collect();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/hash.rs` around lines 10 - 16,
blake3_merge currently concatenates left/right and hashes without domain
separation; update it to prepend the module's INTERNAL_DOMAIN_TAG byte before
hashing so internal-node hashes match tree.rs behavior (use INTERNAL_DOMAIN_TAG
when forming the 64+1 input to blake3 in blake3_merge). Also ensure
compute_dense_merkle_root and compute_dense_merkle_root_from_values apply the
correct leaf domain tag for leaf hashing (use the leaf domain tag constant when
hashing leaves in compute_dense_merkle_root_from_values) so public utilities
match the tree's 0x00/0x01 prefixes.
grovedb-dense-fixed-sized-merkle-tree/src/storage_adapter.rs (1)

41-43: take_cost resets accumulated cost to Default — ensure callers are aware.

RefCell::take() replaces the inner value with Default::default(). If take_cost is called multiple times, subsequent calls return zero costs. This is fine if intentional but could surprise callers who expect cumulative costs across multiple take_cost calls.
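The behavior the reviewer describes is plain RefCell semantics and is easy to demonstrate in isolation. In this sketch, Cost is a stand-in for OperationCost (any Default + Clone type behaves the same way):

```rust
use std::cell::RefCell;

// Stand-in for OperationCost: anything that is Default + Clone.
#[derive(Default, Clone, Debug, PartialEq)]
struct Cost {
    seek_count: u32,
}

fn main() {
    let cost = RefCell::new(Cost { seek_count: 3 });
    // RefCell::take swaps the contents with Default::default() ...
    assert_eq!(cost.take(), Cost { seek_count: 3 });
    // ... so a second take observes zeroed costs, not a cumulative total.
    assert_eq!(cost.take(), Cost::default());
    // A non-destructive "peek" would clone the borrow instead:
    let peek = cost.borrow().clone();
    assert_eq!(peek, Cost::default());
}
```

This is why a caller expecting cumulative costs across repeated take_cost calls would be surprised: only the first call sees the accumulated value.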

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/storage_adapter.rs` around lines 41-43, The current take_cost method (pub fn take_cost(&self) -> OperationCost in
storage_adapter.rs) uses RefCell::take() which replaces self.cost with
Default::default(), so repeated calls will return zero after the first; either
make this behavior explicit or change it: if you want to return the accumulated
cost without resetting, replace the body to clone the value (e.g.,
self.cost.borrow().clone()), or if you intend to reset, rename the method to
reset_and_take_cost or document it clearly and add a separate peek_cost method
that returns a clone without clearing; update uses of take_cost accordingly and
keep the OperationCost type cloning/Default behavior in mind.
grovedb-dense-fixed-sized-merkle-tree/src/proof.rs (1)

148-163: Redundant zero-hash entries added to node_hashes for unfilled positions.

In a partially filled tree, children of expanded-set nodes that are >= count get their hash computed via tree.hash_position, which returns [0u8; 32]. These entries are included in node_hashes but are never consumed during verification — recompute_hash in verify.rs returns [0u8; 32] for position >= count before checking the hash map.

This slightly bloats proofs for partially filled trees without affecting correctness. Consider skipping children that are beyond count:

♻️ Suggested optimization
             if pos < first_leaf {
                 let left_child = 2 * pos + 1;
                 let right_child = 2 * pos + 2;

-                if !expanded.contains(&left_child) {
+                if !expanded.contains(&left_child) && left_child < count {
                     let (hash, _) = tree.hash_position(left_child, store)?;
                     node_hashes.push((left_child, hash));
                 }
-                if !expanded.contains(&right_child) {
+                if !expanded.contains(&right_child) && right_child < count {
                     let (hash, _) = tree.hash_position(right_child, store)?;
                     node_hashes.push((right_child, hash));
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof.rs` around lines 148 - 163,
The proof builder is adding zero-hash entries for unfilled child positions which
waste space; update the block in the loop that computes children (using symbols
pos, first_leaf, left_child, right_child, expanded, node_hashes) to skip any
child whose index is >= count before calling tree.hash_position(store) and
pushing to node_hashes — i.e., for each child check child < count and only then
compute the hash and push; leave behavior unchanged for filled children.
grovedb-dense-fixed-sized-merkle-tree/src/verify.rs (4)

83-120: Three nearly identical duplicate-detection blocks — extract a helper.

The three blocks checking for duplicate positions in entries, node_value_hashes, and node_hashes are copy-pasted, differing only in the field name and error label. A small helper reduces repetition:

♻️ Example helper
fn check_no_duplicate_positions<T>(items: &[(u16, T)], label: &str) -> Result<(), DenseMerkleError> {
    let mut seen = std::collections::BTreeSet::new();
    for (pos, _) in items {
        if !seen.insert(*pos) {
            return Err(DenseMerkleError::InvalidProof(format!(
                "duplicate {} at position {}", label, pos
            )));
        }
    }
    Ok(())
}

Then:

check_no_duplicate_positions(&self.entries, "entry")?;
check_no_duplicate_positions(&self.node_value_hashes, "node_value_hash")?;
check_no_duplicate_positions(&self.node_hashes, "node_hash")?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 83 - 120,
Extract the repeated duplicate-position checks into a helper like
check_no_duplicate_positions<T>(items: &[(u16, T)], label: &str) -> Result<(),
DenseMerkleError> and replace the three blocks that iterate over self.entries,
self.node_value_hashes, and self.node_hashes with calls to that helper (e.g.,
check_no_duplicate_positions(&self.entries, "entry")?), making sure the helper
uses a BTreeSet to detect duplicates and returns DenseMerkleError::InvalidProof
with a formatted message on duplicate insertion.

184-192: Redundant < capacity check in the entry filter.

Since self.count <= capacity is validated at lines 65-70, the condition *pos < self.count already implies *pos < capacity. The && *pos < capacity arm is unreachable.

♻️ Simplified filter
         let entries = self
             .entries
             .iter()
-            .filter(|(pos, _)| *pos < self.count && *pos < capacity)
+            .filter(|(pos, _)| *pos < self.count)
             .map(|(pos, val)| (*pos, val.clone()))
             .collect();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 184 - 192,
The filter on self.entries redundantly checks "*pos < capacity" even though
earlier validation guarantees self.count <= capacity, so update the closure used
in the iterator (the filter on self.entries in verify.rs) to only test "*pos <
self.count" and remove the "&& *pos < capacity" arm; leave the subsequent map
and collect and the Ok((computed_root, entries)) return unchanged.

42-49: verify_and_get_root returns entries before the caller verifies the root — document the trust boundary clearly.

This method returns proved entries alongside the computed root, but those entries are only trustworthy once the caller independently verifies the root (e.g., via the Merk child-hash chain). Consider adding a # Safety or # Important doc section to reinforce that the returned entries must not be trusted until the root is authenticated.

Based on learnings: "Never trust unverified proofs; always verify before use."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 42 - 49,
Add a clear safety note to the doc comment for verify_and_get_root (the method
that calls verify_inner) stating that the returned entries are NOT to be trusted
until the computed root is authenticated; include a `# Safety` or `# Important`
section telling callers to verify the returned root against the expected root
(for example via the Merk child-hash chain) before using any proved entries, and
mention that this method only computes and returns the root and entries but does
not perform that final authentication step.

262-267: Consider using std formatting for the hex helper.

Minor: the manual hex_encode could be replaced with a format!-based approach or the hex crate. Since this is only for error messages, the current implementation is fine — just noting it for awareness.
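For reference, a format!-based replacement is only a couple of lines (a sketch of the suggested alternative, not the crate's current code):

```rust
// Hex formatting via std only: each byte rendered as two lowercase hex
// digits and collected into one String. Adequate for error messages.
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    assert_eq!(hex_encode(&[0x00, 0xab, 0xff]), "00abff");
    assert_eq!(hex_encode(&[]), "");
    println!("{}", hex_encode(&[0xde, 0xad, 0xbe, 0xef])); // prints "deadbeef"
}
```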

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 262 - 267,
Replace the manual byte-by-byte hex helper function hex_encode with a standard
utility: either call hex::encode(bytes) (add the "hex" crate) or use std
formatting via format!("{:02x}", ...) applied more succinctly (e.g., using
bytes.iter().format with itertools) so the function becomes a thin wrapper
around a well-tested encoder; update the hex_encode function to delegate to
hex::encode(bytes) and adjust imports/Cargo.toml accordingly since this is only
used for error messages.
grovedb-dense-fixed-sized-merkle-tree/src/tree.rs (1)

68-120: insert and try_insert share almost identical logic — consider extracting the common path.

Both methods duplicate the put_value → increment count → compute_root_hash → rollback on error sequence. The only difference is the "full" handling. A private helper that performs the insert-and-hash step could reduce duplication:

♻️ Sketch of a shared helper
+    /// Shared insert logic; caller must check capacity before calling.
+    fn insert_inner<S: DenseTreeStore>(
+        &mut self,
+        value: &[u8],
+        store: &S,
+    ) -> Result<([u8; 32], u16, u32), DenseMerkleError> {
+        let position = self.count;
+        store.put_value(position, value)?;
+        self.count += 1;
+
+        match self.compute_root_hash(store) {
+            Ok((root_hash, hash_calls)) => Ok((root_hash, position, hash_calls)),
+            Err(e) => {
+                self.count -= 1;
+                Err(e)
+            }
+        }
+    }
+
     pub fn insert<S: DenseTreeStore>(
         &mut self,
         value: &[u8],
         store: &S,
     ) -> Result<([u8; 32], u16, u32), DenseMerkleError> {
         if self.count >= self.capacity() {
             return Err(DenseMerkleError::TreeFull {
                 capacity: self.capacity(),
                 count: self.count,
             });
         }
-        let position = self.count;
-        store.put_value(position, value)?;
-        self.count += 1;
-        match self.compute_root_hash(store) {
-            Ok((root_hash, hash_calls)) => Ok((root_hash, position, hash_calls)),
-            Err(e) => {
-                self.count -= 1;
-                Err(e)
-            }
-        }
+        self.insert_inner(value, store)
     }

     pub fn try_insert<S: DenseTreeStore>(
         &mut self,
         value: &[u8],
         store: &S,
     ) -> Result<Option<([u8; 32], u16, u32)>, DenseMerkleError> {
         if self.count >= self.capacity() {
             return Ok(None);
         }
-        let position = self.count;
-        store.put_value(position, value)?;
-        self.count += 1;
-        match self.compute_root_hash(store) {
-            Ok((root_hash, hash_calls)) => Ok(Some((root_hash, position, hash_calls))),
-            Err(e) => {
-                self.count -= 1;
-                Err(e)
-            }
-        }
+        self.insert_inner(value, store).map(Some)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tree.rs` around lines 68 - 120,
Both insert and try_insert duplicate the sequence of checking capacity, calling
store.put_value, incrementing self.count, calling self.compute_root_hash, and
rolling back on error; extract that shared logic into a private helper (e.g., fn
do_insert_and_hash<S: DenseTreeStore>(&mut self, value: &[u8], store: &S) ->
Result<([u8;32], u16, u32), DenseMerkleError>) that performs put_value,
increments count, calls compute_root_hash, and decrements count on failure; then
simplify insert to handle the full-tree error and call the helper, and simplify
try_insert to return None when full or call the same helper and wrap its success
in Some(...). Ensure the helper uses self.capacity(), self.count,
store.put_value, and self.compute_root_hash so behavior is identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/hash.rs`:
- Around line 53-71: The computation of the total hash count in
compute_dense_merkle_root_from_values can overflow when casting
leaf_hashes.len() to u32 and computing 2 * n - 1; change the arithmetic to use a
wider integer and checked operations: compute n as usize or u64 (e.g., let n =
leaf_hashes.len() as u64), compute total = n.checked_mul(2).and_then(|v|
v.checked_sub(1)) and return an error if it overflows (or else ensure the
function returns usize for the count), then convert to the expected return type
only after confirming no overflow; update references to n and the returned count
accordingly and keep compute_dense_merkle_root(&leaf_hashes)? usage intact.

In `@grovedb-dense-fixed-sized-merkle-tree/src/storage_adapter.rs`:
- Around line 46-70: The AuxDenseTreeStore implementation is using main storage
methods; update get_value and put_value to call the auxiliary storage APIs:
replace self.ctx.get(&key) with self.ctx.get_aux(&key) in get_value and replace
self.ctx.put(&key, value, None, None) with self.ctx.put_aux(&key, value, None,
None) in put_value so auxiliary tree data is persisted to the aux store; keep
the existing position_key, cache handling, error mapping
(DenseMerkleError::StoreError) and cost accounting logic unchanged.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 347-378: The test currently accepts both Ok and Err from
tampered.verify(&root), which weakens the assertion; pick one behavior and
enforce it — here change the test_vuln2_out_of_range_entries_filtered to assert
that tampered.verify(&root) returns an Err (i.e., the proof must be invalid).
Concretely, replace the match on tampered.verify(&root) with an assertion like
tampered.verify(&root).expect_err("phantom entry should make proof invalid");
reference DenseTreeProof::generate for proof creation and tampered.verify for
verification when making the change.
- Around line 398-417: The test may silently pass when proof.node_value_hashes
is empty; add a precondition assertion after DenseTreeProof::generate to ensure
node_value_hashes is non-empty (e.g., assert!(proof.node_value_hashes.len() > 0,
"...")) before attempting to push a duplicate and running proof.verify; apply
the same change to test_vuln3_duplicate_node_hashes_rejected to assert the
corresponding vector is non-empty so the duplicate-injection and subsequent
assert!(result.is_err(), ...) actually execute.

---

Nitpick comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/hash.rs`:
- Around line 10-16: blake3_merge currently concatenates left/right and hashes
without domain separation; update it to prepend the module's INTERNAL_DOMAIN_TAG
byte before hashing so internal-node hashes match tree.rs behavior (use
INTERNAL_DOMAIN_TAG when forming the 64+1 input to blake3 in blake3_merge). Also
ensure compute_dense_merkle_root and compute_dense_merkle_root_from_values apply
the correct leaf domain tag for leaf hashing (use the leaf domain tag constant
when hashing leaves in compute_dense_merkle_root_from_values) so public
utilities match the tree's 0x00/0x01 prefixes.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof.rs`:
- Around line 148-163: The proof builder is adding zero-hash entries for
unfilled child positions which waste space; update the block in the loop that
computes children (using symbols pos, first_leaf, left_child, right_child,
expanded, node_hashes) to skip any child whose index is >= count before calling
tree.hash_position(store) and pushing to node_hashes — i.e., for each child
check child < count and only then compute the hash and push; leave behavior
unchanged for filled children.

In `@grovedb-dense-fixed-sized-merkle-tree/src/storage_adapter.rs`:
- Around line 41-43: The current take_cost method (pub fn take_cost(&self) ->
OperationCost in storage_adapter.rs) uses RefCell::take() which replaces
self.cost with Default::default(), so repeated calls will return zero after the
first; either make this behavior explicit or change it: if you want to return
the accumulated cost without resetting, replace the body to clone the value
(e.g., self.cost.borrow().clone()), or if you intend to reset, rename the method
to reset_and_take_cost or document it clearly and add a separate peek_cost
method that returns a clone without clearing; update uses of take_cost
accordingly and keep the OperationCost type cloning/Default behavior in mind.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tree.rs`:
- Around line 68-120: Both insert and try_insert duplicate the sequence of
checking capacity, calling store.put_value, incrementing self.count, calling
self.compute_root_hash, and rolling back on error; extract that shared logic
into a private helper (e.g., fn do_insert_and_hash<S: DenseTreeStore>(&mut self,
value: &[u8], store: &S) -> Result<([u8;32], u16, u32), DenseMerkleError>) that
performs put_value, increments count, calls compute_root_hash, and decrements
count on failure; then simplify insert to handle the full-tree error and call
the helper, and simplify try_insert to return None when full or call the same
helper and wrap its success in Some(...). Ensure the helper uses
self.capacity(), self.count, store.put_value, and self.compute_root_hash so
behavior is identical.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs`:
- Around line 83-120: Extract the repeated duplicate-position checks into a
helper like check_no_duplicate_positions<T>(items: &[(u16, T)], label: &str) ->
Result<(), DenseMerkleError> and replace the three blocks that iterate over
self.entries, self.node_value_hashes, and self.node_hashes with calls to that
helper (e.g., check_no_duplicate_positions(&self.entries, "entry")?), making
sure the helper uses a BTreeSet to detect duplicates and returns
DenseMerkleError::InvalidProof with a formatted message on duplicate insertion.
- Around line 184-192: The filter on self.entries redundantly checks "*pos <
capacity" even though earlier validation guarantees self.count <= capacity, so
update the closure used in the iterator (the filter on self.entries in
verify.rs) to only test "*pos < self.count" and remove the "&& *pos < capacity"
arm; leave the subsequent map and collect and the Ok((computed_root, entries))
return unchanged.
- Around line 42-49: Add a clear safety note to the doc comment for
verify_and_get_root (the method that calls verify_inner) stating that the
returned entries are NOT to be trusted until the computed root is authenticated;
include a `# Safety` or `# Important` section telling callers to verify the
returned root against the expected root (for example via the Merk child-hash
chain) before using any proved entries, and mention that this method only
computes and returns the root and entries but does not perform that final
authentication step.
- Around line 262-267: Replace the manual byte-by-byte hex helper function
hex_encode with a standard utility: either call hex::encode(bytes) (add the
"hex" crate) or use std formatting via format!("{:02x}", ...) applied more
succinctly (e.g., using bytes.iter().format with itertools) so the function
becomes a thin wrapper around a well-tested encoder; update the hex_encode
function to delegate to hex::encode(bytes) and adjust imports/Cargo.toml
accordingly since this is only used for error messages.

- Remove leaf/internal domain tags (LEAF_DOMAIN_TAG, INTERNAL_DOMAIN_TAG).
  All nodes now use the same hash: blake3(H(value) || H(left) || H(right)),
  with [0;32] for absent children. This is simpler since every node in the
  dense tree stores data.
- Move validate_height from error.rs to hash.rs where utilities belong.
- Use u32 arithmetic for child position computation to avoid overflow
  at leaf positions near capacity.
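A minimal sketch of the unified scheme described above, with a toy 32-byte digest standing in for Blake3 (both helper names and the stand-in hash are illustrative, not the crate's code):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake3::hash: a 32-byte digest built from std's
// DefaultHasher. Purely illustrative; the crate uses Blake3.
fn h(data: &[u8]) -> [u8; 32] {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    let mut out = [0u8; 32];
    out[..8].copy_from_slice(&hasher.finish().to_le_bytes());
    out
}

// Every node uses the same rule: hash(H(value) || H(left) || H(right)),
// with [0u8; 32] contributed by absent children. No leaf/internal tags
// are needed because every node in the dense tree stores data.
fn node_hash(value: &[u8], left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut buf = Vec::with_capacity(96);
    buf.extend_from_slice(&h(value));
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    h(&buf)
}

fn main() {
    let zero = [0u8; 32];
    let leaf = node_hash(b"leaf", &zero, &zero);
    let root = node_hash(b"root", &leaf, &zero);
    assert_ne!(leaf, zero);
    assert_ne!(root, leaf);
}
```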

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
grovedb-dense-fixed-sized-merkle-tree/src/proof.rs (1)

90-108: generate relies on from_state (line 129) for count-vs-capacity validation — consider an explicit early check.

If count > capacity, the error currently surfaces from DenseFixedSizedMerkleTree::from_state on line 129 with an InvalidData error, which may be confusing in the context of proof generation. An explicit check after line 98 with an InvalidProof error would be clearer.

Suggested addition after line 98
         let capacity = ((1u32 << height) - 1) as u16;
+        if count > capacity {
+            return Err(DenseMerkleError::InvalidProof(format!(
+                "count {} exceeds capacity {} for height {}",
+                count, capacity, height
+            )));
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof.rs` around lines 90 - 108, In
generate (pub fn generate) add an explicit capacity check after computing
capacity: if count > capacity return DenseMerkleError::InvalidProof with a clear
message (e.g., "count exceeds capacity") so proof generation fails with an
InvalidProof error rather than bubbling up an InvalidData from
DenseFixedSizedMerkleTree::from_state; this keeps validation local to generate
and clarifies the error source.
grovedb-dense-fixed-sized-merkle-tree/src/verify.rs (1)

79-125: Duplicate BTreeSet construction — sets from duplicate checks could be reused.

Lines 80-116 build three BTreeSets during duplicate detection, then lines 120-125 build three more from the same data for disjoint checks. You can reuse the sets from the first pass.

Proposed refactor
         // Vuln 3: Reject duplicate positions in entries
+        let entry_positions: std::collections::BTreeSet<u16>;
         {
             let mut seen = std::collections::BTreeSet::new();
             for (pos, _) in &self.entries {
                 if !seen.insert(*pos) {
                     return Err(DenseMerkleError::InvalidProof(format!(
                         "duplicate entry at position {}",
                         pos
                     )));
                 }
             }
+            entry_positions = seen;
         }
 
         // Vuln 3: Reject duplicate positions in node_value_hashes
+        let value_hash_positions: std::collections::BTreeSet<u16>;
         {
             let mut seen = std::collections::BTreeSet::new();
             for (pos, _) in &self.node_value_hashes {
                 if !seen.insert(*pos) {
                     return Err(DenseMerkleError::InvalidProof(format!(
                         "duplicate node_value_hash at position {}",
                         pos
                     )));
                 }
             }
+            value_hash_positions = seen;
         }
 
         // Vuln 3: Reject duplicate positions in node_hashes
+        let hash_positions: std::collections::BTreeSet<u16>;
         {
             let mut seen = std::collections::BTreeSet::new();
             for (pos, _) in &self.node_hashes {
                 if !seen.insert(*pos) {
                     return Err(DenseMerkleError::InvalidProof(format!(
                         "duplicate node_hash at position {}",
                         pos
                     )));
                 }
             }
+            hash_positions = seen;
         }
 
-        // Vuln 6: Validate that entries, node_value_hashes, and node_hashes have
-        // pairwise-disjoint position sets
-        let entry_positions: std::collections::BTreeSet<u16> =
-            self.entries.iter().map(|(p, _)| *p).collect();
-        let value_hash_positions: std::collections::BTreeSet<u16> =
-            self.node_value_hashes.iter().map(|(p, _)| *p).collect();
-        let hash_positions: std::collections::BTreeSet<u16> =
-            self.node_hashes.iter().map(|(p, _)| *p).collect();
+        // Vuln 6: Validate pairwise-disjoint position sets
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 79 - 125,
The duplicate-check blocks build new BTreeSets and then the code rebuilds sets
again for disjointness; instead, reuse the sets created during the duplicate
detection to avoid double work: keep the first three BTreeSets (currently named
within the three blocks as seen) and promote them to variables (e.g.,
entry_positions, value_hash_positions, hash_positions) created while checking
duplicates in the loops over self.entries, self.node_value_hashes, and
self.node_hashes, then remove the later .collect() calls so the
pairwise-disjoint check uses those same sets when performing the disjointness
test in verify (refer to self.entries, self.node_value_hashes, self.node_hashes
and the variables entry_positions/value_hash_positions/hash_positions).
grovedb-dense-fixed-sized-merkle-tree/src/hash.rs (1)

14-20: blake3_merge uses a different hashing scheme than node_hash — ensure this is intentional.

blake3_merge hashes left || right (64 bytes), while node_hash hashes value_hash || left_hash || right_hash (96 bytes). compute_dense_merkle_root uses blake3_merge, so it builds a standard binary Merkle tree over leaf hashes — distinct from the DenseFixedSizedMerkleTree which embeds values at every level via node_hash.

Both are publicly exported. Consider adding a doc comment on compute_dense_merkle_root clarifying it builds a standard leaf-only Merkle tree, not the same structure as DenseFixedSizedMerkleTree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/hash.rs` around lines 14 - 20, The
functions blake3_merge and node_hash use different concatenation schemes
(blake3_merge: left||right; node_hash: value_hash||left_hash||right_hash), so
update compute_dense_merkle_root's documentation to state explicitly that
compute_dense_merkle_root builds a standard leaf-only Merkle tree using
blake3_merge (i.e., hashing left||right of 32-byte leaf hashes) and that this
differs intentionally from the DenseFixedSizedMerkleTree node layout produced by
node_hash; mention the exported symbols blake3_merge, node_hash,
compute_dense_merkle_root, and DenseFixedSizedMerkleTree so callers are aware of
the behavioral difference.
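To make the distinction concrete, here is a sketch of the two preimage layouts. The crate uses Blake3; `toy_hash` below is purely a stand-in so the example is self-contained, and only the preimage shapes matter:

```rust
// Stand-in for a 32-byte hash; the real crate uses Blake3.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, b) in data.iter().enumerate() {
        out[i % 32] ^= b.wrapping_add(i as u8);
    }
    out
}

// blake3_merge-style: hash over left || right (64-byte preimage).
fn merge(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut buf = Vec::with_capacity(64);
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    toy_hash(&buf)
}

// node_hash-style: hash over value_hash || left_hash || right_hash
// (96-byte preimage), embedding a value at every node.
fn node(value_hash: &[u8; 32], left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut buf = Vec::with_capacity(96);
    buf.extend_from_slice(value_hash);
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    toy_hash(&buf)
}

fn main() {
    let a = [1u8; 32];
    let b = [2u8; 32];
    let v = [3u8; 32];
    // Distinct layouts: the leaf-only merge and the value-embedding node
    // hash produce different digests for the same children here.
    assert_ne!(merge(&a, &b), node(&v, &a, &b));
}
```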
grovedb-dense-fixed-sized-merkle-tree/src/tests.rs (1)

862-878: Test name is misleading — it doesn't actually test DoS rejection.

test_dos_too_many_entries_rejected creates 65,535 entries, which is under the MAX_PROOF_ELEMENTS limit of 100,000. The test asserts root mismatch failure, not DoS rejection. Consider renaming to reflect what's actually being tested, or testing with >100,000 entries to truly exercise the DoS limit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs` around lines 862 - 878,
The test named test_dos_too_many_entries_rejected is misleading because it
builds 65,535 u16-positioned entries (below MAX_PROOF_ELEMENTS = 100_000) and
only asserts a root-mismatch; either rename the test to reflect it's checking
root-mismatch (e.g., test_proof_root_mismatch_with_max_u16_entries) or change
the test to actually exceed the DoS limit by creating >MAX_PROOF_ELEMENTS
entries (use a position type that can hold values >u16 and set
DenseTreeProof.count >100_000), then call DenseTreeProof::verify and assert it
returns the DoS/too-many-elements error instead of a generic root-mismatch.
Ensure you update the entries variable, count, and position type to match the
larger size and assert the specific verification error for excessive elements.
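The DoS guard under discussion can be sketched stand-alone; `MAX_PROOF_ELEMENTS = 100_000` is the value quoted in the review, while the function name and error string are illustrative:

```rust
// Value quoted in the review comments; the real constant lives in the crate.
const MAX_PROOF_ELEMENTS: usize = 100_000;

// Illustrative guard: the total element count across entries,
// node_value_hashes, and node_hashes must stay within the DoS limit
// before any hashing work happens.
fn check_dos_limit(
    entries: usize,
    value_hashes: usize,
    node_hashes: usize,
) -> Result<(), String> {
    let total = entries + value_hashes + node_hashes;
    if total > MAX_PROOF_ELEMENTS {
        return Err(format!(
            "proof contains {} elements, exceeding the limit of {}",
            total, MAX_PROOF_ELEMENTS
        ));
    }
    Ok(())
}

fn main() {
    // 65_535 entries (the u16 maximum used in the test) is under the limit,
    // which is why that test never exercises this rejection path.
    assert!(check_dos_limit(65_535, 0, 0).is_ok());
    assert!(check_dos_limit(100_001, 0, 0).is_err());
}
```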
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 615-629: The test
test_vuln6_overlapping_node_value_hashes_and_node_hashes_rejected currently uses
an if let on proof.node_value_hashes which can skip the assertion silently;
ensure the precondition by asserting proof.node_value_hashes is not empty (e.g.,
assert!(!proof.node_value_hashes.is_empty(), ...)) before accessing it or
replace the conditional with an unwrap-like access to obtain the first element,
then proceed to push into proof.node_hashes and call proof.verify(&root) as
before so the test fails loudly if the setup is invalid.

---

Duplicate comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/hash.rs`:
- Around line 88-91: Add an explicit length guard before casting values.len() to
u32: check that values.len() <= (u32::MAX / 2) as usize (or <= (u32::MAX + 1) /
2 if you prefer exact bound) and return a clear Err variant if it exceeds that
limit; then continue to compute leaf_hashes, set n = values.len() as u32, call
compute_dense_merkle_root(&leaf_hashes) and compute the size as 2 * n - 1. Also
add a short doc comment on the public function noting the max-safe number of
leaves to explain the bound.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 401-420: The test test_vuln3_duplicate_node_value_hashes_rejected
can pass vacuously if DenseTreeProof::generate returns an empty
proof.node_value_hashes; ensure the test actually exercises the duplicate case
by asserting node_value_hashes is non-empty (e.g., assert! or expect)
immediately after generating the proof, then inject a duplicate into
proof.node_value_hashes and call proof.verify(&root) and assert the result is
Err; reference proof.node_value_hashes, DenseTreeProof::generate, and
proof.verify to locate where to add the non-empty check.
- Around line 350-381: The test test_vuln2_out_of_range_entries_filtered
currently accepts both Ok and Err which weakens it; decide that a proof with a
phantom entry should be rejected and change the assertion to require
tampered.verify(&root) to return an Err (e.g.,
assert!(tampered.verify(&root).is_err(), "proof with out-of-range entry should
be invalid")), removing the match that accepts Ok—use DenseTreeProof,
tampered.entries push, and tampered.verify(&root) to locate and update the
assertion.
- Around line 599-613: The test test_vuln3_duplicate_node_hashes_rejected can
silently pass when proof.node_hashes is empty because the current if-let skips
the assertion; change the check to force a panic if node_hashes is empty by
replacing the conditional with an index or expect call (e.g., use
proof.node_hashes.first().cloned().expect("proof should have at least one
node_hash")) so the injected duplicate is always added and the subsequent
DenseTreeProof::generate(...), proof.node_hashes push, and proof.verify(&root)
assertion always run.

---

Nitpick comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/hash.rs`:
- Around line 14-20: The functions blake3_merge and node_hash use different
concatenation schemes (blake3_merge: left||right; node_hash:
value_hash||left_hash||right_hash), so update compute_dense_merkle_root's
documentation to state explicitly that compute_dense_merkle_root builds a
standard leaf-only Merkle tree using blake3_merge (i.e., hashing left||right of
32-byte leaf hashes) and that this differs intentionally from the
DenseFixedSizedMerkleTree node layout produced by node_hash; mention the
exported symbols blake3_merge, node_hash, compute_dense_merkle_root, and
DenseFixedSizedMerkleTree so callers are aware of the behavioral difference.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof.rs`:
- Around line 90-108: In generate (pub fn generate) add an explicit capacity
check after computing capacity: if count > capacity return
DenseMerkleError::InvalidProof with a clear message (e.g., "count exceeds
capacity") so proof generation fails with an InvalidProof error rather than
bubbling up an InvalidData from DenseFixedSizedMerkleTree::from_state; this
keeps validation local to generate and clarifies the error source.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 862-878: The test named test_dos_too_many_entries_rejected is
misleading because it builds 65,535 u16-positioned entries (below
MAX_PROOF_ELEMENTS = 100_000) and only asserts a root-mismatch; either rename
the test to reflect it's checking root-mismatch (e.g.,
test_proof_root_mismatch_with_max_u16_entries) or change the test to actually
exceed the DoS limit by creating >MAX_PROOF_ELEMENTS entries (use a position
type that can hold values >u16 and set DenseTreeProof.count >100_000), then call
DenseTreeProof::verify and assert it returns the DoS/too-many-elements error
instead of a generic root-mismatch. Ensure you update the entries variable,
count, and position type to match the larger size and assert the specific
verification error for excessive elements.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs`:
- Around line 79-125: The duplicate-check blocks build new BTreeSets and then
the code rebuilds sets again for disjointness; instead, reuse the sets created
during the duplicate detection to avoid double work: keep the first three
BTreeSets (currently named within the three blocks as seen) and promote them to
variables (e.g., entry_positions, value_hash_positions, hash_positions) created
while checking duplicates in the loops over self.entries,
self.node_value_hashes, and self.node_hashes, then remove the later .collect()
calls so the pairwise-disjoint check uses those same sets when performing the
disjointness test in verify (refer to self.entries, self.node_value_hashes,
self.node_hashes and the variables
entry_positions/value_hash_positions/hash_positions).

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
grovedb-dense-fixed-sized-merkle-tree/src/verify.rs (1)

79-116: Consider reusing duplicate-check sets for the disjointness checks to avoid redundant iteration.

Lines 79–116 build three BTreeSets for duplicate detection, then Lines 120–125 rebuild three more BTreeSets from the same data for disjointness checks. You could reuse the sets from the duplicate checks (or build all six checks in one pass) to avoid iterating each field twice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 79 - 116,
Currently verify.rs creates three BTreeSet instances twice (once to detect
duplicates for entries, node_value_hashes, node_hashes and again to test
disjointness), causing redundant iteration; refactor verify logic to build each
BTreeSet once and reuse them for both the duplicate check and the subsequent
disjointness checks (or perform a single pass that simultaneously checks
duplicates and records positions), e.g. create sets for entries,
node_value_hashes and node_hashes (used in the duplicate checks that return
DenseMerkleError::InvalidProof on insert failure) and then reuse those same sets
when performing the disjointness comparisons instead of rebuilding new sets.
grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs (1)

181-201: Good defensive decoding — consider also rejecting empty entries when count > 0 is claimed.

The 100 MB limit, height validation, and count-vs-capacity check are solid. One edge to consider: a proof with count > 0 but zero entries, zero node_value_hashes, and zero node_hashes will pass decode_from_slice but will fail later in verify_inner with "incomplete proof: no value or value hash for position 0". This is acceptable since verification catches it, but an early rejection here could give a clearer error message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs` around lines 181 -
201, The decode_from_slice function currently allows a proof with proof.count >
0 but empty proof.entries / proof.node_value_hashes / proof.node_hashes to pass;
add a validation in decode_from_slice after decoding (and after the existing
height/count checks) that if proof.count > 0 then at least one of proof.entries,
proof.node_value_hashes, or proof.node_hashes is non-empty, otherwise return
DenseMerkleError::InvalidProof with a clear message like "incomplete proof:
declared count > 0 but no entries or hashes present"; reference the
decode_from_slice function and the proof.count / proof.entries /
proof.node_value_hashes / proof.node_hashes fields when making the check.
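A hedged sketch of the suggested early rejection; the struct mirrors the field names quoted in the review, not necessarily the crate's exact types:

```rust
// Field names taken from the review text; types are assumptions for the sketch.
struct ProofShape {
    count: u16,
    entries: Vec<(u16, Vec<u8>)>,
    node_value_hashes: Vec<(u16, [u8; 32])>,
    node_hashes: Vec<(u16, [u8; 32])>,
}

// Early structural check after decoding: a proof that claims count > 0 must
// carry at least one entry or hash, otherwise fail with a clear message
// instead of a later "incomplete proof" error deep in verification.
fn reject_hollow_proof(p: &ProofShape) -> Result<(), String> {
    if p.count > 0
        && p.entries.is_empty()
        && p.node_value_hashes.is_empty()
        && p.node_hashes.is_empty()
    {
        return Err(
            "incomplete proof: declared count > 0 but no entries or hashes present"
                .to_string(),
        );
    }
    Ok(())
}

fn main() {
    let hollow = ProofShape {
        count: 3,
        entries: vec![],
        node_value_hashes: vec![],
        node_hashes: vec![],
    };
    assert!(reject_hollow_proof(&hollow).is_err());
}
```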
grovedb-dense-fixed-sized-merkle-tree/src/tests.rs (1)

9-30: MemStore is duplicated between tests.rs and proof/tests.rs.

Both test modules define identical MemStore structs with the same DenseTreeStore implementation. Consider extracting a shared test_utils module (e.g., #[cfg(test)] pub(crate) mod test_utils) to avoid the duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs` around lines 9 - 30,
Extract the duplicated MemStore and its DenseTreeStore impl into a single shared
test helper module (e.g., add #[cfg(test)] pub(crate) mod test_utils) containing
the struct MemStore, its new() constructor, and the get_value/put_value
implementations; then remove the duplicate definitions from both test modules
and import/use the shared test_utils::MemStore in tests that previously defined
it so all tests reference the single implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Cargo.toml`:
- Line 18: Remove the nonexistent workspace member entry "grovedb-query" from
the workspace members list in Cargo.toml; locate the string "grovedb-query" in
Cargo.toml and delete that entry so cargo generate-lockfile / cargo test no
longer fails due to a missing directory, and re-run CI to confirm the workspace
list only contains actual crate directories.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs`:
- Around line 180-186: The current logic in verify_inner silently filters out
entries with positions >= count (the entries iterator/filter around
self.entries.iter().filter(|(pos, _)| *pos < self.count && *pos < capacity) and
the subsequent collect), which makes proofs malleable; change this to explicitly
reject the proof if any entry has pos >= self.count (or pos >= capacity) by
returning an Err early from verify_inner instead of dropping them; after
validating all positions are in-range, you can safely use a plain clone of
self.entries (or .cloned().collect()) and proceed with recompute_hash (see
recompute_hash) and the rest of verification.

---

Duplicate comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/storage_adapter.rs`:
- Around line 39-81: The get_value and put_value implementations are using main
storage (self.ctx.get / self.ctx.put) but this context is for auxiliary storage;
replace those calls with the auxiliary variants (self.ctx.get_aux(&key) in
get_value and self.ctx.put_aux(&key, value, None, None) in put_value), keeping
the same unwrap_add_cost(&mut cost) wrapping, cache update (in put_value), and
existing error handling (DenseMerkleError::StoreError messages) so node data is
stored in the aux namespace rather than colliding with primary KV pairs.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 565-573: The test
test_vuln6_overlapping_node_value_hashes_and_node_hashes_rejected currently
skips the check when proof.node_value_hashes is empty due to the if let
conditional; change it so the test always exercises the overlapping case by
asserting proof.node_value_hashes is non-empty (or constructing a proof with at
least one node_value_hashes entry) and then unconditionally pushing the
overlapping node into proof.node_hashes before calling proof.verify(&root) and
asserting an error; reference the existing
proof.node_value_hashes.first().cloned() usage and the proof.verify(&root)
assertion when making the change.
- Around line 296-307: The test test_vuln2_out_of_range_entries_filtered
currently accepts both Ok and Err from tampered.verify(&root), which weakens the
check; change it to assert a single expected behavior — for example, require
tampered.verify(&root) to return Ok(verified) and then assert every pos in
verified is < 3 (replace the match with a single let Ok(verified) =
tampered.verify(&root) else { panic!(...) }; loop over verified and assert *pos
< 3). This removes the Err branch and ensures the proof must succeed and
out-of-range entries are filtered.
- Around line 547-554: The test test_vuln3_duplicate_node_hashes_rejected
currently uses if let Some(first) = proof.node_hashes.first().cloned() which
skips the assertion when node_hashes is empty; change it to assert the
precondition that proof.node_hashes is non-empty (e.g.,
assert!(!proof.node_hashes.is_empty(), "...")) and then safely obtain the first
value (e.g., unwrap the first cloned value), push it back into
proof.node_hashes, and run proof.verify(&root) expecting an error so the test
cannot silently pass.
- Around line 338-350: The test can silently skip assertions if
proof.node_value_hashes is empty; replace the optional check with a hard
precondition: call let first =
proof.node_value_hashes.first().cloned().expect("expected at least one
node_value_hash from generate") and push that value into proof.node_value_hashes
unconditionally, then assert result.is_err() (remove the conditional guard on
proof.node_value_hashes.len() > 1) so the test always verifies that duplicated
node_value_hashes are rejected; target the proof.node_value_hashes usage and the
result = proof.verify(&root) assertion in the test.

---

Nitpick comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs`:
- Around line 181-201: The decode_from_slice function currently allows a proof
with proof.count > 0 but empty proof.entries / proof.node_value_hashes /
proof.node_hashes to pass; add a validation in decode_from_slice after decoding
(and after the existing height/count checks) that if proof.count > 0 then at
least one of proof.entries, proof.node_value_hashes, or proof.node_hashes is
non-empty, otherwise return DenseMerkleError::InvalidProof with a clear message
like "incomplete proof: declared count > 0 but no entries or hashes present";
reference the decode_from_slice function and the proof.count / proof.entries /
proof.node_value_hashes / proof.node_hashes fields when making the check.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 9-30: Extract the duplicated MemStore and its DenseTreeStore impl
into a single shared test helper module (e.g., add #[cfg(test)] pub(crate) mod
test_utils) containing the struct MemStore, its new() constructor, and the
get_value/put_value implementations; then remove the duplicate definitions from
both test modules and import/use the shared test_utils::MemStore in tests that
previously defined it so all tests reference the single implementation.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs`:
- Around line 79-116: Currently verify.rs creates three BTreeSet instances twice
(once to detect duplicates for entries, node_value_hashes, node_hashes and again
to test disjointness), causing redundant iteration; refactor verify logic to
build each BTreeSet once and reuse them for both the duplicate check and the
subsequent disjointness checks (or perform a single pass that simultaneously
checks duplicates and records positions), e.g. create sets for entries,
node_value_hashes and node_hashes (used in the duplicate checks that return
DenseMerkleError::InvalidProof on insert failure) and then reuse those same sets
when performing the disjointness comparisons instead of rebuilding new sets.

Contributor
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (2)
grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs (1)

348-361: Missing overflow regression test for RangeAfter at u16::MAX

The existing test_query_range_after test uses start = 4, which never triggers the start + 1 overflow identified in query_to_positions. Add a test that passes start = u16::MAX (encoded as [0xFF, 0xFF]) to confirm the range is empty and does not wrap to all positions. Without such a test, the overflow bug in mod.rs would go undetected by the current test suite.

✅ Suggested regression test
#[test]
fn test_query_range_after_at_u16_max() {
    let (store, root) = make_tree_h3_full();
    // RangeAfter(0xFFFF) should produce an empty range, not wrap to 0..count
    let mut query = Query::new();
    query.insert_range_after(vec![0xFF, 0xFF]..);
    let proof = DenseTreeProof::generate_for_query(3, 7, &query, &store)
        .unwrap()
        .expect("generate_for_query should succeed");
    assert_eq!(
        proof.entries.len(),
        0,
        "RangeAfter(u16::MAX) should be empty, not full range"
    );
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs` around lines 348 -
361, Add a regression test that ensures RangeAfter at u16::MAX does not wrap:
create a new test (e.g., test_query_range_after_at_u16_max) that uses
Query::insert_range_after with the key encoded as vec![0xFF, 0xFF], call
DenseTreeProof::generate_for_query(3, 7, &query, &store) and assert the
resulting proof has zero entries (or proof.entries.len() == 0); this verifies
the overflow in query_to_positions is fixed and RangeAfter(u16::MAX) yields an
empty range rather than wrapping to the full range.
grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs (1)

114-114: Add #[cfg(test)] to mod tests;

The declaration is unconditional; the module body is guarded in tests.rs, so this works, but idiomatic Rust places #[cfg(test)] on the mod declaration to signal intent and avoid compiling the module item in production.

-mod tests;
+#[cfg(test)]
+mod tests;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs` at line 114, The
module declaration "mod tests;" in proof/mod.rs is unconditional but the tests
are guarded in tests.rs; add the test-only attribute by annotating the
declaration with #[cfg(test)] so the tests module is only compiled in test
builds (i.e., change the mod tests; declaration to #[cfg(test)] mod tests;),
leaving tests.rs unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs`:
- Around line 87-107: The three match arms handling QueryItem::RangeAfter,
QueryItem::RangeAfterTo, and QueryItem::RangeAfterToInclusive perform unchecked
start + 1 on a u16 which can overflow; replace those increments with
saturating_add(1) (i.e., use start.saturating_add(1)) so that u16::MAX stays at
65535 and the generated ranges remain empty as intended (apply the same change
in the RangeAfterTo and RangeAfterToInclusive range expressions: use
start.saturating_add(1)..end and start.saturating_add(1)..=clamped_end
respectively).
- Around line 282-302: The decode_from_slice function currently only validates
height and count; extend validation after computing capacity in
decode_from_slice to iterate over proof.entries, proof.node_value_hashes, and
proof.node_hashes and ensure each stored position is within the valid range
(e.g., >= 1 and <= capacity); if any position is out of range return
DenseMerkleError::InvalidProof with a clear message naming the offending
collection (entries/node_value_hashes/node_hashes) and the invalid position
value and the capacity; this defends against crafted proofs that contain invalid
positions and keeps the existing height/count checks and error types intact.

---

Nitpick comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs`:
- Line 114: The module declaration "mod tests;" in proof/mod.rs is unconditional
but the tests are guarded in tests.rs; add the test-only attribute by annotating
the declaration with #[cfg(test)] so the tests module is only compiled in test
builds (i.e., change the mod tests; declaration to #[cfg(test)] mod tests;),
leaving tests.rs unchanged.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs`:
- Around line 348-361: Add a regression test that ensures RangeAfter at u16::MAX
does not wrap: create a new test (e.g., test_query_range_after_at_u16_max) that
uses Query::insert_range_after with the key encoded as vec![0xFF, 0xFF], call
DenseTreeProof::generate_for_query(3, 7, &query, &store) and assert the
resulting proof has zero entries (or proof.entries.len() == 0); this verifies
the overflow in query_to_positions is fixed and RangeAfter(u16::MAX) yields an
empty range rather than wrapping to the full range.

Contributor
@coderabbitai coderabbitai bot left a comment
🧹 Nitpick comments (5)
grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs (2)

114-114: mod tests; without #[cfg(test)] compiles the (empty) module in non-test builds.

The file's content is guarded by #[cfg(test)] internally, so there's no functional issue, but adding the attribute here is more idiomatic:

+#[cfg(test)]
 mod tests;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs` at line 114, The
declaration "mod tests;" is compiled in non-test builds; add the test-only
attribute by changing it to "#[cfg(test)] mod tests;" so the tests module is
only included for cargo test; update the line containing "mod tests;" (the tests
module declaration in proof/mod.rs) to be prefixed with #[cfg(test)].

270-277: Encoding uses with_no_limit() — consider matching a limit with decode for symmetry.

encode_to_vec uses no size limit while decode_from_slice caps at 100MB. For a height-16 tree the maximum legitimate proof is well under 100MB, but an attacker who can influence what gets encoded could cause unbounded memory allocation. Since encoding is typically server-side and controlled, this is low risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs` around lines 270 -
277, The encode_to_vec function currently uses
bincode::config::standard().with_big_endian().with_no_limit(); change it to use
a bounded config that matches decode_from_slice’s limit (e.g. .with_limit(100 *
1024 * 1024) or a shared PROOF_MAX_SIZE constant) so encoding and decoding use
the same size cap; update encode_to_vec to build its bincode config with that
limit and reference the same constant used by decode_from_slice.
grovedb-dense-fixed-sized-merkle-tree/src/verify.rs (1)

134-215: Duplicate, disjointness, and auth-path checks provide strong proof structure validation.

The ancestor-set check (Vuln 1) correctly prevents an attacker from short-circuiting verification by injecting a precomputed hash at an ancestor of a proved entry.

One minor note: lines 136, 149, and 162 each allocate a new BTreeSet for duplicate detection, then lines 175-180 rebuild three more sets for the disjointness check. Since you already have entry_positions, value_hash_positions, and hash_positions from lines 175-180, the earlier duplicate checks could reuse those by checking set.len() == vec.len() instead. Not critical — current approach is clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` around lines 134 - 215,
The duplicate-position checks currently create temporary BTreeSets then later
rebuild entry_positions, value_hash_positions, and hash_positions for
disjointness; simplify by removing the first three temp sets and instead build
entry_positions, value_hash_positions, and hash_positions once (from
self.entries, self.node_value_hashes, self.node_hashes) and use their lengths
vs. the original vectors' lengths to detect duplicates (i.e., if set.len() !=
self.entries.len() then return the same InvalidProof error), keeping the
existing error messages and the subsequent ancestor_set/auth-path check that
iterates over entry_positions and node_hashes.
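The alternative mentioned above, sketched stand-alone: build each position set once and infer duplicates from a length mismatch, then reuse the same set for disjointness (variable names follow the review text):

```rust
use std::collections::BTreeSet;

fn main() {
    // A duplicated position (1 appears twice) shrinks the set relative to
    // the source vector, which is the whole duplicate check.
    let entries: Vec<(u16, [u8; 32])> = vec![(1, [0; 32]), (2, [0; 32]), (1, [0; 32])];
    let entry_positions: BTreeSet<u16> = entries.iter().map(|(p, _)| *p).collect();
    let has_duplicates = entry_positions.len() != entries.len();
    assert!(has_duplicates);

    // The same set then feeds the disjointness check without a rebuild.
    let hash_positions: BTreeSet<u16> = [3u16, 4].into_iter().collect();
    assert!(entry_positions.is_disjoint(&hash_positions));
}
```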
grovedb-dense-fixed-sized-merkle-tree/src/tests.rs (1)

816-832: Test name test_dos_too_many_entries_rejected is misleading.

The test verifies that 65535 entries (equal to capacity) are not rejected by the DoS check, but fail for root mismatch. The name implies they should be rejected. Consider renaming to test_dos_at_capacity_entries_fail_root_mismatch or similar.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs` around lines 816 - 832,
The test function name test_dos_too_many_entries_rejected is misleading because
it constructs a DenseTreeProof with 65_535 entries (equal to u16 capacity) and
asserts it fails due to root mismatch via
DenseTreeProof::verify_against_expected_root, not that it was rejected by DoS
checks; rename the test to something like
test_dos_at_capacity_entries_fail_root_mismatch and update any
references/comments to reflect that it validates a root-mismatch failure at
capacity rather than a DoS rejection.
grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs (1)

10-32: Duplicate MemStore implementation across test modules.

MemStore is identically defined in both src/tests.rs and src/proof/tests.rs. Consider extracting it to a shared #[cfg(test)] test utilities module to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs` around lines 10 -
32, The MemStore test helper is duplicated; extract it into a single shared
#[cfg(test)] test utilities module and re-use it from both test modules: create
a test_utils module that declares pub(crate) struct MemStore { data:
RefCell<HashMap<u16, Vec<u8>>> } and its impl (new) plus impl DenseTreeStore for
MemStore implementing get_value and put_value, then replace the local MemStore
definitions in both tests with use crate::test_utils::MemStore (or mod
test_utils) so the same MemStore, get_value and put_value implementations are
reused.
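A minimal sketch of the shared test helper this comment asks for. The `DenseTreeStore` trait shape below (get/put keyed by a `u16` position) is an assumption inferred from the review text, not the crate's exact signature; in the crate this would live in a `#[cfg(test)] mod test_utils` used by both src/tests.rs and src/proof/tests.rs:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Assumed trait shape, stand-in for the crate's DenseTreeStore.
pub trait DenseTreeStore {
    fn get_value(&self, position: u16) -> Option<Vec<u8>>;
    fn put_value(&self, position: u16, value: &[u8]);
}

// The shared in-memory store both test modules would reuse.
pub struct MemStore {
    data: RefCell<HashMap<u16, Vec<u8>>>,
}

impl MemStore {
    pub fn new() -> Self {
        Self { data: RefCell::new(HashMap::new()) }
    }
}

impl DenseTreeStore for MemStore {
    fn get_value(&self, position: u16) -> Option<Vec<u8>> {
        self.data.borrow().get(&position).cloned()
    }

    fn put_value(&self, position: u16, value: &[u8]) {
        self.data.borrow_mut().insert(position, value.to_vec());
    }
}

fn main() {
    let store = MemStore::new();
    store.put_value(7, b"leaf");
    assert_eq!(store.get_value(7), Some(b"leaf".to_vec()));
    assert_eq!(store.get_value(8), None);
}
```

`RefCell` gives interior mutability so `put_value` can take `&self`, matching a read-oriented trait without threading `&mut` through test code.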
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs`:
- Around line 87-107: The three QueryItem arms (QueryItem::RangeAfter,
RangeAfterTo, RangeAfterToInclusive) perform unchecked start + 1 on a u16 which
can overflow; change them to convert start to usize and compute the next
position safely (e.g., let start = bytes_to_position(&r.start)? as usize; let
start_next = start.saturating_add(1);) then iterate from start_next..count (or
start_next..end / start_next..=clamped_end) and apply the same safe
conversion/clamping for end (as usize) and clamped_end; update the three
branches (RangeAfter, RangeAfterTo, RangeAfterToInclusive) to use saturating_add
or checked arithmetic and usize indices to avoid overflow/panic.
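The overflow-safe iteration the prompt describes can be sketched as follows; the function name and the `count` parameter (number of populated leaves) are illustrative, not the crate's API:

```rust
// Widen the u16 start position to usize before computing its exclusive
// successor, so `start == u16::MAX` cannot overflow the way an
// unchecked `start + 1` on u16 would in the RangeAfter-style arms.
pub fn positions_after(start: u16, count: usize) -> std::ops::Range<usize> {
    // saturating_add is a no-op risk-wise here (usize is wider than
    // u16) but documents the intent and stays safe if types change.
    let start_next = (start as usize).saturating_add(1);
    // Rust ranges iterate as empty when start_next >= count, so no
    // extra clamping branch is needed for the exclusive-end case.
    start_next..count
}

fn main() {
    // Positions strictly after 2, bounded by count = 5.
    assert_eq!(positions_after(2, 5).collect::<Vec<_>>(), vec![3, 4]);
    // The old u16 `start + 1` would overflow here in debug builds.
    assert_eq!(positions_after(u16::MAX, 70_000).next(), Some(65_536));
    // start past the end yields an empty range, not a panic.
    assert_eq!(positions_after(4, 3).count(), 0);
}
```

The same pattern applies to the inclusive variant by clamping the end to `count` before building the range.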

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 328-349: The test test_vuln3_duplicate_node_value_hashes_rejected
can silently pass when node_value_hashes is empty; change the injection to
unwrap the first element so the test always exercises the duplicate case: after
generating proof via DenseTreeProof::generate (and using make_full_h3_tree),
replace the conditional if let Some(first) with first =
proof.node_value_hashes.first().cloned().expect("expected at least one
node_value_hash"); then push that first into proof.node_value_hashes and assert
verify_against_expected_root(&root) returns an Err when
proof.node_value_hashes.len() > 1.
- Around line 556-572: The test
test_vuln6_overlapping_node_value_hashes_and_node_hashes_rejected can silently
skip the check when proof.node_value_hashes is empty; add an explicit
precondition assert that proof.node_value_hashes is non-empty (e.g.
assert!(!proof.node_value_hashes.is_empty())) before taking the first element,
then proceed to push into proof.node_hashes and call
verify_against_expected_root(&root) to ensure the test fails as intended;
reference proof, node_value_hashes, node_hashes, verify_against_expected_root,
and root when making the change.
- Around line 271-307: The test test_vuln2_out_of_range_entries_filtered
currently accepts either Ok or Err from
tampered.verify_against_expected_root(&root), weakening the assertion; pick a
single expected behavior (preferably that a proof with out-of-range phantom
entries fails verification) and assert it deterministically: call
DenseTreeProof::generate to get legit_proof, mutate tampered.entries to inject
the phantom entry, then replace the permissive if-let with a strict assertion
that tampered.verify_against_expected_root(&root) returns an Err (or use
is_err()), referencing the test function name and
DenseTreeProof::verify_against_expected_root and tampered.entries to locate the
code to change.
- Around line 538-554: The test test_vuln3_duplicate_node_hashes_rejected
currently may silently succeed if proof.node_hashes is empty; ensure the
precondition by asserting that DenseTreeProof::generate returned a proof with at
least one node hash before mutating: replace the optional if-let with a required
check (e.g., call .first().cloned().expect(...) or add an assert! on
proof.node_hashes.is_empty() == false) so the test fails loudly if no node
hashes exist, then proceed to push the duplicate and
verify_against_expected_root as before.
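The "fail loudly" pattern shared by these three test fixes can be sketched as below; the `Vec<[u8; 32]>` stands in for fields like proof.node_hashes or proof.node_value_hashes:

```rust
// Instead of `if let Some(first) = hashes.first()` silently skipping
// the mutation when the vector is empty, take the first element with
// `expect` so an empty vector aborts the test immediately rather than
// letting it pass without exercising the duplicate case.
pub fn inject_duplicate(hashes: &mut Vec<[u8; 32]>) {
    let first = hashes
        .first()
        .cloned()
        .expect("expected at least one node hash in the generated proof");
    hashes.push(first);
}

fn main() {
    let mut hashes = vec![[1u8; 32], [2u8; 32]];
    inject_duplicate(&mut hashes);
    // The duplicate is appended, ready for the verify-rejection assert.
    assert_eq!(hashes.len(), 3);
    assert_eq!(hashes[0], hashes[2]);
}
```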

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs`:
- Around line 235-243: The code currently silently filters out entries with
positions >= count when building the returned entries; instead, in verify_inner
you must reject any proof that contains an entry with pos >= self.count (or pos
>= capacity) before proceeding to duplicate-position checks and recompute_hash.
Locate the use of self.entries (and the filter/map that produces entries) in
verify_inner and replace the silent filter with an explicit validation that
returns an Err (or verification failure) if any (pos, _) in self.entries has pos
>= self.count or pos >= capacity, so malformed proofs are rejected rather than
silently dropping phantom entries and allowing malleability.
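The explicit rejection verify_inner should perform can be sketched like this; the `String` error and the free-function shape are placeholders for the crate's actual error type and method:

```rust
// Any entry whose position is >= count is a phantom entry and makes
// the whole proof invalid, instead of being silently filtered out
// (which leaves the proof malleable).
pub fn validate_entry_positions(
    entries: &[(u16, Vec<u8>)],
    count: u16,
) -> Result<(), String> {
    if let Some((pos, _)) = entries.iter().find(|(pos, _)| *pos >= count) {
        return Err(format!(
            "InvalidProof: entry position {pos} out of range (count {count})"
        ));
    }
    Ok(())
}

fn main() {
    let ok = vec![(0u16, b"a".to_vec()), (2, b"b".to_vec())];
    assert!(validate_entry_positions(&ok, 3).is_ok());
    // A phantom entry at position 3 with count == 3 must be rejected.
    let bad = vec![(3u16, b"x".to_vec())];
    assert!(validate_entry_positions(&bad, 3).is_err());
}
```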

---

Nitpick comments:
In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs`:
- Line 114: The declaration "mod tests;" is compiled in non-test builds; add the
test-only attribute by changing it to "#[cfg(test)] mod tests;" so the tests
module is only included for cargo test; update the line containing "mod tests;"
(the tests module declaration in proof/mod.rs) to be prefixed with #[cfg(test)].
- Around line 270-277: The encode_to_vec function currently uses
bincode::config::standard().with_big_endian().with_no_limit(); change it to use
a bounded config that matches decode_from_slice’s limit (e.g. .with_limit(100 *
1024 * 1024) or a shared PROOF_MAX_SIZE constant) so encoding and decoding use
the same size cap; update encode_to_vec to build its bincode config with that
limit and reference the same constant used by decode_from_slice.

In `@grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs`:
- Around line 10-32: The MemStore test helper is duplicated; extract it into a
single shared #[cfg(test)] test utilities module and re-use it from both test
modules: create a test_utils module that declares pub(crate) struct MemStore {
data: RefCell<HashMap<u16, Vec<u8>>> } and its impl (new) plus impl
DenseTreeStore for MemStore implementing get_value and put_value, then replace
the local MemStore definitions in both tests with use
crate::test_utils::MemStore (or mod test_utils) so the same MemStore, get_value
and put_value implementations are reused.

In `@grovedb-dense-fixed-sized-merkle-tree/src/tests.rs`:
- Around line 816-832: The test function name test_dos_too_many_entries_rejected
is misleading because it constructs a DenseTreeProof with 65_535 entries (equal
to u16 capacity) and asserts it fails due to root mismatch via
DenseTreeProof::verify_against_expected_root, not that it was rejected by DoS
checks; rename the test to something like
test_dos_at_capacity_entries_fail_root_mismatch and update any
references/comments to reflect that it validates a root-mismatch failure at
capacity rather than a DoS rejection.

In `@grovedb-dense-fixed-sized-merkle-tree/src/verify.rs`:
- Around line 134-215: The duplicate-position checks currently create temporary
BTreeSets then later rebuild entry_positions, value_hash_positions, and
hash_positions for disjointness; simplify by removing the first three temp sets
and instead build entry_positions, value_hash_positions, and hash_positions once
(from self.entries, self.node_value_hashes, self.node_hashes) and use their
lengths vs. the original vectors' lengths to detect duplicates (i.e., if
set.len() != self.entries.len() then return the same InvalidProof error),
keeping the existing error messages and the subsequent ancestor_set/auth-path
check that iterates over entry_positions and node_hashes.
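The length-comparison trick this nitpick suggests can be sketched as follows; `entries` is a stand-in for self.entries, and the same pattern applies to node_value_hashes and node_hashes:

```rust
use std::collections::BTreeSet;

// Build the position set once; a set smaller than the source vector
// means at least one position appeared twice, so there is no need for
// separate temporary sets that get rebuilt later.
pub fn entry_positions(
    entries: &[(u16, Vec<u8>)],
) -> Result<BTreeSet<u16>, String> {
    let positions: BTreeSet<u16> = entries.iter().map(|(pos, _)| *pos).collect();
    if positions.len() != entries.len() {
        return Err("InvalidProof: duplicate entry position".to_string());
    }
    // Returned so the later ancestor_set / auth-path check can reuse it.
    Ok(positions)
}

fn main() {
    let unique = vec![(1u16, vec![]), (2, vec![])];
    assert!(entry_positions(&unique).is_ok());
    let dup = vec![(1u16, vec![]), (1, vec![])];
    assert!(entry_positions(&dup).is_err());
}
```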

Member Author

@QuantumExplorer QuantumExplorer left a comment


Self Reviewed

@QuantumExplorer QuantumExplorer merged commit 6e4855f into develop Feb 22, 2026
8 checks passed
@QuantumExplorer QuantumExplorer deleted the feat/dense-fixed-sized-merkle-tree branch February 22, 2026 21:34