added 5 commits
April 4, 2026 09:41
The condition `b->ptr + len <= b->len` was inverted: it entered the resize branch only when there was already room, and skipped it when the write would overflow. Combined with using b->len (used bytes) instead of b->max (allocated bytes) in the while-loop, this meant fresh write buffers were never resized beyond the initial 1 KB, causing heap corruption when indexing large directories.
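The corrected growth check might look roughly like the sketch below. `struct sketch_buffer` and `sketch_buffer_put` are hypothetical stand-ins, not duc's actual definitions; only the field names `ptr`, `len`, and `max` come from the commit message, and the doubling strategy and 1 KB starting size are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for duc's buffer type; field names follow the commit message. */
struct sketch_buffer {
    uint8_t *data;
    size_t ptr;   /* current write offset */
    size_t len;   /* bytes used */
    size_t max;   /* bytes allocated */
};

/* Append len bytes, growing the allocation first if the write would not fit.
 * Returns 0 on success, -1 if realloc fails (the buffer is left unchanged). */
static int sketch_buffer_put(struct sketch_buffer *b, const void *src, size_t len)
{
    if (len == 0)
        return 0;

    if (b->ptr + len > b->max) {              /* grow only when the write would overflow */
        size_t new_max = b->max ? b->max : 1024;   /* assumed 1 KB starting size */
        while (b->ptr + len > new_max)        /* compare against capacity, not used bytes */
            new_max *= 2;

        uint8_t *p = realloc(b->data, new_max);
        if (p == NULL)
            return -1;
        b->data = p;
        b->max = new_max;
    }

    memcpy(b->data + b->ptr, src, len);
    b->ptr += len;
    if (b->ptr > b->len)
        b->len = b->ptr;
    return 0;
}
```

Starting from a zero-initialised `struct sketch_buffer`, a 3000-byte write grows the allocation to 4096 bytes before copying, instead of writing past a 1 KB block.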
duc_free() is a #define alias for free() at the top of the file. Using it everywhere makes memory management patterns uniform and makes future audits / allocator swaps easier.
index.c:
- Guard histogram access with histogram_buckets > 0 to avoid a write through a zero-length array when the feature is disabled.
- Fix off-by-one: clamp to histogram_buckets-1, not histogram_buckets, so the index stays within the allocated array.
- Explicitly null-terminate topn_array[0]->name after strncpy.

buffer.c:
- In buffer_get_index_report, bound the strncpy to DUC_PATH_MAX-1 and add an explicit null-terminator for safety.
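As an illustration of the two patterns in this commit, here is a minimal sketch. `sketch_bucket_index` and `sketch_copy_name` are hypothetical helpers, not functions from index.c or buffer.c; the real code applies the same checks inline.

```c
#include <stddef.h>
#include <string.h>

/* Clamp a computed bucket number to a valid array index.
 * Returns -1 when histograms are disabled (no buckets allocated),
 * so the caller can skip the array write entirely. */
static int sketch_bucket_index(int bucket, int histogram_buckets)
{
    if (histogram_buckets <= 0)
        return -1;
    if (bucket < 0)
        return 0;
    if (bucket >= histogram_buckets)
        return histogram_buckets - 1;   /* last valid index, not one past the end */
    return bucket;
}

/* Copy src into dst (capacity dst_size) and always terminate the result.
 * strncpy alone leaves dst unterminated when src fills the whole buffer. */
static void sketch_copy_name(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With 10 buckets, a computed bucket of 10 is clamped to index 9, and copying an 8-character name into a 4-byte buffer yields a terminated 3-character string.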
Replace the open-coded inline bounds checks (flagged by the original FIXME) with a single static helper that appends a fragment to the options string only if there is room, returning -1 on overflow. This also fixes the mixed-indentation formatting and replaces the unchecked sprintf() with snprintf().
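A helper of this shape could be sketched as follows. The name `sketch_options_append`, its signature, and the option strings in the usage note are assumptions for illustration; they are not the actual options_append() from the patch or real tkrzw tuning parameters. The sketch also assumes `options` is already null-terminated within `size` bytes.

```c
#include <stdio.h>
#include <string.h>

/* Append fragment to the string in options (total capacity size bytes).
 * Returns 0 on success, or -1 if the fragment does not fit; on failure
 * the existing contents of options are left untouched. */
static int sketch_options_append(char *options, size_t size, const char *fragment)
{
    size_t used = strlen(options);
    if (used >= size)
        return -1;

    size_t room = size - used;          /* includes space for the terminator */
    if (strlen(fragment) >= room)
        return -1;                      /* would truncate: refuse instead */

    snprintf(options + used, room, "%s", fragment);
    return 0;
}
```

Appending `"a=1"` and then `",b=2"` to an empty 16-byte buffer produces `"a=1,b=2"`; a further 16-character fragment is rejected with -1 and the buffer keeps its previous contents.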
The original code stored raw pointer values from topn_array[] (sizeof(duc_topn_file*) per entry) which become stale garbage immediately after the indexing run. Fix: flatten the pointer array into a contiguous block of duc_topn_file structs (sizeof(duc_topn_file) per entry) so the actual file names and sizes are persisted. Also free the intermediate buffer and the previously-retrieved DB value to avoid memory leaks.
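The flattening step can be sketched like this. `struct sketch_topn_file` is a hypothetical layout (a fixed-size name array plus a size field); the real duc_topn_file may differ, and the approach only persists names correctly if the struct holds the name inline rather than through a pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; the real duc_topn_file may differ. */
struct sketch_topn_file {
    char name[256];
    uint64_t size;
};

/* Copy the structs referenced by a pointer array into one contiguous,
 * heap-allocated block suitable for writing to a database in one call.
 * Stores the block size in *out_bytes. Returns NULL (with *out_bytes == 0)
 * when count is 0 or allocation fails. The caller frees the result. */
static struct sketch_topn_file *
sketch_flatten_topn(struct sketch_topn_file *const *arr, size_t count, size_t *out_bytes)
{
    *out_bytes = 0;
    if (count == 0)
        return NULL;

    struct sketch_topn_file *flat = calloc(count, sizeof *flat);
    if (flat == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        flat[i] = *arr[i];              /* copy the struct, not the pointer */

    *out_bytes = count * sizeof *flat;  /* sizeof(struct), not sizeof(pointer) */
    return flat;
}
```

The returned block is `count * sizeof(struct sketch_topn_file)` bytes, so the stored value carries the names and sizes themselves; the caller frees it once the database write has completed.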
@l8gravely This MR replaces #351.
This PR addresses the feedback on #351 by splitting the original patch into five independent, reviewable commits.
Commit summaries
1. fix(buffer): correct buffer_put growth condition

   The resize condition `b->ptr + len <= b->len` was inverted: it entered the realloc branch only when there was already room, and skipped it when the write would overflow. Combined with using `b->len` (used bytes) instead of `b->max` (allocated capacity) in the `while`-loop guard, write buffers were never grown beyond their initial 1 KB, causing heap corruption when indexing large directory trees.

2. fix(libduc): replace bare `free()` with duc_free() for consistency

   duc_free() is now a real function in duc.c (not just a `#define`), so callers in index.c and `canonicalize.c` that still used bare `free()` were inconsistent. This commit aligns them.

3. fix(index,buffer): histogram bounds + `strncpy` null-termination in topn

   - Guards histogram access with `histogram_buckets > 0` to avoid writing through a zero-length array when the feature is disabled.
   - The clamp used `histogram_buckets` as the index (one past the end); changed to `histogram_buckets - 1`.
   - Explicitly null-terminates `topn_array[0]->name` after `strncpy` in index.c.
   - Bounds the `strncpy` in buffer_get_index_report to `DUC_PATH_MAX - 1` and adds an explicit null-terminator.

4. fix(db-tkrzw): add options_append() helper for safe options building

   Replaces the open-coded inline bounds checks in db_open (which were both incorrectly formatted and incomplete) with a single `static` helper that appends a fragment to the `options` string only if there is room, returning `-1` on overflow. Also replaces the unchecked `sprintf()` with `snprintf()`.

5. fix(db): restore topn persistence with correct struct serialization

   The original code stored raw pointer values from `topn_array[]` (`sizeof(duc_topn_file*)` bytes per entry), which become stale garbage immediately after the indexing run. This commit re-enables the disabled block and fixes it: the pointer array is now flattened into a contiguous block of `duc_topn_file` structs (`sizeof(duc_topn_file)` per entry) before being written to the database, so actual file names and sizes are persisted. Also frees the intermediate flat buffer and the previously-retrieved DB value to avoid memory leaks.