Skip to content

Fix crashes on indexing#360

Open
caco3 wants to merge 5 commits intozevv:v1.5.0-rc2from
caco3:fix-crashes-on-indexing-v2
Open

Fix crashes on indexing#360
caco3 wants to merge 5 commits intozevv:v1.5.0-rc2from
caco3:fix-crashes-on-indexing-v2

Conversation

@caco3
Copy link
Copy Markdown

@caco3 caco3 commented Apr 4, 2026

This PR addresses the feedback on #351 by splitting the original patch into five independent, reviewable commits.


Commit summaries

1. fix(buffer): correct buffer_put growth condition

The resize condition b->ptr + len <= b->len was inverted: it entered the realloc branch only when there was already room, and skipped it when the write would overflow. Combined with using b->len (used bytes) instead of b->max (allocated capacity) in the while-loop guard, write buffers were never grown beyond their initial 1 KB, causing heap corruption when indexing large directory trees.

2. fix(libduc): replace bare free() with duc_free() for consistency

duc_free() is now a real function in duc.c (not just a #define), so callers in index.c and canonicalize.c that still used bare free() were inconsistent. This commit aligns them.

3. fix(index,buffer): histogram bounds + strncpy null-termination in topn

  • Guards all histogram array writes with histogram_buckets > 0 to avoid writing through a zero-length array when the feature is disabled.
  • Fixes an off-by-one: the clamp used histogram_buckets as the index (one past the end); changed to histogram_buckets - 1.
  • Explicitly null-terminates topn_array[0]->name after strncpy in index.c.
  • Bounds the strncpy in buffer_get_index_report to DUC_PATH_MAX - 1 and adds an explicit null-terminator.

4. fix(db-tkrzw): add options_append() helper for safe options building

Replaces the open-coded inline bounds checks in db_open (which were both incorrectly formatted and incomplete) with a single static helper that appends a fragment to the options string only if there is room, returning -1 on overflow. Also replaces the unchecked sprintf() with snprintf().

5. fix(db): restore topn persistence with correct struct serialization

The original code stored raw pointer values from topn_array[]sizeof(duc_topn_file*) bytes per entry — which become stale garbage immediately after the indexing run. This commit re-enables the disabled block and fixes it: the pointer array is now flattened into a contiguous block of duc_topn_file structs (sizeof(duc_topn_file) per entry) before being written to the database, so actual file names and sizes are persisted. Also frees the intermediate flat buffer and the previously-retrieved DB value to avoid memory leaks.

CaCO3 added 5 commits April 4, 2026 09:41
The condition `b->ptr + len <= b->len` was inverted: it entered the
resize branch only when there was already room, and skipped it when
the write would overflow.  Combined with using b->len (used bytes)
instead of b->max (allocated bytes) in the while-loop, this meant
fresh write buffers were never resized beyond the initial 1 KB,
causing heap corruption when indexing large directories.
duc_free() is a #define alias for free() at the top of the file.
Using it everywhere makes memory management patterns uniform and
makes future audits / allocator swaps easier.
index.c:
- Guard histogram access with histogram_buckets > 0 to avoid a write
  through a zero-length array when the feature is disabled.
- Fix off-by-one: clamp to histogram_buckets-1, not histogram_buckets,
  so the index stays within the allocated array.
- Explicitly null-terminate topn_array[0]->name after strncpy.

buffer.c:
- In buffer_get_index_report, bound the strncpy to DUC_PATH_MAX-1
  and add an explicit null-terminator for safety.
Replace the open-coded inline bounds checks (flagged by the original
FIXME) with a single static helper that appends a fragment to the
options string only if there is room, returning -1 on overflow.
This also fixes the mixed-indentation formatting and replaces the
unchecked sprintf() with snprintf().
The original code stored raw pointer values from topn_array[]
(sizeof(duc_topn_file*) per entry) which become stale garbage
immediately after the indexing run.

Fix: flatten the pointer array into a contiguous block of
duc_topn_file structs (sizeof(duc_topn_file) per entry) so the
actual file names and sizes are persisted.  Also free the
intermediate buffer and the previously-retrieved DB value to
avoid memory leaks.
@caco3
Copy link
Copy Markdown
Author

caco3 commented Apr 4, 2026

@l8gravely This MR replaces #351
Feel free to close the other one if you are happy. I hope I took into account all your feedback (and understood it correctly). Its now easier to study each commit individually

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant