Fix: Split single lines exceeding chunk_size limit #10

echobt · 2026-01-19T16:46:14Z

Summary

This PR fixes a bug where single lines longer than the configured `chunk_size` were not being split, resulting in chunks larger than the limit.

Problem

The previous chunking logic in `src/core/indexer.rs` would simply append a line to the current chunk if the chunk was empty, even if that line itself exceeded the `chunk_size` limit. This meant that a file containing a single very long line (e.g., minified code or a large data string) would produce a single massive chunk, potentially causing issues with downstream embedding models that have strict token limits.
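To make the failure mode concrete, here is a minimal sketch of line-based chunking with the behavior described above. It is not the code from `src/core/indexer.rs`: the function name `chunk_lines_before_fix`, the use of byte length for sizes, and the newline joining are all assumptions made for illustration.

```rust
/// Hypothetical reconstruction of the pre-fix behavior (not the project's code).
/// Sizes are measured in bytes, which is an assumption.
fn chunk_lines_before_fix(text: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    for line in text.lines() {
        // The pending chunk is flushed only when it already has content and
        // the next line would overflow it.
        if !current.is_empty() && current.len() + 1 + line.len() > chunk_size {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push('\n');
        }
        // When `current` is empty the line is appended unconditionally,
        // so a line longer than `chunk_size` becomes one oversized chunk.
        current.push_str(line);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Under these assumptions, calling `chunk_lines_before_fix(&"a".repeat(25), 10)` yields a single chunk of length 25, because the size check is never reached for a line that starts a fresh chunk.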

Solution

The fix detects whether a line exceeds `chunk_size` before attempting to add it to the current chunk. If it does:

1. The current pending chunk is flushed.
2. The long line is hard-split into multiple segments of at most `chunk_size`.
3. These segments are added as separate chunks.
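The three steps can be sketched roughly as follows. This is an illustrative sketch, not the change made in `src/core/indexer.rs`: the names `chunk_lines` and `split_long_line` are hypothetical, sizes are assumed to be byte lengths, and splitting on `char` boundaries is an added assumption so that multi-byte UTF-8 text is never cut mid-character.

```rust
/// Hypothetical helper: hard-split `line` into segments of at most
/// `chunk_size` bytes without cutting a character in half. A single
/// character wider than `chunk_size` is kept whole in its own segment.
fn split_long_line(line: &str, chunk_size: usize) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    for ch in line.chars() {
        if !current.is_empty() && current.len() + ch.len_utf8() > chunk_size {
            segments.push(std::mem::take(&mut current));
        }
        current.push(ch);
    }
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}

/// Hypothetical line-based chunker with the oversized-line check applied first.
fn chunk_lines(text: &str, chunk_size: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunks = Vec::new();
    let mut current = String::new();

    for line in text.lines() {
        if line.len() > chunk_size {
            // Step 1: flush the pending chunk.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
            }
            // Steps 2 and 3: hard-split the line and add each segment as a chunk.
            chunks.extend(split_long_line(line, chunk_size));
            continue;
        }

        // Normal path: start a new chunk if this line would overflow the current one.
        let separator = if current.is_empty() { 0 } else { 1 };
        if current.len() + separator + line.len() > chunk_size {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push('\n');
        }
        current.push_str(line);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With these definitions, `chunk_lines(&"a".repeat(25), 10)` returns three chunks of lengths 10, 10, and 5. Flushing before splitting keeps the chunks in source order, so content that precedes the long line is never emitted after its segments.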

Testing

Verified with a reproduction test case: with a limit of 10, a string of length 25 previously resulted in 1 chunk of size 25 and now correctly splits into 3 chunks (sizes 10, 10, and 5).

Related Issue

Fixes PlatformNetwork/bounty-challenge#52

Fix: Split single lines exceeding chunk_size limit

0b30dcc

echobt mentioned this pull request Jan 19, 2026

[BUG] Single lines exceeding chunk_size are not split, creating oversized chunks PlatformNetwork/bounty-challenge#52

Closed
