Reduce size of search index #338

Merged
oleeskild merged 4 commits into oleeskild:main from davidkopp:optimize-search-index
Mar 21, 2026

Conversation

@davidkopp
Contributor

The JSON file for the search index had a size of 2 MB on my site:

Image

I think this is too big. With the changes in this PR the search index on my site is now reduced to 292 kB:

image

Changes:

  • remove newlines and whitespace
  • remove link tags from markdown links properly (striptags(true) | link was not sufficient)
  • reduce the length of the content per note in the search index to 500 characters
  • minify the whole json
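
The size reductions listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `stripForSearch` name appears in a later commit message, but its body here is an assumption, `buildSearchIndex` and the note fields (`title`, `url`, `html`) are hypothetical, and the regex-based tag removal is a simplification that a real template filter may handle differently.

```javascript
// Hypothetical sketch; the plugin's actual filter may differ.
function stripForSearch(html) {
  return String(html ?? "")
    .replace(/<[^>]*>/g, " ") // drop tags such as <a href="...">, keep their text
    .replace(/\s+/g, " ") // collapse newlines, tabs and repeated spaces
    .trim();
}

// Hypothetical helper: `notes` is assumed to be an array of
// { title, url, html } objects.
function buildSearchIndex(notes) {
  const entries = notes.map((note) => ({
    title: note.title,
    url: note.url,
    content: stripForSearch(note.html),
  }));
  // JSON.stringify without an indent argument emits minified JSON.
  return JSON.stringify(entries);
}
```

Calling `JSON.stringify` without the third (indent) argument is what removes the pretty-printing whitespace from the generated file.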

@oleeskild
Owner

Nice, great start!
Do you have a good justification for the 500 character cutoff? I think it's reasonable to assume that the search will cover the entire note, and not just the first 500 characters.
Maybe we could default to not reducing the length, but introduce a config value somewhere to reduce it for those who need it.
Thoughts?

@davidkopp
Contributor Author

My goal was to reduce the index size, with the trade-off of slightly worse search results. I decided to use only 500 characters because I assume that the first 500 characters already contain the most relevant keywords.

But it makes sense to make it configurable.
As this is only relevant if the search feature is enabled, would it make sense to put it under "Enable search" in the default note settings? Or would you prefer it as part of the advanced settings?
At the moment I think a simple boolean flag would be sufficient for this, like SEARCH_INDEX_USE_FULL_NOTE or SEARCH_INDEX_TRUNCATE_CONTENT. Not sure if anybody would like to have a different character limit than 500.
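
A boolean flag like this could look roughly as follows. This is only a sketch of the proposal: `SEARCH_INDEX_TRUNCATE_CONTENT` is one of the names suggested above and does not exist in the plugin, while `limitSearchContent`, the `"true"` string convention, and reading the flag from an environment-style object are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the proposed flag; nothing here is implemented in the plugin.
const SEARCH_CONTENT_LIMIT = 500; // fixed limit, as suggested in the discussion

function limitSearchContent(content, env = process.env) {
  // Assumed convention: the flag is enabled only by the exact string "true".
  const truncate = env.SEARCH_INDEX_TRUNCATE_CONTENT === "true";
  if (!truncate) {
    return content; // default: the whole note stays searchable
  }
  return content.slice(0, SEARCH_CONTENT_LIMIT);
}
```

With the flag unset, every note is indexed in full, which matches the default the owner asked for; only sites that opt in trade search recall for a smaller index.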

oleeskild and others added 3 commits March 21, 2026 14:00

Keeps full note content in the search index to avoid reducing search
recall for content beyond the first 500 characters. The other size
optimizations (HTML stripping, whitespace collapsing, JSON minification)
are sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolve conflict in .eleventy.js: keep stripForSearch filter and
use main's searchableTags function signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The date field may be used by consumers for display or sorting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@oleeskild
Owner

I've removed the 500 character limit, and kept the date field. Not sure why you deleted that in your PR? The minification is a nice addition. If you ever want to add support for setting a content limit in settings I'd be happy to have a look at the PR. But for now I'll make it so everything in every note is searchable.

@oleeskild oleeskild merged commit 6ac51f3 into oleeskild:main Mar 21, 2026
1 check passed
@davidkopp
Contributor Author

Thanks for modifying and merging the PR!
I removed the date field, because it is not used in the search UI. So if I understand it correctly, it is currently "waste" in the search index.

If I find the time in the (near) future, I may create a PR for the content limit configuration.
