Feat/memory tags by pkeging · Pull Request #3381 · Hmbown/CodeWhale

pkeging · 2026-06-22T06:50:52Z

Summary

Testing

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features
cargo test --workspace --all-features

Checklist

Updated docs or comments as needed
Added or updated tests where relevant
Verified TUI behavior manually if UI changes
Harvested/co-authored credit uses a GitHub numeric noreply address

Structured memory with tag support: - MemoryEntry parser for timestamped bullets with #tag extraction - Tag indexing via MemoryIndex (inverted index for tags + full-text) - auto_tag() for automatic suggestion from capitalized terms - append_entry() extracts inline tags and accepts extra_tags parameter New CLI commands: - /memory tags — list tags with occurrence counts - /memory search <query> — full-text search across body and tags - /memory search --tag <tag> — exact tag filter Tool integration: - Remember tool accepts optional tags parameter - Auto-tag fallback when model provides no explicit tags Seam enrichment: - extract_topic_tags() scans messages for capitalized technical terms - MemoryIndex context injected into layered_context_checkpoint (max 20 entries) - memory_context section added to soft-seam and recompact prompts - SeamMetadata.tags populated for cross-reference

…e dead_code - search_by_tags: case-insensitive matching (was case-sensitive in memory.rs, inconsistent with memory_index.rs) - auto_tag: change redundant if/if to if/else for capitalized+special words - memory_index: remove stale #[allow(dead_code)] on search_by_tags (used by engine.rs)

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

github-actions · 2026-06-22T06:51:02Z

Thanks @pkeging for taking the time to contribute.

This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant recurring PR access by commenting /lgtm on a pull request.

gemini-code-assist

Code Review

This pull request introduces a tag-based indexing and search system for the user's persistent memory file, integrating it into TUI commands, the RememberTool, and the LLM context-enrichment pipeline (seams). Key changes include adding a lightweight inverted index (MemoryIndex) for fast tag and full-text searches, updating the /memory command with tags and search subcommands, and allowing the seam manager to extract topic tags from conversations to inject relevant memory context into LLM prompts. The review feedback highlights several issues and optimization opportunities: a formatting bug in engine.rs that appends a trailing # when tags are empty; a potential runtime panic in split_subcommand when slicing strings with multi-byte whitespace; missing validation for empty tags in /memory search --tag; case-sensitive tag deduplication that can lead to duplicates; and an opportunity to optimize MemoryIndex search performance by using sorted slice intersections instead of HashSet operations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-22T06:52:55Z

+                matched
+                    .iter()
+                    .map(|e| format!("- ({}) {} #{}", e.timestamp, e.body, e.tags.join(" #")))
+                    .collect::<Vec<_>>()
+                    .join("\n")


If e.tags is empty, e.tags.join(" #") is empty, resulting in a trailing # at the end of the formatted string (e.g., "- (timestamp) body #"). This is a formatting bug that can confuse the LLM. We should only append the tags if they are not empty.

matched .iter() .map(|e| { if e.tags.is_empty() { format!("- ({}) {}", e.timestamp, e.body) } else { format!("- ({}) {} #{}", e.timestamp, e.body, e.tags.join(" #")) } }) .collect::<Vec<_>>() .join("\n")

gemini-code-assist · 2026-06-22T06:52:55Z

+fn split_subcommand(arg: Option<&str>) -> (&str, Option<&str>) {
+    match arg {
+        Some(a) => {
+            let trimmed = a.trim();
+            match trimmed.find(char::is_whitespace) {
+                Some(pos) => (&trimmed[..pos], Some(trimmed[pos + 1..].trim_start())),
+                None => (trimmed, None),
+            }
+        }
+        None => ("show", None),
+    }
+}


Slicing the string using a byte index pos + 1 after finding whitespace can panic at runtime if the whitespace character is multi-byte (e.g., the ideographic space \u{3000}). Using split_once is completely safe from UTF-8 boundary panics and is much more idiomatic Rust.

Suggested change

fn split_subcommand(arg: Option<&str>) -> (&str, Option<&str>) {

match arg {

Some(a) => {

let trimmed = a.trim();

match trimmed.find(char::is_whitespace) {

Some(pos) => (&trimmed[..pos], Some(trimmed[pos + 1..].trim_start())),

None => (trimmed, None),

}

}

None => ("show", None),

}

}

fn split_subcommand(arg: Option<&str>) -> (&str, Option<&str>) {

match arg {

Some(a) => {

let trimmed = a.trim();

match trimmed.split_once(char::is_whitespace) {

Some((sub, rest)) => (sub, Some(rest.trim_start())),

None => (trimmed, None),

}

}

None => ("show", None),

}

}

gemini-code-assist · 2026-06-22T06:52:55Z

+            let results: Vec<&crate::memory::MemoryEntry> = if query.starts_with("--tag ") {
+                let tag = query.trim_start_matches("--tag ").trim();
+                crate::memory::search_by_tags(&entries, &[tag])
+            } else {
+                crate::memory::search_text(&entries, query)
+            };


If the user types /memory search --tag without specifying a tag, or with trailing spaces but no tag name, the current implementation either searches for the literal text "--tag" or searches for an empty tag (returning no results). We should validate that the tag is not empty and return a helpful error message.

let results: Vec<&crate::memory::MemoryEntry> = if query == "--tag" || query.starts_with("--tag ") { let tag = query.trim_start_matches("--tag").trim(); if tag.is_empty() { return CommandResult::error("Error: Please specify a tag to search for."); } crate::memory::search_by_tags(&entries, &[tag]) } else { crate::memory::search_text(&entries, query) };

gemini-code-assist · 2026-06-22T06:52:55Z

+    let mut all_tags: Vec<&str> = Vec::new();
+    for t in inline_tags.into_iter().chain(extra_tags.iter().copied()) {
+        let tag = t.trim_start_matches('#');
+        if !tag.is_empty() && !all_tags.contains(&tag) {
+            all_tags.push(tag);
+        }
+    }


Deduplicating tags case-sensitively can lead to duplicate tags with different casing (e.g., #Rust and #rust). Deduplicating case-insensitively keeps the tags clean and consistent.

Suggested change

let mut all_tags: Vec<&str> = Vec::new();

for t in inline_tags.into_iter().chain(extra_tags.iter().copied()) {

let tag = t.trim_start_matches('#');

if !tag.is_empty() && !all_tags.contains(&tag) {

all_tags.push(tag);

}

}

let mut all_tags: Vec<&str> = Vec::new();

for t in inline_tags.into_iter().chain(extra_tags.iter().copied()) {

let tag = t.trim_start_matches('#');

if !tag.is_empty() && !all_tags.iter().any(|existing| existing.eq_ignore_ascii_case(tag)) {

all_tags.push(tag);

}

}

gemini-code-assist · 2026-06-22T06:52:56Z

+    let mut seen = Vec::new();
+    let tags: Vec<String> = tag_strs
+        .iter()
+        .map(|t| t.trim_start_matches('#').to_string())
+        .filter(|t| {
+            if seen.contains(t) {
+                false
+            } else {
+                seen.push(t.clone());
+                true
+            }
+        })
+        .collect();


Case-sensitive deduplication of tags during parsing can result in duplicate tags with different casing. Case-insensitive deduplication is cleaner and more consistent.

Suggested change

let mut seen = Vec::new();

let tags: Vec<String> = tag_strs

.iter()

.map(|t| t.trim_start_matches('#').to_string())

.filter(|t| {

if seen.contains(t) {

false

} else {

seen.push(t.clone());

true

}

})

.collect();

let mut seen = Vec::new();

let tags: Vec<String> = tag_strs

.iter()

.map(|t| t.trim_start_matches('#').to_string())

.filter(|t| {

if seen.iter().any(|existing: &String| existing.eq_ignore_ascii_case(t)) {

false

} else {

seen.push(t.clone());

true

}

})

.collect();

gemini-code-assist · 2026-06-22T06:52:56Z

+    pub fn search_text(&self, query: &str) -> Vec<&MemoryEntry> {
+        let words: Vec<String> = query
+            .split_whitespace()
+            .filter_map(|w| {
+                let clean: String = w
+                    .chars()
+                    .filter(|c| c.is_alphanumeric() || *c == '_' || *c == '-')
+                    .collect();
+                if clean.len() >= 2 {
+                    Some(clean.to_lowercase())
+                } else {
+                    None
+                }
+            })
+            .collect();
+
+        if words.is_empty() {
+            return self.entries.iter().collect();
+        }
+
+        // Find intersection of all word matches
+        let mut result: Option<HashSet<usize>> = None;
+        for word in &words {
+            if let Some(indices) = self.text_index.get(word) {
+                let set: HashSet<usize> = indices.iter().copied().collect();
+                result = match result {
+                    Some(existing) => Some(existing.intersection(&set).copied().collect()),
+                    None => Some(set),
+                };
+            } else {
+                // A required word has no matches → empty result
+                return Vec::new();
+            }
+        }
+
+        let mut indices: Vec<usize> = result.unwrap_or_default().into_iter().collect();
+        indices.sort_unstable();
+        indices.iter().map(|&i| &self.entries[i]).collect()
+    }
+
+    /// Combined search: filter by tags (OR) and text (AND).
+    /// Returns entries that match both criteria.
+    #[must_use]
+    pub fn search(&self, tags: &[&str], text: Option<&str>) -> Vec<&MemoryEntry> {
+        if tags.is_empty() && text.is_none() {
+            return self.entries.iter().collect();
+        }
+
+        let entry_set: HashSet<usize> = (0..self.entries.len()).collect();
+
+        let tag_indices: HashSet<usize> = if tags.is_empty() {
+            entry_set.clone()
+        } else {
+            let mut s = HashSet::new();
+            for tag in tags {
+                let key = tag.trim_start_matches('#').to_lowercase();
+                if let Some(indices) = self.tag_index.get(&key) {
+                    for &i in indices {
+                        s.insert(i);
+                    }
+                }
+            }
+            s
+        };
+
+        let text_indices: HashSet<usize> = if let Some(query) = text {
+            let words: Vec<String> = query
+                .split_whitespace()
+                .filter_map(|w| {
+                    let clean: String = w
+                        .chars()
+                        .filter(|c| c.is_alphanumeric() || *c == '_' || *c == '-')
+                        .collect();
+                    if clean.len() >= 2 {
+                        Some(clean.to_lowercase())
+                    } else {
+                        None
+                    }
+                })
+                .collect();
+            if words.is_empty() {
+                entry_set
+            } else {
+                let mut result: Option<HashSet<usize>> = None;
+                for word in &words {
+                    if let Some(indices) = self.text_index.get(word) {
+                        let set: HashSet<usize> = indices.iter().copied().collect();
+                        result = Some(match result {
+                            Some(existing) => existing.intersection(&set).copied().collect(),
+                            None => set,
+                        });
+                    } else {
+                        return Vec::new();
+                    }
+                }
+                result.unwrap_or_default()
+            }
+        } else {
+            entry_set
+        };
+
+        let mut indices: Vec<usize> = tag_indices
+            .intersection(&text_indices)
+            .copied()
+            .collect();
+        indices.sort_unstable();
+        indices.iter().map(|&i| &self.entries[i]).collect()
+    }


The search methods construct multiple HashSets and perform intersections on them for every query word. Since the indices in text_index and tag_index are already sorted, we can perform a highly optimized sorted slice intersection in $O(N + M)$ time with zero hash table overhead and fewer allocations.

pub fn search_text(&self, query: &str) -> Vec<&MemoryEntry> { let indices = self.search_text_indices(query); indices.iter().map(|&i| &self.entries[i]).collect() } fn search_text_indices(&self, query: &str) -> Vec<usize> { let words: Vec<String> = query .split_whitespace() .filter_map(|w| { let clean: String = w .chars() .filter(|c| c.is_alphanumeric() || *c == '_' || *c == '-') .collect(); if clean.len() >= 2 { Some(clean.to_lowercase()) } else { None } }) .collect(); if words.is_empty() { return (0..self.entries.len()).collect(); } let mut result: Option<Vec<usize>> = None; for word in &words { if let Some(indices) = self.text_index.get(word) { result = match result { Some(existing) => Some(Self::intersect_sorted(&existing, indices)), None => Some(indices.clone()), }; } else { return Vec::new(); } } result.unwrap_or_default() } /// Combined search: filter by tags (OR) and text (AND). /// Returns entries that match both criteria. #[must_use] pub fn search(&self, tags: &[&str], text: Option<&str>) -> Vec<&MemoryEntry> { if tags.is_empty() && text.is_none() { return self.entries.iter().collect(); } let tag_indices = if tags.is_empty() { (0..self.entries.len()).collect() } else { let mut s = Vec::new(); for tag in tags { let key = tag.trim_start_matches('#').to_lowercase(); if let Some(indices) = self.tag_index.get(&key) { s.extend_from_slice(indices); } } s.sort_unstable(); s.dedup(); s }; let text_indices = if let Some(query) = text { self.search_text_indices(query) } else { (0..self.entries.len()).collect() }; let indices = Self::intersect_sorted(&tag_indices, &text_indices); indices.iter().map(|&i| &self.entries[i]).collect() } fn intersect_sorted(a: &[usize], b: &[usize]) -> Vec<usize> { let mut result = Vec::with_capacity(a.len().min(b.len())); let (mut i, mut j) = (0, 0); while i < a.len() && j < b.len() { match a[i].cmp(&b[j]) { std::cmp::Ordering::Equal => { result.push(a[i]); i += 1; j += 1; } std::cmp::Ordering::Less => i += 1, std::cmp::Ordering::Greater => j += 1, } } result }

…ty tag, case-sensitive dedup, sorted intersection - engine.rs: avoid trailing '#' when tags list is empty - memory.rs (commands): fix split_subcommand panic on multi-byte whitespace - memory.rs (commands): validate --tag is not empty before searching - memory.rs (core): case-insensitive tag dedup in append_entry - memory_index.rs: replace HashSet intersections with sorted slice O(m+n) - memory_index.rs: remove HashSet import

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

pkeging added 2 commits June 22, 2026 14:39

pkeging requested a review from Hmbown as a code owner June 22, 2026 06:50

greptile-apps Bot reviewed Jun 22, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/memory tags#3381

Feat/memory tags#3381
pkeging wants to merge 3 commits into
Hmbown:mainfrom
pkeging:feat/memory-tags

pkeging commented Jun 22, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

github-actions Bot commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pkeging commented Jun 22, 2026

Summary

Testing

Checklist

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant