Skip to content

Feat/memory tags#3381

Open
pkeging wants to merge 3 commits into
Hmbown:mainfrom
pkeging:feat/memory-tags
Open

Feat/memory tags#3381
pkeging wants to merge 3 commits into
Hmbown:mainfrom
pkeging:feat/memory-tags

Conversation

@pkeging

@pkeging pkeging commented Jun 22, 2026

Copy link
Copy Markdown

Summary

Testing

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features
  • cargo test --workspace --all-features

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes
  • Harvested/co-authored credit uses a GitHub numeric noreply address

pkeging added 2 commits June 22, 2026 14:39
Structured memory with tag support:
- MemoryEntry parser for timestamped bullets with #tag extraction
- Tag indexing via MemoryIndex (inverted index for tags + full-text)
- auto_tag() for automatic suggestion from capitalized terms
- append_entry() extracts inline tags and accepts extra_tags parameter

New CLI commands:
- /memory tags — list tags with occurrence counts
- /memory search <query> — full-text search across body and tags
- /memory search --tag <tag> — exact tag filter

Tool integration:
- Remember tool accepts optional tags parameter
- Auto-tag fallback when model provides no explicit tags

Seam enrichment:
- extract_topic_tags() scans messages for capitalized technical terms
- MemoryIndex context injected into layered_context_checkpoint (max 20 entries)
- memory_context section added to soft-seam and recompact prompts
- SeamMetadata.tags populated for cross-reference
…e dead_code

- search_by_tags: case-insensitive matching (was case-sensitive in memory.rs,
  inconsistent with memory_index.rs)
- auto_tag: change redundant if/if to if/else for capitalized+special words
- memory_index: remove stale #[allow(dead_code)] on search_by_tags (used by engine.rs)
@pkeging pkeging requested a review from Hmbown as a code owner June 22, 2026 06:50

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@github-actions

Copy link
Copy Markdown

Thanks @pkeging for taking the time to contribute.

This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant recurring PR access by commenting /lgtm on a pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a tag-based indexing and search system for the user's persistent memory file, integrating it into TUI commands, the RememberTool, and the LLM context-enrichment pipeline (seams). Key changes include adding a lightweight inverted index (MemoryIndex) for fast tag and full-text searches, updating the /memory command with tags and search subcommands, and allowing the seam manager to extract topic tags from conversations to inject relevant memory context into LLM prompts. The review feedback highlights several issues and optimization opportunities: a formatting bug in engine.rs that appends a trailing # when tags are empty; a potential runtime panic in split_subcommand when slicing strings with multi-byte whitespace; missing validation for empty tags in /memory search --tag; case-sensitive tag deduplication that can lead to duplicates; and an opportunity to optimize MemoryIndex search performance by using sorted slice intersections instead of HashSet operations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +2815 to +2819
matched
.iter()
.map(|e| format!("- ({}) {} #{}", e.timestamp, e.body, e.tags.join(" #")))
.collect::<Vec<_>>()
.join("\n")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If e.tags is empty, e.tags.join(" #") is empty, resulting in a trailing # at the end of the formatted string (e.g., "- (timestamp) body #"). This is a formatting bug that can confuse the LLM. We should only append the tags if they are not empty.

                matched
                    .iter()
                    .map(|e| {
                        if e.tags.is_empty() {
                            format!("- ({}) {}", e.timestamp, e.body)
                        } else {
                            format!("- ({}) {} #{}", e.timestamp, e.body, e.tags.join(" #"))
                        }
                    })
                    .collect::<Vec<_>>()
                    .join("\n")

Comment on lines +50 to +61
fn split_subcommand(arg: Option<&str>) -> (&str, Option<&str>) {
match arg {
Some(a) => {
let trimmed = a.trim();
match trimmed.find(char::is_whitespace) {
Some(pos) => (&trimmed[..pos], Some(trimmed[pos + 1..].trim_start())),
None => (trimmed, None),
}
}
None => ("show", None),
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Slicing the string using a byte index pos + 1 after finding whitespace can panic at runtime if the whitespace character is multi-byte (e.g., the ideographic space \u{3000}). Using split_once is completely safe from UTF-8 boundary panics and is much more idiomatic Rust.

Suggested change
fn split_subcommand(arg: Option<&str>) -> (&str, Option<&str>) {
match arg {
Some(a) => {
let trimmed = a.trim();
match trimmed.find(char::is_whitespace) {
Some(pos) => (&trimmed[..pos], Some(trimmed[pos + 1..].trim_start())),
None => (trimmed, None),
}
}
None => ("show", None),
}
}
fn split_subcommand(arg: Option<&str>) -> (&str, Option<&str>) {
match arg {
Some(a) => {
let trimmed = a.trim();
match trimmed.split_once(char::is_whitespace) {
Some((sub, rest)) => (sub, Some(rest.trim_start())),
None => (trimmed, None),
}
}
None => ("show", None),
}
}

Comment on lines +153 to +158
let results: Vec<&crate::memory::MemoryEntry> = if query.starts_with("--tag ") {
let tag = query.trim_start_matches("--tag ").trim();
crate::memory::search_by_tags(&entries, &[tag])
} else {
crate::memory::search_text(&entries, query)
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the user types /memory search --tag without specifying a tag, or with trailing spaces but no tag name, the current implementation either searches for the literal text "--tag" or searches for an empty tag (returning no results). We should validate that the tag is not empty and return a helpful error message.

            let results: Vec<&crate::memory::MemoryEntry> = if query == "--tag" || query.starts_with("--tag ") {
                let tag = query.trim_start_matches("--tag").trim();
                if tag.is_empty() {
                    return CommandResult::error("Error: Please specify a tag to search for.");
                }
                crate::memory::search_by_tags(&entries, &[tag])
            } else {
                crate::memory::search_text(&entries, query)
            };

Comment thread crates/tui/src/memory.rs
Comment on lines +166 to +172
let mut all_tags: Vec<&str> = Vec::new();
for t in inline_tags.into_iter().chain(extra_tags.iter().copied()) {
let tag = t.trim_start_matches('#');
if !tag.is_empty() && !all_tags.contains(&tag) {
all_tags.push(tag);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Deduplicating tags case-sensitively can lead to duplicate tags with different casing (e.g., #Rust and #rust). Deduplicating case-insensitively keeps the tags clean and consistent.

Suggested change
let mut all_tags: Vec<&str> = Vec::new();
for t in inline_tags.into_iter().chain(extra_tags.iter().copied()) {
let tag = t.trim_start_matches('#');
if !tag.is_empty() && !all_tags.contains(&tag) {
all_tags.push(tag);
}
}
let mut all_tags: Vec<&str> = Vec::new();
for t in inline_tags.into_iter().chain(extra_tags.iter().copied()) {
let tag = t.trim_start_matches('#');
if !tag.is_empty() && !all_tags.iter().any(|existing| existing.eq_ignore_ascii_case(tag)) {
all_tags.push(tag);
}
}

Comment thread crates/tui/src/memory.rs
Comment on lines +228 to +240
let mut seen = Vec::new();
let tags: Vec<String> = tag_strs
.iter()
.map(|t| t.trim_start_matches('#').to_string())
.filter(|t| {
if seen.contains(t) {
false
} else {
seen.push(t.clone());
true
}
})
.collect();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Case-sensitive deduplication of tags during parsing can result in duplicate tags with different casing. Case-insensitive deduplication is cleaner and more consistent.

Suggested change
let mut seen = Vec::new();
let tags: Vec<String> = tag_strs
.iter()
.map(|t| t.trim_start_matches('#').to_string())
.filter(|t| {
if seen.contains(t) {
false
} else {
seen.push(t.clone());
true
}
})
.collect();
let mut seen = Vec::new();
let tags: Vec<String> = tag_strs
.iter()
.map(|t| t.trim_start_matches('#').to_string())
.filter(|t| {
if seen.iter().any(|existing: &String| existing.eq_ignore_ascii_case(t)) {
false
} else {
seen.push(t.clone());
true
}
})
.collect();

Comment on lines +113 to +220
pub fn search_text(&self, query: &str) -> Vec<&MemoryEntry> {
let words: Vec<String> = query
.split_whitespace()
.filter_map(|w| {
let clean: String = w
.chars()
.filter(|c| c.is_alphanumeric() || *c == '_' || *c == '-')
.collect();
if clean.len() >= 2 {
Some(clean.to_lowercase())
} else {
None
}
})
.collect();

if words.is_empty() {
return self.entries.iter().collect();
}

// Find intersection of all word matches
let mut result: Option<HashSet<usize>> = None;
for word in &words {
if let Some(indices) = self.text_index.get(word) {
let set: HashSet<usize> = indices.iter().copied().collect();
result = match result {
Some(existing) => Some(existing.intersection(&set).copied().collect()),
None => Some(set),
};
} else {
// A required word has no matches → empty result
return Vec::new();
}
}

let mut indices: Vec<usize> = result.unwrap_or_default().into_iter().collect();
indices.sort_unstable();
indices.iter().map(|&i| &self.entries[i]).collect()
}

/// Combined search: filter by tags (OR) and text (AND).
/// Returns entries that match both criteria.
#[must_use]
pub fn search(&self, tags: &[&str], text: Option<&str>) -> Vec<&MemoryEntry> {
if tags.is_empty() && text.is_none() {
return self.entries.iter().collect();
}

let entry_set: HashSet<usize> = (0..self.entries.len()).collect();

let tag_indices: HashSet<usize> = if tags.is_empty() {
entry_set.clone()
} else {
let mut s = HashSet::new();
for tag in tags {
let key = tag.trim_start_matches('#').to_lowercase();
if let Some(indices) = self.tag_index.get(&key) {
for &i in indices {
s.insert(i);
}
}
}
s
};

let text_indices: HashSet<usize> = if let Some(query) = text {
let words: Vec<String> = query
.split_whitespace()
.filter_map(|w| {
let clean: String = w
.chars()
.filter(|c| c.is_alphanumeric() || *c == '_' || *c == '-')
.collect();
if clean.len() >= 2 {
Some(clean.to_lowercase())
} else {
None
}
})
.collect();
if words.is_empty() {
entry_set
} else {
let mut result: Option<HashSet<usize>> = None;
for word in &words {
if let Some(indices) = self.text_index.get(word) {
let set: HashSet<usize> = indices.iter().copied().collect();
result = Some(match result {
Some(existing) => existing.intersection(&set).copied().collect(),
None => set,
});
} else {
return Vec::new();
}
}
result.unwrap_or_default()
}
} else {
entry_set
};

let mut indices: Vec<usize> = tag_indices
.intersection(&text_indices)
.copied()
.collect();
indices.sort_unstable();
indices.iter().map(|&i| &self.entries[i]).collect()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The search methods construct multiple HashSets and perform intersections on them for every query word. Since the indices in text_index and tag_index are already sorted, we can perform a highly optimized sorted slice intersection in $O(N + M)$ time with zero hash table overhead and fewer allocations.

    pub fn search_text(&self, query: &str) -> Vec<&MemoryEntry> {
        let indices = self.search_text_indices(query);
        indices.iter().map(|&i| &self.entries[i]).collect()
    }

    fn search_text_indices(&self, query: &str) -> Vec<usize> {
        let words: Vec<String> = query
            .split_whitespace()
            .filter_map(|w| {
                let clean: String = w
                    .chars()
                    .filter(|c| c.is_alphanumeric() || *c == '_' || *c == '-')
                    .collect();
                if clean.len() >= 2 {
                    Some(clean.to_lowercase())
                } else {
                    None
                }
            })
            .collect();

        if words.is_empty() {
            return (0..self.entries.len()).collect();
        }

        let mut result: Option<Vec<usize>> = None;
        for word in &words {
            if let Some(indices) = self.text_index.get(word) {
                result = match result {
                    Some(existing) => Some(Self::intersect_sorted(&existing, indices)),
                    None => Some(indices.clone()),
                };
            } else {
                return Vec::new();
            }
        }
        result.unwrap_or_default()
    }

    /// Combined search: filter by tags (OR) and text (AND).
    /// Returns entries that match both criteria.
    #[must_use]
    pub fn search(&self, tags: &[&str], text: Option<&str>) -> Vec<&MemoryEntry> {
        if tags.is_empty() && text.is_none() {
            return self.entries.iter().collect();
        }

        let tag_indices = if tags.is_empty() {
            (0..self.entries.len()).collect()
        } else {
            let mut s = Vec::new();
            for tag in tags {
                let key = tag.trim_start_matches('#').to_lowercase();
                if let Some(indices) = self.tag_index.get(&key) {
                    s.extend_from_slice(indices);
                }
            }
            s.sort_unstable();
            s.dedup();
            s
        };

        let text_indices = if let Some(query) = text {
            self.search_text_indices(query)
        } else {
            (0..self.entries.len()).collect()
        };

        let indices = Self::intersect_sorted(&tag_indices, &text_indices);
        indices.iter().map(|&i| &self.entries[i]).collect()
    }

    fn intersect_sorted(a: &[usize], b: &[usize]) -> Vec<usize> {
        let mut result = Vec::with_capacity(a.len().min(b.len()));
        let (mut i, mut j) = (0, 0);
        while i < a.len() && j < b.len() {
            match a[i].cmp(&b[j]) {
                std::cmp::Ordering::Equal => {
                    result.push(a[i]);
                    i += 1;
                    j += 1;
                }
                std::cmp::Ordering::Less => i += 1,
                std::cmp::Ordering::Greater => j += 1,
            }
        }
        result
    }

…ty tag, case-sensitive dedup, sorted intersection

- engine.rs: avoid trailing '#' when tags list is empty
- memory.rs (commands): fix split_subcommand panic on multi-byte whitespace
- memory.rs (commands): validate --tag is not empty before searching
- memory.rs (core): case-insensitive tag dedup in append_entry
- memory_index.rs: replace HashSet intersections with sorted slice O(m+n)
- memory_index.rs: remove HashSet import

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant