🔴 Required Information
Describe the Bug:
InMemoryMemoryService.search_memory() always returns 0 results when the
query consists entirely of CJK (Chinese/Japanese/Korean) characters.
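The internal helper _extract_words_lower() uses a regex to tokenize text
into a word set for keyword matching, and that tokenization never yields
usable CJK keywords. The outputs shown under "Root cause" (an empty set for a
CJK-only string, {'foy'} for mixed text) indicate a pattern that does not
match CJK characters at all, for example an ASCII-only character class or \w
combined with re.ASCII. A plain \w+ pattern would not help either: in
Python 3, \w does match CJK characters, but CJK text has no spaces between
words, so each unbroken run becomes a single token that almost never equals
a token from the stored event text. In both cases a CJK-only query produces
zero matches.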
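The tokenization behaviour can be checked in isolation with the standard re module. The sketch below does not call ADK; both patterns are illustrative assumptions, and which one the helper actually uses should be confirmed against the installed source.

```python
import re

query = "你知道我叫什麼名字嗎"
event_text = "你好,我是 Foy"

# Python 3 str patterns are Unicode-aware: \w matches CJK characters,
# so an unbroken CJK run comes back as ONE token.
unicode_query = set(re.findall(r"\w+", query.lower()))
unicode_event = set(re.findall(r"\w+", event_text.lower()))

# An ASCII-only pattern (assumed here for comparison) drops CJK text entirely.
ascii_query = set(re.findall(r"[a-z]+", query.lower()))
ascii_event = set(re.findall(r"[a-z]+", event_text.lower()))

print(unicode_query)                  # one token: the whole query string
print(unicode_event)                  # three tokens: '你好', '我是', 'foy'
print(unicode_query & unicode_event)  # empty: no shared token
print(ascii_query)                    # empty: nothing to match with
print(ascii_event)                    # only 'foy' survives
```

Either pattern leaves the query without a token that also occurs in the event, which is why character-level tokenization is proposed for CJK text.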
Steps to Reproduce:
- Install the package: pip install google-adk
- Create an InMemoryMemoryService and add a session whose events contain
  CJK text (e.g. "你好,我是 Foy")
- Call search_memory() with a CJK query (e.g. "你知道我叫什麼名字嗎")
- Observe that the result always contains 0 memories
Expected Behavior:
search_memory() should return matching memories when the query shares
CJK characters with stored event content, consistent with how ASCII keyword
matching works for English text.
Observed Behavior:
search_memory() always returns an empty SearchMemoryResponse for any
query that contains only CJK characters, even when the stored session events
contain overlapping CJK characters.
Root cause, in in_memory_memory_service.py:
# _extract_words_lower builds its word set with re.findall() on the text.
# The pattern does not match CJK characters, so CJK input yields no tokens:
_extract_words_lower('你知道我叫什麼名字嗎')  # → set()
_extract_words_lower('你好,我是 Foy')  # → {'foy'} (CJK stripped)
# The match condition is therefore never true:
if any(query_word in words_in_event for query_word in words_in_query):
    ...  # never reached for CJK-only queries
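The effect of the match condition on empty or disjoint word sets can be sketched on its own. has_keyword_overlap is a hypothetical stand-in for the condition quoted above, not a function from ADK.

```python
def has_keyword_overlap(words_in_query: set[str], words_in_event: set[str]) -> bool:
    # Hypothetical stand-in for the quoted condition: true if any query
    # word is also an event word.
    return any(query_word in words_in_event for query_word in words_in_query)


# Empty query set: any() over an empty iterable is always False.
empty_query = has_keyword_overlap(set(), {"foy"})

# Whole-run CJK token: present in the query, absent from the event words.
whole_run_query = has_keyword_overlap({"你知道我叫什麼名字嗎"}, {"你好", "我是", "foy"})

# ASCII keywords still match as expected.
ascii_match = has_keyword_overlap({"foy", "name"}, {"foy"})

print(empty_query, whole_run_query, ascii_match)
```

The first two calls cover both ways a CJK-only query can fail: no tokens at all, or tokens that are too long to coincide with anything stored.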
Environment Details:
- ADK Library Version: 1.32.0
- Desktop OS: macOS
- Python Version: 3.13.9
Model Information:
- Are you using LiteLLM: Yes (for reproduction; issue is in the memory layer, model-independent)
- Which model is being used: gemini-2.5-flash / Ollama gemma4
🟡 Optional Information
Regression:
N/A — not tested on earlier versions.
Logs:
[PreloadMemory] 🔍 搜尋 query: 你知道我叫什麼名字嗎
[PreloadMemory] 結果數: 0
After patching search_memory to handle CJK characters:
[PreloadMemory] 🔍 搜尋 query: 你知道我叫什麼名字嗎
[PreloadMemory] 結果數: 2
[PreloadMemory] → 你好,我是 Foy
Minimal Reproduction Code:
import asyncio

from google.adk.memory.in_memory_memory_service import InMemoryMemoryService
from google.adk.sessions.in_memory_session_service import InMemorySessionService
from google.genai import types


async def main():
    session_service = InMemorySessionService()
    memory_service = InMemoryMemoryService()
    session = await session_service.create_session(
        app_name="test", user_id="user"
    )
    # Simulate a user event with CJK content (duck-typed stand-in for an Event)
    session.events.append(
        type("Event", (), {
            "content": types.Content(parts=[types.Part(text="你好,我是 Foy")]),
            "author": "user",
            "timestamp": 0.0,
            "id": "evt1",
        })()
    )
    await memory_service.add_session_to_memory(session)
    result = await memory_service.search_memory(
        app_name="test", user_id="user", query="你知道我叫什麼名字嗎"
    )
    print(f"memories found: {len(result.memories)}")  # prints 0, expected >= 1


asyncio.run(main())
Suggested Fix:
Extend _extract_words_lower (or the matching logic) to also tokenize CJK
text at the individual character level:
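import re

def _extract_words_lower(text: str) -> set[str]:
    # ASCII letters and digits, lowercased, kept as whole words
    ascii_words = set(re.findall(r'[a-zA-Z0-9]+', text.lower()))
    # Han ideographs as single-character tokens (CJK Unified Ideographs and
    # Extension A); kana and Hangul are not covered by these ranges
    cjk_chars = set(re.findall(r'[\u4e00-\u9fff\u3400-\u4dbf]', text))
    return ascii_words | cjk_chars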
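A minimal, standalone sketch of how character-level tokenization would behave on the strings from this report. extract_tokens is a hypothetical name, and the added Hiragana, Katakana and Hangul ranges are an assumption of this sketch that goes beyond the fix suggested above.

```python
import re

# Han ideographs (as in the suggested fix) plus, as an assumption of this
# sketch, Hiragana, Katakana and Hangul syllables.
_CJK_CHAR = re.compile(
    r"[\u3400-\u4dbf\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af]"
)
_ASCII_WORD = re.compile(r"[a-z0-9]+")


def extract_tokens(text: str) -> set[str]:
    """Hypothetical tokenizer: ASCII words plus single CJK characters."""
    lowered = text.lower()
    return set(_ASCII_WORD.findall(lowered)) | set(_CJK_CHAR.findall(lowered))


query_tokens = extract_tokens("你知道我叫什麼名字嗎")
event_tokens = extract_tokens("你好,我是 Foy")
shared = query_tokens & event_tokens

print(sorted(shared))  # ['你', '我']
```

With this tokenization the query and the stored event share two characters, so the existing any() condition would match. Single-character matching is coarse, though: very common characters such as 我 will match many unrelated events, so a stricter scheme (for example character bigrams) may be worth considering.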
How often has this issue occurred?: