feat(server): add memory health statistics API endpoints #706

Open
mvanhorn wants to merge 5 commits into volcengine:main from mvanhorn:feat/memory-health-stats-api

@mvanhorn
Contributor

Problem Statement

OpenViking has the infrastructure for memory observability (hotness_score, retrieval_stats, eval recorder) but no API to query aggregate memory health. Operators can't answer basic questions without digging into the database directly:

  • How many memories exist per category?
  • What's the hotness distribution? Are most memories going cold?
  • How much storage is consumed?
  • Which memories haven't been accessed in 30 days?

Proposed Solution

Add two API endpoints:

GET /stats/memories - Global memory statistics

{
  "total_memories": 1247,
  "by_category": {
    "profile": 3,
    "preferences": 42,
    "entities": 186,
    "events": 89,
    "cases": 412,
    "patterns": 67,
    "tools": 298,
    "skills": 150
  },
  "hotness_distribution": {
    "cold": 312,
    "warm": 687,
    "hot": 248
  },
  "staleness": {
    "not_accessed_7d": 89,
    "not_accessed_30d": 312,
    "oldest_memory_age_days": 45
  },
  "total_vectors": 8941
}
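
A minimal sketch of how the staleness block above could be derived. The record fields `created_at` and `last_accessed_at` are hypothetical names, not confirmed by this PR. The sketch also assumes the counts are cumulative (a memory idle for 30 days is also counted in the 7-day figure), which the description does not specify.

```python
from datetime import datetime, timedelta, timezone


def staleness_metrics(records, now=None):
    """Compute staleness counts from memory records.

    Each record is assumed (hypothetically) to be a dict with a timezone-aware
    'created_at' datetime and an optional 'last_accessed_at' datetime.
    """
    now = now or datetime.now(timezone.utc)
    stats = {"not_accessed_7d": 0, "not_accessed_30d": 0, "oldest_memory_age_days": 0}
    for rec in records:
        created = rec["created_at"]
        # Assumption: a memory that was never accessed is idle since creation.
        last_access = rec.get("last_accessed_at") or created
        idle = now - last_access
        if idle > timedelta(days=7):
            stats["not_accessed_7d"] += 1
        if idle > timedelta(days=30):
            stats["not_accessed_30d"] += 1
        age_days = (now - created).days
        stats["oldest_memory_age_days"] = max(stats["oldest_memory_age_days"], age_days)
    return stats
```

Passing `now` explicitly keeps the function deterministic in unit tests.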

GET /sessions/{session_id}/stats - Per-session extraction stats

{
  "session_id": "abc123",
  "total_turns": 5,
  "memories_extracted": 3,
  "contexts_used": 2,
  "skills_used": 1
}

GET /stats/memories supports a ?category=cases query parameter to restrict the results to a single memory category.

Alternatives Considered

Extending the TUI only (#664): rejected because the TUI isn't programmatically accessible, and automated monitoring needs an API.

Implementation

  • openviking/storage/stats_aggregator.py - Core StatsAggregator class that queries VikingDB for category counts, hotness distribution (cold <0.2, warm 0.2-0.6, hot >0.6), and staleness metrics. Uses the existing hotness_score() function from memory_lifecycle.py.
  • openviking/server/routers/stats.py - FastAPI router with two endpoints, following the pattern in routers/sessions.py.
  • Router registered in app.py and routers/__init__.py following existing conventions.
  • No new dependencies required. No new storage introduced.
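
As an illustration of the hotness buckets listed above (cold <0.2, warm 0.2-0.6, hot >0.6), the bucketing could look roughly like this. `hotness_bucket` and `hotness_distribution` are hypothetical helper names, not taken from stats_aggregator.py. The sketch takes plain float scores rather than calling the project's hotness_score() function.

```python
from collections import Counter

# Thresholds from the PR description: cold < 0.2, warm 0.2-0.6, hot > 0.6.
COLD_MAX = 0.2
WARM_MAX = 0.6


def hotness_bucket(score: float) -> str:
    """Map a hotness score to 'cold', 'warm', or 'hot' (hypothetical helper)."""
    if score < COLD_MAX:
        return "cold"
    if score <= WARM_MAX:
        return "warm"
    return "hot"


def hotness_distribution(scores: list[float]) -> dict[str, int]:
    """Count scores per bucket, always reporting all three keys."""
    counts = Counter(hotness_bucket(s) for s in scores)
    return {name: counts.get(name, 0) for name in ("cold", "warm", "hot")}
```

Both boundary values (0.2 and 0.6) fall into "warm" here, which matches the ranges as written in the description.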

Evidence

| Source | Evidence | Engagement |
|---|---|---|
| #640 | Request-level trace metrics just merged - observability is active priority | Merged by zhoujh01 |
| #529 | Oversized prompts destabilize VLM calls - storage metrics would catch this | Closed (fixed) |
| #350 | Ingestion/indexing decoupling - users need ingestion progress visibility | 3 thumbs up |
| Reddit | "Why AI Coding Agents Waste Half Their Context Window" - demand for context visibility | 56 upvotes, 40 comments |
| Discussion | Community requests for evaluation/observability module | Active discussion |

Test Plan

  • Unit tests for StatsAggregator with mocked VikingDB (empty store, category counts, hotness buckets, staleness, error handling)
  • Unit tests for API router (response shape, invalid category validation, session stats, session not found)
  • _parse_datetime helper tested for None, datetime objects, ISO strings, invalid input
  • Integration test with live VikingDB (manual)

Generated with Claude Code

Add two new API endpoints for querying aggregate memory health:
- GET /stats/memories - global memory stats (counts by category,
  hotness distribution, staleness metrics)
- GET /stats/sessions/{id} - per-session extraction statistics

The StatsAggregator reads from existing VikingDB indexes and the
hotness_score function without introducing new storage.

Includes unit tests with mocked VikingDB backend.

Replace the audio parser stub with a working implementation that:
- Extracts metadata (duration, sample rate, channels, bitrate) via mutagen
- Transcribes speech via Whisper API with timestamped segments
- Builds structured ResourceNode tree with L0/L1/L2 content tiers
- Falls back to metadata-only output when Whisper is unavailable
- Adds mutagen as optional dependency under [audio] extra
- Adds audio_summary prompt template for semantic indexing
- Includes unit tests with mocked Whisper API and mutagen
Collaborator

@qin-ctx left a comment

[Bug] (blocking) This PR bundles two unrelated features — memory health stats API (commit 1) and a complete audio parser rewrite with Whisper transcription (commit 2). These have no code dependency and should be separate PRs. Mixing them makes review, rollback, and changelog tracking harder.

Additional findings:

  • [Design] (non-blocking) PR description says GET /sessions/{session_id}/stats but the actual route is GET /api/v1/stats/sessions/{session_id}. Please update the description to match.
  • [Design] (non-blocking) _asr_transcribe and _asr_transcribe_with_timestamps duplicate the OpenAI client creation code (get_openviking_config() + openai.AsyncOpenAI(...)). Extract to a shared helper. Also, config.llm.api_key may not be the correct credential for OpenAI Whisper if the project is configured for a different LLM provider.
  • [Suggestion] (non-blocking) audio_summary.yaml prompt template is added but never referenced in any code path — dead code.
  • [Suggestion] (non-blocking) _generate_semantic_info accepts a viking_fs parameter that is never used in the method body.
  • [Suggestion] (non-blocking) CI lint / lint check is failing.

total_vectors = 0

for cat in categories:
    records = await self._query_memories_by_category(ctx, cat)
Collaborator

[Bug] (blocking) N+1 query: _query_memories_by_category executes the same Eq("context_type", "memory") query with limit=10000 for each of the 8 categories, then filters by URI prefix in Python. This means 8 identical DB round-trips, each returning up to 10,000 records.

Fetch once and group in memory instead:

all_records = await self._query_all_memories(ctx)
by_cat = defaultdict(list)
for r in all_records:
    uri = r.get("uri", "")
    for cat in categories:
        if f"/{cat}/" in uri:
            by_cat[cat].append(r)
            break
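
A self-contained version of the same idea, runnable outside the aggregator, could look like this. `group_by_category` is a hypothetical name, and it assumes each record is a dict whose "uri" contains the category as a path segment. The category list is taken from the by_category keys in the PR description.

```python
from collections import defaultdict

# Category names as they appear in the by_category example in the description.
CATEGORIES = [
    "profile", "preferences", "entities", "events",
    "cases", "patterns", "tools", "skills",
]


def group_by_category(records, categories=CATEGORIES):
    """Group already-fetched memory records by category in a single pass.

    Returns a dict of category -> count, including zero counts.
    """
    by_cat = defaultdict(list)
    for rec in records:
        uri = rec.get("uri", "")
        for cat in categories:
            if f"/{cat}/" in uri:
                by_cat[cat].append(rec)
                break  # a record belongs to at most one category
    return {cat: len(by_cat[cat]) for cat in categories}
```

Records whose URI matches no category are silently dropped in this sketch. Whether they should be counted separately is a design decision for the real aggregator.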

"by_category": by_category,
"hotness_distribution": hotness_dist,
"staleness": staleness,
"total_vectors": total_vectors,
Collaborator

[Bug] (blocking) total_vectors is always identical to total_memories — both are sum(by_category.values()). The PR description shows them as different numbers (1247 vs 8941), implying total_vectors should represent the actual vector embedding count (a memory can have multiple vectors). The current implementation is misleading.

Either compute the real vector count from VikingDB index stats, or remove this field until it can report a meaningful value.

try:
    result = await aggregator.get_session_extraction_stats(session_id, service, _ctx)
    return Response(status="ok", result=result)
except Exception as e:
Collaborator

[Bug] (blocking) Catching bare Exception and returning NOT_FOUND swallows all error types — DB timeouts, permission errors, serialization failures, etc. are all misreported as "session not found".

Distinguish session-not-found from other failures:

try:
    result = await aggregator.get_session_extraction_stats(session_id, service, _ctx)
    return Response(status="ok", result=result)
except KeyError:
    return Response(
        status="error",
        error=ErrorInfo(code="NOT_FOUND", message=f"Session not found: {session_id}"),
    )
except Exception as e:
    logger.error("Failed to get session stats for %s: %s", session_id, e)
    return Response(
        status="error",
        error=ErrorInfo(code="INTERNAL", message="Internal error retrieving session stats"),
    )

(Adjust the specific exception type to match what session.load() actually raises for missing sessions.)
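
To try this pattern in isolation, here is a framework-free sketch. `SessionNotFoundError` and `session_stats_response` are hypothetical stand-ins, and plain dicts replace the project's Response and ErrorInfo models. The real exception type depends on what session.load() raises.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class SessionNotFoundError(Exception):
    """Hypothetical; stands in for whatever session.load() raises when missing."""


async def session_stats_response(load_stats, session_id):
    """Call an async loader and map failures to distinct error codes."""
    try:
        result = await load_stats(session_id)
        return {"status": "ok", "result": result}
    except SessionNotFoundError:
        return {
            "status": "error",
            "error": {"code": "NOT_FOUND", "message": f"Session not found: {session_id}"},
        }
    except Exception:
        # logger.exception records the traceback; the client sees a generic message.
        logger.exception("Failed to get session stats for %s", session_id)
        return {
            "status": "error",
            "error": {"code": "INTERNAL", "message": "Internal error retrieving session stats"},
        }
```

With a dedicated exception type, a DB timeout or serialization failure can no longer be reported as a missing session.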

mvanhorn and others added 2 commits March 18, 2026 06:51
The audio parser feature is unrelated to memory health stats and
belongs in its own PR (volcengine#707). Reverts audio.py to pre-rewrite state,
removes the unused audio_summary.yaml template and audio parser tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…or handling

- Replace per-category _query_memories_by_category with single
  _query_all_memories call, grouping by category in Python (1 DB
  round-trip instead of 8)
- Remove misleading total_vectors field (was identical to
  total_memories). Will add real vector count from VikingDB index
  stats in a follow-up
- Distinguish KeyError (session not found) from other failures in
  stats.py endpoint, returning INTERNAL_ERROR for unexpected exceptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mvanhorn
Contributor Author

Addressed all feedback in 898cc8e and c0d13ad:

  • Reverted the audio parser commit from this branch (it belongs in feat(parse): implement audio resource parser with Whisper transcription #707)
  • Replaced per-category queries with a single _query_all_memories call that groups by category in Python (1 DB round-trip instead of 8)
  • Removed the misleading total_vectors field (was identical to total_memories)
  • Distinguished KeyError from other exceptions in the session stats endpoint, returning INTERNAL_ERROR for unexpected failures instead of NOT_FOUND

@mvanhorn
Contributor Author

Addressed all blocking feedback in 898cc8e and c0d13ad:

  1. Split PR: Reverted audio parser changes from this branch. The audio parser feature lives in feat(parse): implement audio resource parser with Whisper transcription #707 as a standalone PR.

  2. N+1 query: Replaced per-category _query_memories_by_category (8 identical DB round-trips) with a single _query_all_memories call that fetches once and groups by URI prefix in Python.

  3. total_vectors removed: Dropped the misleading field entirely. Will add real vector count from VikingDB index stats in a follow-up if useful.

  4. Error handling: Replaced bare Exception catch with KeyError for session-not-found, returning INTERNAL_ERROR for unexpected failures.

Labels

None yet

Projects

Status: Backlog

2 participants