Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/container.yml
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ jobs:
- name: Run Trivy vulnerability scanner
continue-on-error: true
if: github.event_name != 'pull_request'
uses: aquasecurity/trivy-action@0.34.1
uses: aquasecurity/trivy-action@0.35.0
with:
image-ref: ${{ env.QUAY_IMAGE }}:latest
format: "table"
Expand Down
5 changes: 5 additions & 0 deletions gourmand-exceptions.toml
Original file line number Diff line number Diff line change
Expand Up @@ -122,3 +122,8 @@ justification = "_store_one is called from both index_file (sync) and index_path
check = "verbose_comments"
path = "src/mcp_trove_crunchtools/indexer.py"
justification = "Phase comments (Phase 1b, Phase 2) document the concurrent pipeline stages for the async indexer."

[[exceptions]]
check = "lint_suppression"
path = "src/mcp_trove_crunchtools/database.py"
justification = "noqa: S608 on query_errors() f-string SQL — WHERE clause is built from hardcoded strings only, all user values are bound via params tuple."
2 changes: 0 additions & 2 deletions src/mcp_trove_crunchtools/database.py
Original file line number Diff line number Diff line change
Expand Up @@ -329,8 +329,6 @@ def log_run_error(run_id: int, error_message: str) -> None:
)


# --- Per-file error tracking ---

_TRANSIENT_PATTERNS = (
"connection reset",
"dns",
Expand Down
1 change: 0 additions & 1 deletion src/mcp_trove_crunchtools/tools/status.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,6 @@ async def trove_quality(
resolved_filter: bool | None = None if show_resolved else False
errors = db.query_errors(resolved=resolved_filter, path=path, limit=limit)

# Compute aggregate counts across all errors (not just the page returned)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While this PR removes a verbose comment, the underlying implementation for computing aggregate counts has a significant performance issue and a potential bug. Fetching up to 10,000 error records just to count them in Python is inefficient and will report incorrect statistics if the actual number of errors exceeds this limit.

A more scalable and performant approach is to compute these aggregates directly in the database. I recommend adding new functions to database.py to handle this, using SQL's COUNT(*) and GROUP BY. This avoids transferring large amounts of data and ensures the counts are always correct.

For example, you could replace the fetch-and-process logic with two more efficient queries:

  1. To get total and resolved counts:
SELECT COUNT(*) AS total, SUM(resolved) AS resolved_count
FROM index_errors
WHERE path LIKE ?; -- Optional path filter
  1. To get counts by error type:
SELECT error_type, COUNT(*)
FROM index_errors
WHERE path LIKE ? -- Optional path filter
GROUP BY error_type;

This would be a valuable improvement for the tool's robustness.

all_errors = db.query_errors(resolved=None, path=path, limit=10_000)
total = len(all_errors)
resolved_count = sum(1 for e in all_errors if e["resolved"])
Expand Down
2 changes: 0 additions & 2 deletions tests/test_tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -329,13 +329,11 @@ async def test_quality_after_error(self, in_memory_db: sqlite3.Connection) -> No
assert result["by_type"]["permanent"] == 1
assert len(result["errors"]) == 2

# Resolve one error and verify
test_db.resolve_errors("/nonexistent/test/bad.pdf")
result = await trove_quality(show_resolved=True)
assert result["resolved"] == 1
assert result["unresolved"] == 1

# Default (show_resolved=False) should only return unresolved
result = await trove_quality()
assert len(result["errors"]) == 1
assert result["errors"][0]["path"] == "/nonexistent/test/corrupt.pdf"
Loading