diff --git a/.cursor/scratchpad.md b/.cursor/scratchpad.md index f2e0a0a..6a2843e 100644 --- a/.cursor/scratchpad.md +++ b/.cursor/scratchpad.md @@ -1,7 +1,5 @@ # Scratchpad -Pylint CI: `_parse_float_param` with a single return + `math.isnan`; `_stringify_numeric_param`; line wrapping in views; fixtures via `fixture(name=...)` (W0621); line length in pgvector/settings/tasks; `tasks.delete_instance_task` signature. +Index status: `count_documents` on all vector backends + `index_coverage.get_index_coverage`, admin page `/admin/graph-search/index-status/`, `search_index_status` command with a coverage table. Searcher: if the `graph.invoke()` result has no `final_results` key, `postprocess_results_node` is called (LangGraph + dict state). -DONE: `pylint $(git ls-files '*.py')` 10/10; pytest on the affected modules: 16 passed. Full pytest: 2 failures in `test_langgraph_search` (Searcher + LANGGRAPH), outside this PR. - -Prerelease **0.3.0a1**: `setup.cfg` + `CHANGELOG.md`; `python -m build` → `dist/*.whl` and `dist/*.tar.gz` (directory is in `.gitignore`). +DONE: full pytest 87 passed, 1 skipped; ruff on the changed files: ok. diff --git a/CHANGELOG.md b/CHANGELOG.md index 94b14ca..c3aceb8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,26 @@ project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [Unreleased] +## [0.3.1a1] — 2026-05-19 + +**Pre-release** of the **0.3.1** line. Install for smoke tests: + +`pip install --pre django-graph-search==0.3.1a1` + +### Added +- **Admin index coverage:** page `/admin/graph-search/index-status/` shows DB row counts + vs vector-store document counts per configured model (metadata `model`), overall + percentage, and static progress bars (no auto-refresh). Link from the existing + Graph Search admin page. +- **`count_documents(filters)`** on all built-in vector backends (ChromaDB, FAISS, + Qdrant, pgvector) plus coverage output in the `search_index_status` management + command. 
+ +### Fixed +- **LangGraph + `graph.invoke()`:** when the compiled graph omits `final_results` + from the returned dict, `Searcher` runs `postprocess_results_node` so search + results are not empty. + ## [0.3.0a1] — 2026-05-18 **Pre-release** of the upcoming **0.3.0** line. Install for smoke tests: @@ -182,6 +202,7 @@ and signal handlers behave exactly as before. - REST endpoints `/api/search/` and `/api/search/similar///`. - `build_search_index` management command. +[0.3.1a1]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.1a1 [0.3.0a1]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.0a1 [0.2.0]: https://github.com/svalench/django_graph_search/releases/tag/v0.2.0 [0.1.2]: https://github.com/svalench/django_graph_search/releases/tag/v0.1.2 diff --git a/README.md b/README.md index 711960b..1c7ce70 100644 --- a/README.md +++ b/README.md @@ -45,6 +45,27 @@ pip install django-graph-search[cohere] pip install django-graph-search[all] ``` +## What's new in 0.3 (pre-release **0.3.1a1**) + +This line is a **pre-release** for smoke-testing packaging and integrations. Install with: + +```bash +pip install --pre django-graph-search==0.3.1a1 +``` + +Highlights vs **0.2.0** (full detail in [CHANGELOG.md](CHANGELOG.md)): + +| Area | Change | +|------|--------| +| **REST hits** | Each result includes `score` (0.0–1.0) and `text`. Optional `min_score` query parameter filters weak matches; responses may include `min_score_applied`. | +| **Indexing** | `weight_fields` is always honored, including with `fields: "__all__"`; weight `0.0` drops a field from indexed text. | +| **Async signals** | `ASYNC_INDEXING` (Celery, `thread`, or django-q) plus `django_graph_search.tasks` so `AUTO_INDEX` can avoid blocking the request thread. | +| **Backends / embeddings** | **Pgvector** backend (`[pgvector]`). **OpenAI** / **Cohere** embedding backends (`[openai]`, `[cohere]`). 
| +| **Scores** | ChromaDB / FAISS / Qdrant normalize distances to similarity scores in 0–1 for consistent API output. | +| **Security / API** | Optional `GRAPH_SEARCH["API"]`: `PERMISSION_CLASSES`, `THROTTLE_CLASSES`, `THROTTLE_RATES`, `REQUIRE_AUTHENTICATION` via `django_graph_search.permissions` (defaults keep behaviour open). | +| **Validation** | Invalid or negative `limit` on search, streaming, conversational, and similar endpoints returns **400** (not 500); values above 1000 are clamped with a log warning. | +| **Fixes** | ChromaDB cosine metadata and distance mapping; file delta cache TTL and `purge_search_cache`; conversational in-memory registry + `RuntimeWarning` when `DEBUG` is false. | + ## Quick Start (5 minutes) ### 1. Add to INSTALLED_APPS diff --git a/setup.cfg b/setup.cfg index b223f62..625047d 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,6 +1,6 @@ [metadata] name = django-graph-search -version = 0.3.0a1 +version = 0.3.1a1 description = Vector search for Django models with graph relations, optional LangGraph pipeline, conversational search, smart indexing and streaming. 
long_description = file: README.md long_description_content_type = text/markdown diff --git a/src/django_graph_search/admin.py b/src/django_graph_search/admin.py index 375e9b0..1994afb 100644 --- a/src/django_graph_search/admin.py +++ b/src/django_graph_search/admin.py @@ -4,6 +4,7 @@ from django.template.response import TemplateResponse from django.urls import path +from .index_coverage import get_index_coverage from .searcher import Searcher from .settings import get_settings @@ -29,17 +30,33 @@ def graph_search_view(request): return TemplateResponse(request, "django_graph_search/admin/search.html", context) +def graph_search_index_status_view(request): + """Static snapshot of index coverage (no auto-refresh).""" + report = get_index_coverage() + context = dict( + admin.site.each_context(request), + title="Indexing status", + report=report, + ) + return TemplateResponse(request, "django_graph_search/admin/index_status.html", context) + + def _inject_admin_urls(admin_site): original_get_urls = admin_site.get_urls def get_urls(): urls = original_get_urls() custom = [ + path( + "graph-search/index-status/", + admin_site.admin_view(graph_search_index_status_view), + name="graph-search-index-status", + ), path( "graph-search/", admin_site.admin_view(graph_search_view), name="graph-search", - ) + ), ] return custom + urls diff --git a/src/django_graph_search/backends/base.py b/src/django_graph_search/backends/base.py index 50adba1..4918e4d 100644 --- a/src/django_graph_search/backends/base.py +++ b/src/django_graph_search/backends/base.py @@ -42,3 +42,9 @@ def delete(self, doc_ids: Iterable[str]) -> None: def clear_collection(self) -> None: raise NotImplementedError + @abstractmethod + def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int: + """Number of documents in the collection; with filters, only those matching + all keys (as in search).""" + raise NotImplementedError + diff --git a/src/django_graph_search/backends/chromadb.py 
b/src/django_graph_search/backends/chromadb.py index 0a8bd91..9240cbc 100644 --- a/src/django_graph_search/backends/chromadb.py +++ b/src/django_graph_search/backends/chromadb.py @@ -90,3 +90,10 @@ def delete(self, doc_ids: Iterable[str]) -> None: def clear_collection(self) -> None: self.collection.delete(where={}) + def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int: + if filters: + data = self.collection.get(where=filters, include=[]) + ids = data.get("ids") or [] + return len(ids) + return int(self.collection.count()) + diff --git a/src/django_graph_search/backends/faiss.py b/src/django_graph_search/backends/faiss.py index 323173a..764ad6e 100644 --- a/src/django_graph_search/backends/faiss.py +++ b/src/django_graph_search/backends/faiss.py @@ -100,6 +100,13 @@ def clear_collection(self) -> None: self._metas = [] self._embeddings = [] + def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int: + if not self._metas: + return 0 + if filters is None: + return len(self._metas) + return sum(1 for m in self._metas if self._match_filters(m, filters)) + def _match_filters(self, metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool: for key, value in filters.items(): if metadata.get(key) != value: diff --git a/src/django_graph_search/backends/pgvector.py b/src/django_graph_search/backends/pgvector.py index cdd19e3..9f03fa2 100644 --- a/src/django_graph_search/backends/pgvector.py +++ b/src/django_graph_search/backends/pgvector.py @@ -184,3 +184,18 @@ def clear_collection(self) -> None: conn = connections[self.using] with conn.cursor() as cursor: cursor.execute(f"DELETE FROM {tbl};") + + def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int: + self._ensure_table() + tbl = self.table_name + conn = connections[self.using] + if filters: + sql = f"SELECT COUNT(*) FROM {tbl} WHERE metadata @> %s::jsonb" + params: List[Any] = [json.dumps(filters)] + else: + sql = f"SELECT COUNT(*) FROM {tbl}" + params = [] + 
with conn.cursor() as cursor: + cursor.execute(sql, params) + row = cursor.fetchone() + return int(row[0]) if row else 0 diff --git a/src/django_graph_search/backends/qdrant.py b/src/django_graph_search/backends/qdrant.py index 080df44..91d5187 100644 --- a/src/django_graph_search/backends/qdrant.py +++ b/src/django_graph_search/backends/qdrant.py @@ -89,3 +89,21 @@ def delete(self, doc_ids: Iterable[str]) -> None: def clear_collection(self) -> None: self.client.delete_collection(collection_name=self.collection_name) + def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int: + if not self.client.collection_exists(self.collection_name): + return 0 + query_filter = None + if filters: + conditions = [ + self.qmodels.FieldCondition( + key=key, match=self.qmodels.MatchValue(value=value) + ) + for key, value in filters.items() + ] + query_filter = self.qmodels.Filter(must=conditions) + result = self.client.count( + collection_name=self.collection_name, + count_filter=query_filter, + ) + return int(result.count) + diff --git a/src/django_graph_search/index_coverage.py b/src/django_graph_search/index_coverage.py new file mode 100644 index 0000000..3bf4425 --- /dev/null +++ b/src/django_graph_search/index_coverage.py @@ -0,0 +1,96 @@ +"""Index coverage: DB rows vs vector-store points, matched by metadata.model.""" +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING, List, Optional + +from django.apps import apps +from django.utils.module_loading import import_string + +from .settings import GraphSearchConfig, get_settings + +if TYPE_CHECKING: + from .backends.base import BaseVectorStore + + +@dataclass(frozen=True) +class IndexCoverageRow: + """One configured model: how many rows the ORM has and how many points carry the same model label.""" + + model_label: str + db_count: int + indexed_count: int + # Share of indexed documents relative to the DB (can exceed 100 when stale entries linger in the index) + percent: float + # For the progress bar: capped at 100
+ bar_percent: int + + +@dataclass(frozen=True) +class IndexCoverageReport: + rows: List[IndexCoverageRow] + total_db: int + total_indexed: int + overall_percent: float + overall_bar_percent: int + vector_store_backend: str + + +def get_index_coverage( + config: Optional[GraphSearchConfig] = None, + *, + vector_store: Optional["BaseVectorStore"] = None, +) -> IndexCoverageReport: + """ + Snapshot at call time (no auto-refresh). + + When db_count == 0, coverage is treated as trivially complete (100%): there is nothing to index. + """ + cfg = config or get_settings() + if vector_store is None: + backend_cls = import_string(cfg.vector_store.backend) + vector_store = backend_cls(**cfg.vector_store.options) + + rows: List[IndexCoverageRow] = [] + total_db = 0 + total_indexed = 0 + + for model_cfg in cfg.models: + app_label, model_name = model_cfg.model.split(".", 1) + model_cls = apps.get_model(app_label, model_name) + label = model_cls._meta.label + db_count = model_cls.objects.count() + indexed_count = vector_store.count_documents({"model": label}) + total_db += db_count + total_indexed += indexed_count + + if db_count == 0: + percent = 100.0 + else: + percent = 100.0 * indexed_count / db_count + bar_percent = max(0, min(100, int(round(percent)))) + + rows.append( + IndexCoverageRow( + model_label=label, + db_count=db_count, + indexed_count=indexed_count, + percent=percent, + bar_percent=bar_percent, + ) + ) + + if total_db == 0: + overall_percent = 100.0 + else: + overall_percent = 100.0 * total_indexed / total_db + overall_bar_percent = max(0, min(100, int(round(overall_percent)))) + + return IndexCoverageReport( + rows=rows, + total_db=total_db, + total_indexed=total_indexed, + overall_percent=overall_percent, + overall_bar_percent=overall_bar_percent, + vector_store_backend=cfg.vector_store.backend, + ) diff --git a/src/django_graph_search/management/commands/search_index_status.py b/src/django_graph_search/management/commands/search_index_status.py 
index 323cd0d..b886804 100644 --- a/src/django_graph_search/management/commands/search_index_status.py +++ b/src/django_graph_search/management/commands/search_index_status.py @@ -1,10 +1,11 @@ from django.core.management.base import BaseCommand +from ...index_coverage import get_index_coverage from ...settings import get_settings class Command(BaseCommand): - help = "Show configured vector search index settings." + help = "Show configured vector search index settings and index coverage (DB vs store)." def handle(self, *args, **options): config = get_settings() @@ -17,3 +18,15 @@ def handle(self, *args, **options): for model_cfg in config.models: self.stdout.write(f" - {model_cfg.model} (fields: {', '.join(model_cfg.fields)})") + report = get_index_coverage(config=config) + self.stdout.write("") + self.stdout.write( + f"Coverage: {report.overall_percent:.1f}% " + f"({report.total_indexed} indexed / {report.total_db} in DB)" + ) + self.stdout.write(f"{'Model':<40} {'DB':>8} {'Indexed':>10} {'%':>8}") + for row in report.rows: + self.stdout.write( + f"{row.model_label:<40} {row.db_count:>8} {row.indexed_count:>10} " + f"{row.percent:>7.1f}" + ) diff --git a/src/django_graph_search/searcher.py b/src/django_graph_search/searcher.py index 4402939..bde8212 100644 --- a/src/django_graph_search/searcher.py +++ b/src/django_graph_search/searcher.py @@ -132,6 +132,12 @@ def _search_via_graph( "rerank_top_k": self.config.langgraph.rerank_top_k, } out = graph.invoke(state) + # LangGraph + StateGraph(dict): invoke() may not return the final_results key, + # even though the postprocess node ran (see stream). Fill it in with the same postprocessing step. 
+ if "final_results" not in out: + from .langgraph_agent import postprocess_results_node + + out = postprocess_results_node(dict(out)) results = out.get("final_results") or [] return [self._format_result(item) for item in results] diff --git a/src/django_graph_search/static/django_graph_search/css/index_status.css b/src/django_graph_search/static/django_graph_search/css/index_status.css new file mode 100644 index 0000000..1821b3c --- /dev/null +++ b/src/django_graph_search/static/django_graph_search/css/index_status.css @@ -0,0 +1,69 @@ +.index-status { + max-width: 960px; +} + +.index-status__nav { + margin-bottom: 12px; +} + +.index-status__meta { + color: #666; + margin-bottom: 20px; +} + +.index-status__overall { + margin-bottom: 28px; + padding: 16px 20px; + border: 1px solid #ddd; + border-radius: 4px; + background: var(--body-bg, #fafafa); +} + +.index-status__overall-number { + font-size: 2.25rem; + font-weight: 700; + line-height: 1.2; + margin: 0 0 8px; +} + +.index-status__overall-caption { + margin: 0 0 12px; + color: #444; +} + +.index-status__table { + width: 100%; + border-collapse: collapse; +} + +.index-status__table th, +.index-status__table td { + border: 1px solid #ddd; + padding: 8px 10px; + text-align: left; + vertical-align: middle; +} + +.index-status__cell-bar { + min-width: 140px; + width: 40%; +} + +.index-status__bar { + height: 10px; + background: #e8e8e8; + border-radius: 5px; + overflow: hidden; +} + +.index-status__bar--large { + height: 14px; + max-width: 480px; +} + +.index-status__bar-fill { + height: 100%; + background: #417690; + border-radius: 5px; + min-width: 0; +} diff --git a/src/django_graph_search/static/django_graph_search/css/search.css b/src/django_graph_search/static/django_graph_search/css/search.css index bbfced3..2aa93e9 100644 --- a/src/django_graph_search/static/django_graph_search/css/search.css +++ b/src/django_graph_search/static/django_graph_search/css/search.css @@ -2,6 +2,10 @@ max-width: 960px; } 
+.graph-search__nav { + margin-bottom: 12px; +} + .graph-search__form { margin-bottom: 16px; } diff --git a/src/django_graph_search/templates/django_graph_search/admin/index_status.html b/src/django_graph_search/templates/django_graph_search/admin/index_status.html new file mode 100644 index 0000000..34f5817 --- /dev/null +++ b/src/django_graph_search/templates/django_graph_search/admin/index_status.html @@ -0,0 +1,56 @@ +{% extends "admin/base_site.html" %} +{% load static %} + +{% block extrastyle %} + +{% endblock %} + +{% block content %} +
+ +

Indexing status

+

Vector store: {{ report.vector_store_backend }}

+ +
+
{{ report.overall_percent|floatformat:1 }}%
+

+ In the index: {{ report.total_indexed }}; in the DB (sum across models): {{ report.total_db }}

+ +
+ + + + + + + + + + + + + {% for row in report.rows %} + + + + + + + + {% endfor %} + +
Model | In DB | In index | %
{{ row.model_label }}{{ row.db_count }}{{ row.indexed_count }}{{ row.percent|floatformat:1 }}% + +
+
+{% endblock %} diff --git a/src/django_graph_search/templates/django_graph_search/admin/search.html b/src/django_graph_search/templates/django_graph_search/admin/search.html index e3e6c18..42b072a 100644 --- a/src/django_graph_search/templates/django_graph_search/admin/search.html +++ b/src/django_graph_search/templates/django_graph_search/admin/search.html @@ -7,6 +7,9 @@ {% block content %}
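Reviewer note: the coverage arithmetic in `index_coverage.py` and the filter matching used by `count_documents` can be checked without Django or a real vector store. The sketch below is an illustration only. `InMemoryStore` and `coverage` are hypothetical names that do not exist in the package; they mirror the FAISS-style metadata matching and the percent/bar clamping rules shown in the diff.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for a vector backend; only count_documents is modelled."""

    def __init__(self, metas: List[Dict[str, Any]]) -> None:
        self._metas = metas

    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        # No filters: count everything. With filters: every key must match exactly.
        if not filters:
            return len(self._metas)
        return sum(
            1
            for meta in self._metas
            if all(meta.get(key) == value for key, value in filters.items())
        )


def coverage(db_count: int, indexed_count: int) -> Tuple[float, int]:
    """Return (percent, bar_percent) following the rules in the diff."""
    # An empty table counts as fully covered: there is nothing to index.
    percent = 100.0 if db_count == 0 else 100.0 * indexed_count / db_count
    # The raw percent may exceed 100; the bar value is clamped to 0..100.
    bar_percent = max(0, min(100, int(round(percent))))
    return percent, bar_percent


if __name__ == "__main__":
    store = InMemoryStore(
        [{"model": "shop.Product"}, {"model": "shop.Product"}, {"model": "blog.Post"}]
    )
    indexed = store.count_documents({"model": "shop.Product"})
    print(coverage(db_count=4, indexed_count=indexed))
```

A store like this could also be passed as the `vector_store=` argument of `get_index_coverage` in a test, which is presumably why the function accepts that keyword.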