Merged
.cursor/scratchpad.md (6 changes: 2 additions & 4 deletions)
@@ -1,7 +1,5 @@
# Scratchpad

Pylint CI: `_parse_float_param` with a single return + `math.isnan`; `_stringify_numeric_param`; line breaks in views; fixtures via `fixture(name=...)` (W0621); line length in pgvector/settings/tasks; `tasks.delete_instance_task` signature.
Index status: `count_documents` on all vector backends + `index_coverage.get_index_coverage`, admin page `/admin/graph-search/index-status/`, `search_index_status` command with a coverage table. Searcher: if the output of `graph.invoke()` lacks the `final_results` key, `postprocess_results_node` is called (LangGraph + dict state).

DONE: `pylint $(git ls-files '*.py')` 10/10; pytest on the affected modules: 16 passed. Full pytest: 2 failures in `test_langgraph_search` (Searcher + LANGGRAPH), outside the scope of this PR.

Prerelease **0.3.0a1**: `setup.cfg` + `CHANGELOG.md`; `python -m build` → `dist/*.whl` and `dist/*.tar.gz` (the folder is in `.gitignore`).
DONE: full pytest 87 passed, 1 skipped; ruff on changed files: ok.
CHANGELOG.md (21 changes: 21 additions & 0 deletions)
@@ -7,6 +7,26 @@ project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.3.1a1] — 2026-05-19

**Pre-release** of the **0.3.1** line. Install for smoke tests:

`pip install --pre django-graph-search==0.3.1a1`

### Added
- **Admin index coverage:** the page `/admin/graph-search/index-status/` shows, for each
configured model, the DB row count vs the vector-store document count (matched on the
metadata `model` field), an overall percentage, and static progress bars (no
auto-refresh). It is linked from the existing Graph Search admin page.
- **`count_documents(filters)`** on all built-in vector backends (ChromaDB, FAISS,
Qdrant, pgvector) plus coverage output in the `search_index_status` management
command.

### Fixed
- **LangGraph + `graph.invoke()`:** when the compiled graph omits `final_results`
from the returned dict, `Searcher` runs `postprocess_results_node` so search
results are not empty.

## [0.3.0a1] — 2026-05-18

**Pre-release** of the upcoming **0.3.0** line. Install for smoke tests:
@@ -182,6 +202,7 @@ and signal handlers behave exactly as before.
- REST endpoints `/api/search/` and `/api/search/similar/<model>/<pk>/`.
- `build_search_index` management command.

[0.3.1a1]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.1a1
[0.3.0a1]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.0a1
[0.2.0]: https://github.com/svalench/django_graph_search/releases/tag/v0.2.0
[0.1.2]: https://github.com/svalench/django_graph_search/releases/tag/v0.1.2
README.md (21 changes: 21 additions & 0 deletions)
@@ -45,6 +45,27 @@ pip install django-graph-search[cohere]
pip install django-graph-search[all]
```

## What's new in 0.3 (pre-release **0.3.1a1**)

This line is a **pre-release** for smoke-testing packaging and integrations. Install with:

```bash
pip install --pre django-graph-search==0.3.1a1
```

Highlights vs **0.2.0** (full detail in [CHANGELOG.md](CHANGELOG.md)):

| Area | Change |
|------|--------|
| **REST hits** | Each result includes `score` (0.0–1.0) and `text`. Optional `min_score` query parameter filters weak matches; responses may include `min_score_applied`. |
| **Indexing** | `weight_fields` is always honored, including with `fields: "__all__"`; weight `0.0` drops a field from indexed text. |
| **Async signals** | `ASYNC_INDEXING` (Celery, `thread`, or django-q) plus `django_graph_search.tasks` so `AUTO_INDEX` can avoid blocking the request thread. |
| **Backends / embeddings** | **Pgvector** backend (`[pgvector]`). **OpenAI** / **Cohere** embedding backends (`[openai]`, `[cohere]`). |
| **Scores** | ChromaDB / FAISS / Qdrant normalize distances to similarity scores in 0–1 for consistent API output. |
| **Security / API** | Optional `GRAPH_SEARCH["API"]`: `PERMISSION_CLASSES`, `THROTTLE_CLASSES`, `THROTTLE_RATES`, `REQUIRE_AUTHENTICATION` via `django_graph_search.permissions` (defaults keep behaviour open). |
| **Validation** | Invalid or negative `limit` on search, streaming, conversational, and similar endpoints returns **400** (not 500); values above 1000 are clamped with a log warning. |
| **Fixes** | ChromaDB cosine metadata and distance mapping; file delta cache TTL and `purge_search_cache`; conversational in-memory registry + `RuntimeWarning` when `DEBUG` is false. |
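The **Validation** rule in the table above (reject invalid or negative `limit` values, clamp anything above 1000 with a logged warning) can be sketched as a small parsing helper. This is a hypothetical illustration, not the package's code: the name `parse_limit`, the default of 10, and signalling errors with `ValueError` (which a view would translate into a 400 response) are assumptions.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

MAX_LIMIT = 1000  # upper bound from the README; larger values are clamped


def parse_limit(raw: Optional[str], default: int = 10) -> int:
    """Parse a `limit` query parameter (hypothetical helper).

    Raises ValueError for non-integer or negative input; a view would turn
    that into a 400 response instead of letting it surface as a 500.
    """
    if raw is None or raw == "":
        return default  # assumed default when the parameter is absent
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"limit must be an integer, got {raw!r}") from None
    if value < 0:
        raise ValueError(f"limit must not be negative, got {value}")
    if value > MAX_LIMIT:
        logger.warning("limit=%d exceeds %d; clamping", value, MAX_LIMIT)
        value = MAX_LIMIT
    return value
```

With this sketch, `parse_limit("5000")` returns 1000 and logs a warning, while `parse_limit("-1")` and `parse_limit("abc")` raise `ValueError`. Keeping the helper free of Django objects makes the 400-vs-500 distinction explicit: a view only has to catch `ValueError`.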

## Quick Start (5 minutes)

### 1. Add to INSTALLED_APPS
setup.cfg (2 changes: 1 addition & 1 deletion)
@@ -1,6 +1,6 @@
[metadata]
name = django-graph-search
-version = 0.3.0a1
+version = 0.3.1a1
description = Vector search for Django models with graph relations, optional LangGraph pipeline, conversational search, smart indexing and streaming.
long_description = file: README.md
long_description_content_type = text/markdown
src/django_graph_search/admin.py (19 changes: 18 additions & 1 deletion)
@@ -4,6 +4,7 @@
from django.template.response import TemplateResponse
from django.urls import path

from .index_coverage import get_index_coverage
from .searcher import Searcher
from .settings import get_settings

@@ -29,17 +30,33 @@ def graph_search_view(request):
    return TemplateResponse(request, "django_graph_search/admin/search.html", context)


def graph_search_index_status_view(request):
    """Static snapshot of index coverage (no auto-refresh)."""
    report = get_index_coverage()
    context = dict(
        admin.site.each_context(request),
        title="Index status",
        report=report,
    )
    return TemplateResponse(request, "django_graph_search/admin/index_status.html", context)


def _inject_admin_urls(admin_site):
    original_get_urls = admin_site.get_urls

    def get_urls():
        urls = original_get_urls()
        custom = [
            path(
                "graph-search/index-status/",
                admin_site.admin_view(graph_search_index_status_view),
                name="graph-search-index-status",
            ),
            path(
                "graph-search/",
                admin_site.admin_view(graph_search_view),
                name="graph-search",
-            )
+            ),
        ]
        return custom + urls
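Prepending (`custom + urls`) matters: since Django 3.2 the admin's own URL list ends with a catch-all pattern (controlled by `AdminSite.final_catch_all_view`, on by default), so routes appended after it would never be reached. The framework-free sketch below models that first-match-wins behaviour with regular expressions. `FakeAdminSite`, `resolve`, `inject_routes` and the route tuples are hypothetical stand-ins, not Django APIs, and the final reassignment of `get_urls` on the site is assumed because it lies outside the diff hunk.

```python
import re
from typing import Callable, List, Optional, Tuple

Route = Tuple[str, str]  # (regex pattern, route name)


class FakeAdminSite:
    """Hypothetical stand-in for an admin site whose URL list ends with a catch-all."""

    def get_urls(self) -> List[Route]:
        return [
            (r"", "index"),
            (r"(?P<url>.*)", "catch_all"),  # plays the role of the final catch-all view
        ]


def resolve(routes: List[Route], path: str) -> Optional[str]:
    """Return the name of the first route whose pattern matches the whole path."""
    for pattern, name in routes:
        if re.fullmatch(pattern, path):
            return name
    return None


def inject_routes(site: FakeAdminSite, custom: List[Route]) -> None:
    """Wrap site.get_urls so that the custom routes come first."""
    original_get_urls: Callable[[], List[Route]] = site.get_urls

    def get_urls() -> List[Route]:
        return custom + original_get_urls()

    site.get_urls = get_urls  # instance attribute shadows the class method


CUSTOM: List[Route] = [
    (r"graph-search/index-status/", "graph-search-index-status"),
    (r"graph-search/", "graph-search"),
]

site = FakeAdminSite()
inject_routes(site, CUSTOM)
print(resolve(site.get_urls(), "graph-search/index-status/"))  # graph-search-index-status
print(resolve(site.get_urls(), "unknown/"))                    # catch_all
# Appending instead of prepending lets the catch-all win:
print(resolve(FakeAdminSite().get_urls() + CUSTOM, "graph-search/"))  # catch_all
```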

src/django_graph_search/backends/base.py (6 changes: 6 additions & 0 deletions)
@@ -42,3 +42,9 @@ def delete(self, doc_ids: Iterable[str]) -> None:
    def clear_collection(self) -> None:
        raise NotImplementedError

    @abstractmethod
    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        """Number of documents in the collection; with filters, count only those
        matching all keys (as in search)."""
        raise NotImplementedError
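The contract in this docstring (count everything when no filters are given, otherwise count only documents whose metadata matches every filter key by equality) can be sketched with a toy in-memory store. This is a hypothetical, simplified stand-in, not one of the package's backends: it does not subclass the real `BaseVectorStore`, and its `add` method is invented for the example. The matching rule mirrors the equality check in the FAISS backend's `_match_filters`.

```python
from typing import Any, Dict, Optional


class InMemoryStore:
    """Toy store (hypothetical) that only tracks metadata per document id."""

    def __init__(self) -> None:
        self._metas: Dict[str, Dict[str, Any]] = {}

    def add(self, doc_id: str, metadata: Dict[str, Any]) -> None:
        self._metas[doc_id] = metadata  # re-adding an id overwrites it, like an upsert

    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        if not filters:
            return len(self._metas)  # None or {}: count the whole collection
        return sum(
            1
            for meta in self._metas.values()
            # every filter key must match by equality (logical AND)
            if all(meta.get(key) == value for key, value in filters.items())
        )


store = InMemoryStore()
store.add("1", {"model": "shop.Product", "pk": 1})
store.add("2", {"model": "shop.Product", "pk": 2})
store.add("3", {"model": "blog.Post", "pk": 1})
print(store.count_documents())                                 # 3
print(store.count_documents({"model": "shop.Product"}))        # 2
print(store.count_documents({"model": "blog.Post", "pk": 2}))  # 0
```

Treating `{}` like `None` matches the `if filters:` checks in the ChromaDB and pgvector implementations; `get_index_coverage` always passes a one-key filter, `{"model": label}`.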

src/django_graph_search/backends/chromadb.py (7 changes: 7 additions & 0 deletions)
@@ -90,3 +90,10 @@ def delete(self, doc_ids: Iterable[str]) -> None:
    def clear_collection(self) -> None:
        self.collection.delete(where={})

    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        if filters:
            data = self.collection.get(where=filters, include=[])
            ids = data.get("ids") or []
            return len(ids)
        return int(self.collection.count())

src/django_graph_search/backends/faiss.py (7 changes: 7 additions & 0 deletions)
@@ -100,6 +100,13 @@ def clear_collection(self) -> None:
        self._metas = []
        self._embeddings = []

    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        if not self._metas:
            return 0
        if filters is None:
            return len(self._metas)
        return sum(1 for m in self._metas if self._match_filters(m, filters))

    def _match_filters(self, metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
        for key, value in filters.items():
            if metadata.get(key) != value:
src/django_graph_search/backends/pgvector.py (15 changes: 15 additions & 0 deletions)
@@ -184,3 +184,18 @@ def clear_collection(self) -> None:
        conn = connections[self.using]
        with conn.cursor() as cursor:
            cursor.execute(f"DELETE FROM {tbl};")

    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        self._ensure_table()
        tbl = self.table_name
        conn = connections[self.using]
        if filters:
            sql = f"SELECT COUNT(*) FROM {tbl} WHERE metadata @> %s::jsonb"
            params: List[Any] = [json.dumps(filters)]
        else:
            sql = f"SELECT COUNT(*) FROM {tbl}"
            params = []
        with conn.cursor() as cursor:
            cursor.execute(sql, params)
            row = cursor.fetchone()
        return int(row[0]) if row else 0
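For flat metadata with string or numeric values, PostgreSQL's `metadata @> %s::jsonb` test is true exactly when every key/value pair of the filter object is present in the stored object, which is the same AND-of-equalities rule the other backends apply. The function below is a hypothetical pure-Python model of that flat case, written only to illustrate the semantics; it deliberately ignores nested objects and arrays, where real JSONB containment has its own rules.

```python
from typing import Any, Dict


def flat_jsonb_contains(stored: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Model `stored @> filters` for flat JSON objects with string/number values.

    Hypothetical helper: every filter key must be present in `stored` with an
    equal value. Nested objects, arrays, and JSON booleans vs numbers are not
    modelled (in Python, True == 1, whereas in JSON they differ).
    """
    return all(key in stored and stored[key] == value for key, value in filters.items())


meta = {"model": "shop.Product", "pk": 7}
print(flat_jsonb_contains(meta, {"model": "shop.Product"}))  # True
print(flat_jsonb_contains(meta, {"model": "blog.Post"}))     # False
print(flat_jsonb_contains({}, {"pk": None}))                 # False: the key must exist
```

The last call shows one edge case where backends can diverge: the FAISS `_match_filters` shown above uses `metadata.get(key) != value`, so a `None` filter value would also match documents that lack the key, while JSONB containment requires the key to be present.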
src/django_graph_search/backends/qdrant.py (18 changes: 18 additions & 0 deletions)
@@ -89,3 +89,21 @@ def delete(self, doc_ids: Iterable[str]) -> None:
    def clear_collection(self) -> None:
        self.client.delete_collection(collection_name=self.collection_name)

    def count_documents(self, filters: Optional[Dict[str, Any]] = None) -> int:
        if not self.client.collection_exists(self.collection_name):
            return 0
        query_filter = None
        if filters:
            conditions = [
                self.qmodels.FieldCondition(
                    key=key, match=self.qmodels.MatchValue(value=value)
                )
                for key, value in filters.items()
            ]
            query_filter = self.qmodels.Filter(must=conditions)
        result = self.client.count(
            collection_name=self.collection_name,
            count_filter=query_filter,
        )
        return int(result.count)

src/django_graph_search/index_coverage.py (96 changes: 96 additions & 0 deletions)
@@ -0,0 +1,96 @@
"""Index coverage: DB rows vs vector-store points, grouped by metadata.model."""
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING, List, Optional

from django.apps import apps
from django.utils.module_loading import import_string

from .settings import GraphSearchConfig, get_settings

if TYPE_CHECKING:
    from .backends.base import BaseVectorStore


@dataclass(frozen=True)
class IndexCoverageRow:
    """One configured model: ORM row count vs points with the same model label."""

    model_label: str
    db_count: int
    indexed_count: int
    # Indexed share relative to the DB (may exceed 100 when stale points linger in the index)
    percent: float
    # For the progress bar: capped at 100
    bar_percent: int


@dataclass(frozen=True)
class IndexCoverageReport:
    rows: List[IndexCoverageRow]
    total_db: int
    total_indexed: int
    overall_percent: float
    overall_bar_percent: int
    vector_store_backend: str


def get_index_coverage(
    config: Optional[GraphSearchConfig] = None,
    *,
    vector_store: Optional["BaseVectorStore"] = None,
) -> IndexCoverageReport:
    """
    Snapshot taken at call time (no auto-refresh).

    When db_count == 0, coverage is treated as trivially complete (100%): there is nothing to index.
    """
    cfg = config or get_settings()
    if vector_store is None:
        backend_cls = import_string(cfg.vector_store.backend)
        vector_store = backend_cls(**cfg.vector_store.options)

    rows: List[IndexCoverageRow] = []
    total_db = 0
    total_indexed = 0

    for model_cfg in cfg.models:
        app_label, model_name = model_cfg.model.split(".", 1)
        model_cls = apps.get_model(app_label, model_name)
        label = model_cls._meta.label
        db_count = model_cls.objects.count()
        indexed_count = vector_store.count_documents({"model": label})
        total_db += db_count
        total_indexed += indexed_count

        if db_count == 0:
            percent = 100.0
        else:
            percent = 100.0 * indexed_count / db_count
        bar_percent = max(0, min(100, int(round(percent))))

        rows.append(
            IndexCoverageRow(
                model_label=label,
                db_count=db_count,
                indexed_count=indexed_count,
                percent=percent,
                bar_percent=bar_percent,
            )
        )

    if total_db == 0:
        overall_percent = 100.0
    else:
        overall_percent = 100.0 * total_indexed / total_db
    overall_bar_percent = max(0, min(100, int(round(overall_percent))))

    return IndexCoverageReport(
        rows=rows,
        total_db=total_db,
        total_indexed=total_indexed,
        overall_percent=overall_percent,
        overall_bar_percent=overall_bar_percent,
        vector_store_backend=cfg.vector_store.backend,
    )
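Because `overall_percent` is computed from the summed counts, it is a row-weighted figure rather than the mean of the per-model percentages, and an over-indexed model (stale points left in the store) can hide an under-indexed one. The sketch below reproduces only that arithmetic on plain tuples; the helper name `coverage`, the model labels, and the counts are made up for illustration.

```python
from typing import List, Tuple


def coverage(db_count: int, indexed_count: int) -> Tuple[float, int]:
    """Same rule as get_index_coverage: an empty table counts as 100% covered."""
    percent = 100.0 if db_count == 0 else 100.0 * indexed_count / db_count
    bar_percent = max(0, min(100, int(round(percent))))  # progress bar is capped at 100
    return percent, bar_percent


# (model label, rows in the DB, points in the vector store); hypothetical numbers
rows: List[Tuple[str, int, int]] = [
    ("shop.Product", 100, 150),  # 50 stale points left over in the index
    ("blog.Post", 100, 50),      # only half of the rows are indexed
    ("blog.Tag", 0, 0),          # empty table: trivially 100%
]

for label, db, indexed in rows:
    print(label, coverage(db, indexed))
# shop.Product (150.0, 100)
# blog.Post (50.0, 50)
# blog.Tag (100.0, 100)

total_db = sum(db for _, db, _ in rows)
total_indexed = sum(indexed for _, _, indexed in rows)
print("overall", coverage(total_db, total_indexed))  # overall (100.0, 100)
```

The overall figure reads as 100% even though `blog.Post` is half-indexed; only the per-model rows (and a `percent` above 100) reveal the problem.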
`search_index_status` management command
@@ -1,10 +1,11 @@
from django.core.management.base import BaseCommand

from ...index_coverage import get_index_coverage
from ...settings import get_settings


class Command(BaseCommand):
-    help = "Show configured vector search index settings."
+    help = "Show configured vector search index settings and index coverage (DB vs store)."

    def handle(self, *args, **options):
        config = get_settings()
@@ -17,3 +18,15 @@ def handle(self, *args, **options):
        for model_cfg in config.models:
            self.stdout.write(f" - {model_cfg.model} (fields: {', '.join(model_cfg.fields)})")

        report = get_index_coverage(config=config)
        self.stdout.write("")
        self.stdout.write(
            f"Coverage: {report.overall_percent:.1f}% "
            f"({report.total_indexed} indexed / {report.total_db} in DB)"
        )
        self.stdout.write(f"{'Model':<40} {'DB':>8} {'Indexed':>10} {'%':>8}")
        for row in report.rows:
            self.stdout.write(
                f"{row.model_label:<40} {row.db_count:>8} {row.indexed_count:>10} "
                f"{row.percent:>7.1f}"
            )
src/django_graph_search/searcher.py (6 changes: 6 additions & 0 deletions)
@@ -132,6 +132,12 @@ def _search_via_graph(
            "rerank_top_k": self.config.langgraph.rerank_top_k,
        }
        out = graph.invoke(state)
        # LangGraph + StateGraph(dict): invoke() may omit the final_results key even though
        # the postprocess node ran (see stream). Recover it with the same postprocessing step.
        if "final_results" not in out:
            from .langgraph_agent import postprocess_results_node

            out = postprocess_results_node(dict(out))
        results = out.get("final_results") or []
        return [self._format_result(item) for item in results]
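The fallback can be exercised without LangGraph. In this hypothetical sketch, `FakeGraph` stands in for a compiled graph whose `invoke()` returns state without `final_results`, and `postprocess` stands in for `postprocess_results_node`; both names, the `results`/`score` keys, and the sort-by-score logic are assumptions made for illustration. The guard in `run_search` mirrors the check added to `_search_via_graph`.

```python
from typing import Any, Dict, List


def postprocess(state: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for postprocess_results_node: sort raw hits into final_results."""
    hits = sorted(state.get("results", []), key=lambda h: h["score"], reverse=True)
    return {**state, "final_results": hits}


class FakeGraph:
    """Stand-in for a compiled graph whose invoke() omits final_results."""

    def invoke(self, state: Dict[str, Any]) -> Dict[str, Any]:
        return {**state, "results": [{"id": 1, "score": 0.4}, {"id": 2, "score": 0.9}]}


def run_search(graph: FakeGraph, state: Dict[str, Any]) -> List[Dict[str, Any]]:
    out = graph.invoke(state)
    if "final_results" not in out:
        # Same guard as in the diff: rerun the last step on a copy of the returned state.
        out = postprocess(dict(out))
    return out.get("final_results") or []


print(run_search(FakeGraph(), {"query": "shoes"}))
# [{'id': 2, 'score': 0.9}, {'id': 1, 'score': 0.4}]
```

Passing `dict(out)` means the node works on a copy, so the state object returned by `invoke()` is not mutated by the recovery step.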
